scx-upstream

mirror of https://github.com/sched-ext/scx.git synced 2024-11-28 21:50:23 +00:00

Author	SHA1	Message	Date
Andrea Righi	13e23e8cc9	scx_rustland: dump scheduler statistics before exiting Print all the scheduler statistics before exiting. Reporting the very last state of the scheduler can help to debug events that could trigger error conditions (such as page faults, scheduler congestions, etc.). While at it, fix also some minor coding style issues (tabs vs spaces). Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-02-08 15:37:44 +01:00
Tejun Heo	73c68c6f4a	Merge pull request #131 from sched-ext/htejun scx: Update vmlinux to use SCX_KICK_IDLE	2024-02-07 07:04:00 -10:00
Tejun Heo	2062d1ad1f	scx: Add compat support for SCX_KICK_IDLE and use it for idle CPU wakeups SCX_KICK_IDLE is a new feature which isn't defined in older kernels. Add compat wrapper and use it for idle CPU wakeups. Signed-off-by: Tejun Heo <tj@kernel.org>	2024-02-06 15:28:40 -10:00
Tejun Heo	17014e91fb	scx: Update vmlinux.h to receive SCX_KICK_IDLE Signed-off-by: Tejun Heo <tj@kernel.org>	2024-02-06 15:02:01 -10:00
Andrea Righi	55c9c92b81	Merge pull request #128 from sched-ext/ci-stderr ci: detect errors only from stderr	2024-02-05 18:22:34 +01:00
Tejun Heo	9eda031474	Merge pull request #126 from sirlucjan/cachyos-repo Add linux-sched-ext to CachyOS repo	2024-02-05 07:04:20 -10:00
Andrea Righi	67a53ba621	ci: detect errors only from stderr Search for potential errors only in the kernel logs and the scheduler stderr. In this way we can use "error keywords" in the scheduler's output without triggering false positives in the CI (see for example #127). NOTE: this works, because virtme-ng, when executed in verbose mode, sends the kernel messages to stderr (together with the command's stderr) and it channels the command's stdout to the stdout of the host. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-02-05 17:59:11 +01:00
David Vernet	95420fc7fc	Merge pull request #125 from sched-ext/reduce_ci_jobs ci: Only do CI runs for pull requests	2024-02-04 18:21:36 -06:00
Tejun Heo	86a4e7bd46	Merge pull request #124 from sched-ext/print_warning scx_userland: Print warning about poor performance	2024-02-04 13:40:20 -10:00
David Vernet	4108ece204	scx_userland: Increase scx_userland timeout This is meant to be an example scheduler that won't necessarily run well in production. Let's remove the 3 second timeout and use the system default of 30. Signed-off-by: David Vernet <void@manifault.com>	2024-02-04 16:23:18 -06:00
David Vernet	fc671aca49	ci: Only do CI runs for pull requests We're basically always running two CI jobs: one for a remote push, and another for when a PR is opened. These are essentially measuring the same thing, so let's save CI bandwidth and just do a PR run. This will hopefully make things a bit less noisy as well. Signed-off-by: David Vernet <void@manifault.com>	2024-02-04 16:12:46 -06:00
David Vernet	28a0b82be6	scx_userland: Print warning about poor performance Let's make it clear that this scheduler isn't expected to perform well, and instead point people to scx_rustland. Signed-off-by: David Vernet <void@manifault.com>	2024-02-04 16:02:57 -06:00
Piotr Gorski	52ccf1de57	Add linux-sched-ext to CachyOS repo Signed-off-by: Piotr Gorski <lucjan.lucjanov@gmail.com>	2024-02-04 19:11:46 +01:00
Tejun Heo	a3b5941c3a	Merge pull request #122 from sched-ext/htejun common.bpf.h: Add kfunc prototype for scx_bpf_dispatch_cancel()	2024-02-04 07:23:42 -10:00
Tejun Heo	1ca3b8dca8	common.bpf.h: Add kfunc prototype for scx_bpf_dispatch_cancel() And relocate scx_bpf_dispatch_nr_slots() while at it. Signed-off-by: Tejun Heo <tj@kernel.org>	2024-02-03 09:46:13 -10:00
Andrea Righi	b6eee3a5c4	Merge pull request #121 from sched-ext/rustland-duplicate-pids scx_rustland: prevent duplicate PIDs in the task BTreeSet	2024-02-03 17:55:43 +01:00
Andrea Righi	acb174aa51	scx_rustland: prevent duplicate PIDs in the task BTreeSet Items in the task BTreeSet are stored by pid and vruntime. Make sure that we never store multiple items with the same PID, so that re-enqueued tasks are not dispatched multiple times. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-02-03 14:46:39 +01:00
Tejun Heo	4631392031	Merge pull request #120 from jordalgo/c-headers Include libbpf_h path in c sched compilation	2024-02-02 17:22:15 -10:00
Jordan Rome	dd11d97cdb	Include libbpf_h path in c sched compilation If a user wants to use an external libbpf source (via libbpf_h opt) we need to also pass this to c sched compilation.	2024-02-02 19:46:24 -05:00
David Vernet	7cbcc16be9	Merge pull request #119 from sched-ext/htejun scheds/sync-to-kernel.sh: Drop most schedulers from sync	2024-02-02 14:20:02 -06:00
Tejun Heo	7ffdcc1984	Merge pull request #110 from sched-ext/scx-rustland-pcpu-dsq scx_rustland: per-CPU DSQs + global shared DSQ	2024-02-02 09:23:40 -10:00
Tejun Heo	c7ad3a71f9	scheds/sync-to-kernel.sh: Drop most schedulers from sync Only scx_simple/qmap are in the kernel tree now. Drop the rest from the sync script. Also update the sync script so that it can handle empty rust_scheds variable. Signed-off-by: Tejun Heo <tj@kernel.org>	2024-02-02 09:08:30 -10:00
Tejun Heo	1326b8a539	Merge pull request #118 from sched-ext/docs_fix docs: Update OVERVIEW to match latest APIs	2024-02-02 06:43:08 -10:00
David Vernet	15487a95af	docs: Update OVERVIEW to match latest APIs Pierre Jacquet pointed out that our docs in the scx repo are out of date for the latest APIs. Let's update it so readers don't get confused. Signed-off-by: David Vernet <void@manifault.com>	2024-02-02 10:38:19 -06:00
Andrea Righi	681b3fd807	scx_rustland: more aggressive time slice scaling Allow to scale the effective time slice down to 250 us. This can help to maintain a good quality of the audio even when the system is overloaded by multiple CPU-intensive tasks. Moreover, always round up the time slice scaling factor to be a little more aggressive and prioritize at scaling the time slice, so that we can prioritize low latency tasks even more. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-02-01 16:40:59 +01:00
Andrea Righi	26d6d530f0	scx_rustland: enhance interactive task classification Evaluate the number of voluntary context switches per second (nvcsw/sec) for each task using an exponentially weighted moving average (EWMA) with weight 0.5, that allows to classify interactive tasks with more accuracy. Using a simple average over a period of time of 10 sec can introduce small lags every 10 sec, as the statistics for the number of voluntary context switches are refreshed. This can result in interactive tasks taking a brief time to catch up in order to be accurately classified as so, causing for example short audio cracks, small drop of 5-10 fps in games, etc. Using an EWMA allows to smooth the average of nvcsw/sec, preventing short lags in the interactive tasks, while also preventing to incorrectly classify as interactive tasks that may experience an isolated short burst of voluntary context switches. This patch has been tested with the usual test case of playing a videogame while running a parallel kernel build in the background. Without this patch the short lag every 10 sec is clearly noticeable, with this patch applied the game and audio run smoothly. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-02-01 16:40:59 +01:00
Andrea Righi	baeea306fc	scx_rustland: rely on the built-in idle selection logic Simplify the idle selection logic by relying only on the built-in idle selection performed in the BPF layer. When there are idle CPUs available in the system, tasks are dispatched directly by the BPF dispatcher without invoking the user-space scheduler. This allows to avoid the user-space overhead and get the best system performance when CPU resources are not overcommitted. Once the number of tasks exceeds the available CPUs, the user-space scheduler takes over. However, by this time, the system is already overcommitted, so there's little advantage in attempting to pinpoint the optimal idle CPU through the user-space scheduler. Instead, tasks can be executed on the first available CPU, consistently dispatching them to the shared DSQ. This allows to achieve the optimal performance both with system under-utilization and over-utilization. With this change in place the user-space scheduler won't dispatch tasks directly to specific CPUs, but we still want to keep this as a generic feature in the BPF layer, so that it can be potentially used in the future by this scheduler or even by other user-space schedulers (once the BPF layer will be moved to a more generic place). Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-02-01 16:40:59 +01:00
Andrea Righi	b9e60f71ed	scx_rustland: usersched: code refactoring No functional change, just move code around to make it more readable. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-02-01 16:40:59 +01:00
Andrea Righi	d13ed5c025	scx_rustland: BPF: refine CPU dispatch logic When the user-space scheduler dispatches a task on a specific CPU, that CPU might not be valid, since the user-space doesn't have visibility of the task's cpumask. When this happens the BPF dispatcher (that has direct visibility of the cpumask) should automatically redirect the task to a valid CPU, but instead of bouncing the task on the shared DSQ, we should try to use the CPU assigned by the built-in idle selection logic. If this CPU is also not valid, then we can simply ignore the task, that has been de-queued and re-enqueued, since a valid CPU will be naturally re-selected at a later time. Moreover, avoid to kick any specific CPU when the task is dispatched to shared DSQ, since the task can be consumed on any CPU and the additional kick would simply add more overhead. Lastly, rename dsq_id_to_cpu() to dsq_to_cpu() and cpu_to_dsq_id() to cpu_to_dsq() for more clarity. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-02-01 16:38:17 +01:00
Andrea Righi	45d8b54eb9	scx_rustland: re-introduce per-CPU DSQ + a global shared DSQ With commit `c6ada25` ("scx_rustland: use custom pcpu DSQ instead of SCX_DSQ_LOCAL{_ON}") we tried to introduce custom per-CPU DSQs, instead of using SCX_DSQ_LOCAL and SCX_DSQ_LOCAL_ON to dispatch tasks. This was required, because dispatching tasks using SCX_DSQ_LOCAL_ON doesn't provide a guarantee that the cpumask, checked at dispatch time to determine the validity of a target CPU, remains valid. This method solved the cpumask validity issue, but unfortunately it introduced a noticeable performance regression and a potential starvation issue (that were probably caused by the same problem): if a task is assigned to a CPU in select_cpu() and the scheduler decides to dispatch it on a different CPU, the task will be added to the new CPU's DSQ, but if no dispatch event happens there, the task may remain stuck in the per-CPU DSQ for a long time, triggering the sched-ext watchdog timeout that would kick out the scheduler, for example: 12:53:28 [WARN] FAIL: IPC:CSteamEngin[7217] failed to run for 6.482s (err=1026) 12:53:28 [INFO] Unregister RustLand scheduler Therefore, we reverted this change with `6d89ece` ("scx_rustland: dispatch tasks only on the global DSQ"), dispatching all the tasks to the global DSQ, completely delegating the kernel to distribute tasks among the available CPUs. This is not the ideal solution, because we still want to give the possibility to the user-space scheduler to assign tasks to specific CPUs. Therefore, re-introduce distinct per-CPU DSQs, but also provide a global shared DSQ. Tasks dispatched in the per-CPU DSQs are consumed from the dispatch() callback of their corresponding CPU, tasks dispatched in the global shared DSQ are consumed from any CPU. In this way the BPF layer is able to provide an interface that gives the flexibility to the user-space to dispatch a task on a specific CPU or on the first CPU available, depending on the particular scheduler's need. If an invalid CPU (according to the cpumask) is selected the BPF dispatcher will transparently redirect the task to a valid CPU, selected using the built-in idle selection logic. In the future we may want to improve this part, giving to the user-space the visibility of the cpumask, in order to pick a valid CPU in advance and in a proper synchronized way. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-02-01 00:33:35 +01:00
Andrea Righi	b5e846c538	scx_rustland: BPF: small refactoring No functional change, just some refactoring to make the code more clear. We have is_usersched_needed() and set_usersched_needed() that are doing different things (the former is checking if there are pending tasks for the scheduler, the latter is setting the usersched_needed flag to activate the dispatch of the user-space scheduler). Rename is_usersched_needed() to usersched_has_pending_tasks() to make the code more clear and understandable. Also move dispatch_user_scheduler() closer to the other dispatch-related helper functions. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-02-01 00:33:35 +01:00
Tejun Heo	439c3cdba7	Merge pull request #116 from sched-ext/htejun Add user_exit_info support to scx_utils and convert the rust scheds accordingly	2024-01-31 12:17:52 -10:00
Tejun Heo	6db362b27a	scx_rustland: Use scx_utils::user_exit_info Instead of the bespoke implementation. This also makes scx_rustland to print out debug dump if exists. Signed-off-by: Tejun Heo <tj@kernel.org>	2024-01-31 11:44:15 -10:00
Tejun Heo	965926f393	scx_rusty: Use scx_utils::user_exit_info Instead of the bespoke implementation. This also makes scx_rusty to print out debug dump if exists. Signed-off-by: Tejun Heo <tj@kernel.org>	2024-01-31 11:08:17 -10:00
Tejun Heo	105dc36b8f	scx_layered: Use scx_utils::user_exit_info Instead of the bespoke implementation. This also makes scx_layered to print out debug dump if exists. Signed-off-by: Tejun Heo <tj@kernel.org>	2024-01-31 10:54:20 -10:00
Tejun Heo	d57a23f481	rust/scx_utils: Add user_exit_info support So that rust scheds can use the same unified exit handling as C scheds. Signed-off-by: Tejun Heo <tj@kernel.org>	2024-01-31 10:54:15 -10:00
Tejun Heo	e6e8c8c9cb	scx: Clean up user_exit_info.h - Separate field size constants into enums. - Add includes so that the header is self-sufficient. Signed-off-by: Tejun Heo <tj@kernel.org>	2024-01-31 10:47:24 -10:00
Tejun Heo	4ee8104a6d	Merge pull request #114 from dschatzberg/local_avoid_enqueue scx_layered: dispatch from select_cpu if possible	2024-01-31 08:33:26 -10:00
Dan Schatzberg	11e487c165	scx_layered: dispatch from select_cpu if possible If we are doing local dispatch, we can avoid enqueue() altogether by dispatching from select_cpu() Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>	2024-01-31 09:54:26 -08:00
Tejun Heo	a6142f6f2b	Merge pull request #115 from jordalgo/prometheus-client [scx_layered] downgrade prometheus-client	2024-01-31 06:21:13 -10:00
Jordan Rome	1b3a9a1e72	[scx_layered] downgrade prometheus-client This library at version 22 is not available in fedora: https://src.fedoraproject.org/rpms/rust-prometheus-client Rather than bothering the maintainer, let's just downgrade here.	2024-01-31 04:36:01 -08:00
Tejun Heo	8f806c41b1	Merge pull request #113 from jordalgo/breaking-changes Add BREAKING_CHANGES.md	2024-01-29 10:56:13 -10:00
Jordan Rome	347a81fcff	Add BREAKING_CHANGES.md	2024-01-29 10:44:44 -08:00
Tejun Heo	53106aafa9	Merge pull request #111 from dschatzberg/cleanup_pick_idle scx_layered: small idle_cpumask cleanups	2024-01-29 08:40:35 -10:00
Tejun Heo	d978df2a2f	Merge pull request #112 from sirlucjan/services-meson Make meson.build more readable	2024-01-29 06:44:17 -10:00
Dan Schatzberg	ab5635ff6d	scx_layered: Grab idle_smtmask a bit later This is a really minor optimization, but we don't need idle_smtmask to schedule pinned tasks, so defer it so the nr_cpus_allowed == 1 path is marginally faster. Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>	2024-01-29 08:16:37 -08:00
Dan Schatzberg	8c9e65d880	scx_layered: Remove unnecessary idle_cpumask idle_cpumask isn't used at all in pick_idle_cpu_from. The only need for these cpumasks is to check if prev_cpu is a wholly idle CPU (and we only do this when smt_enabled). idle_smtmask is sufficient for that check. Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>	2024-01-29 08:16:37 -08:00
Piotr Gorski	22e775842a	Make meson.build more readable Signed-off-by: Piotr Gorski <lucjan.lucjanov@gmail.com>	2024-01-29 17:14:39 +01:00
David Vernet	46ba5908ab	Merge pull request #109 from sirlucjan/services-rework Reworking systemd-service and adding a config file	2024-01-26 22:49:00 -06:00
Piotr Gorski	561cbc4e6d	Update README.md Signed-off-by: Piotr Gorski <lucjan.lucjanov@gmail.com>	2024-01-27 00:59:23 +01:00
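
Note on commit 26d6d530f0 (scx_rustland: enhance interactive task classification): the message describes smoothing nvcsw/sec with an EWMA of weight 0.5. The following is only a minimal sketch of that idea, not the code from the commit. The names `ewma`, `TaskStats` and `update` are hypothetical and chosen for illustration. With a weight of 0.5 the new sample and the previous average contribute equally, so an isolated burst decays by half on every refresh instead of dominating a whole averaging window.

```rust
/// Blend a new sample into a running average.
/// `weight` is the share given to the new sample (0.5 in the commit message).
fn ewma(prev_avg: f64, sample: f64, weight: f64) -> f64 {
    weight * sample + (1.0 - weight) * prev_avg
}

/// Hypothetical per-task record; the real scheduler tracks more state.
struct TaskStats {
    /// Smoothed voluntary context switches per second.
    avg_nvcsw: f64,
}

impl TaskStats {
    /// Fold the latest nvcsw/sec measurement into the smoothed average.
    fn update(&mut self, nvcsw_per_sec: f64) {
        self.avg_nvcsw = ewma(self.avg_nvcsw, nvcsw_per_sec, 0.5);
    }
}
```

For example, a task starting at 0 that reports a single burst of 100 nvcsw/sec followed by idle periods would move through 50, 25, 12.5 and so on, so one short burst does not keep it classified as interactive for long.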

350 Commits
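
Note on commit acb174aa51 (scx_rustland: prevent duplicate PIDs in the task BTreeSet): the message says tasks are stored by pid and vruntime and that a PID must never appear twice. One possible way to enforce this, sketched under assumptions, is to keep a side map from PID to its current key and drop the stale entry before inserting. The `TaskPool` type, its fields and its methods are hypothetical; the actual commit may solve this differently.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical task pool ordered by (vruntime, pid).
#[derive(Default)]
struct TaskPool {
    /// Ordered set used to pick the task with the smallest vruntime.
    by_vruntime: BTreeSet<(u64, i32)>,
    /// Side index: the vruntime each queued PID is currently stored under.
    vruntime_of: HashMap<i32, u64>,
}

impl TaskPool {
    /// Queue a task; a re-enqueued PID replaces its previous entry.
    fn push(&mut self, pid: i32, vruntime: u64) {
        if let Some(old) = self.vruntime_of.insert(pid, vruntime) {
            // Remove the stale (vruntime, pid) pair so the PID stays unique.
            self.by_vruntime.remove(&(old, pid));
        }
        self.by_vruntime.insert((vruntime, pid));
    }

    /// Take the task with the smallest vruntime, if any.
    fn pop(&mut self) -> Option<(i32, u64)> {
        let (vruntime, pid) = self.by_vruntime.pop_first()?;
        self.vruntime_of.remove(&pid);
        Some((pid, vruntime))
    }

    /// Number of queued tasks.
    fn len(&self) -> usize {
        self.by_vruntime.len()
    }
}
```

The side map costs extra memory but makes the uniqueness check a single lookup rather than a scan of the set. `BTreeSet::pop_first` requires Rust 1.66 or newer.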