JakeHillion/scx

mirror of https://github.com/JakeHillion/scx.git synced 2024-12-02 13:57:11 +00:00

Author	SHA1	Message	Date
Daniel Hodges	2ca42428cd	scx_utils: Add CPU freq transition latency This change adds the CPU frequency transition latency, read from `cpuinfo_transition_latency` in sysfs. The value of this field is described in the [cpufreq docs](https://www.kernel.org/doc/Documentation/cpu-freq/user-guide.txt). On supported systems it returns the CPU frequency transition latency in nanoseconds. The goal of this change is to let schedulers use this data in the future to make better frequency scaling decisions. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-06-11 07:35:34 -07:00
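As an illustration of the commit above, here is a minimal sketch of reading the transition latency for one CPU. It is not the scx_utils implementation: the function names `parse_latency_ns` and `read_transition_latency_ns` are hypothetical, and the per-CPU sysfs path is an assumption based on the standard cpufreq layout.

```rust
use std::fs;

/// Parse the raw contents of `cpuinfo_transition_latency` (a decimal number
/// of nanoseconds followed by a newline). Returns None if it is not a number.
fn parse_latency_ns(raw: &str) -> Option<u64> {
    raw.trim().parse::<u64>().ok()
}

/// Hypothetical helper: read the transition latency for `cpu` from sysfs.
/// Returns None when the file is missing (no cpufreq support) or unparsable.
fn read_transition_latency_ns(cpu: usize) -> Option<u64> {
    // Assumed path, following the usual per-CPU cpufreq sysfs layout.
    let path = format!("/sys/devices/system/cpu/cpu{cpu}/cpufreq/cpuinfo_transition_latency");
    fs::read_to_string(path)
        .ok()
        .and_then(|raw| parse_latency_ns(&raw))
}
```

Returning `Option` rather than an error keeps the caller simple on systems where the file does not exist, which is one plausible design choice for optional hardware data.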
Tejun Heo	3e3720fc7f	scx_utils: Add compat support for ops.tick() and ops.dump*() Match rust scx_ops_load!()'s compat support with C's SCX_OPS_LOAD().	2024-06-06 14:16:36 -10:00
Tejun Heo	200af60f2a	scx_layered: Fix load failure due to scheduler_tick() -> sched_tick() rename - scx_utils: Replace kfunc_exists() with ksym_exists() which doesn't care about the type of the symbol. - scx_layered: Fix load failure on kernels >= v6.10-rc due to scheduler_tick() -> sched_tick rename. Attach the tick fentry function to either scheduler_tick() or sched_tick().	2024-06-06 12:54:59 -10:00
Andrea Righi	40e67897a9	Merge pull request #331 from sched-ext/rustland-core-dispatch-debug scx_rustland_core: add extra debugging info to dispatch_task()	2024-06-04 18:58:01 +02:00
Andrea Righi	b363a13310	scx_rustland_core: add enqueue flags to debug info Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-06-04 17:14:38 +02:00
Andrea Righi	89384754ce	scx_rustland_core: add task time slice to the debug info Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-06-04 17:14:38 +02:00
Tejun Heo	e556dd375d	scx: Unify loading and running boilerplate across rust schedulers Make restart handling with user_exit_info simpler and use the load and report macros consistently across the rust schedulers. This makes all schedulers automatically handle auto restarts from CPU hotplug events. Note that this is necessary even for scx_lavd, which has CPU hotplug operations, as CPU hotplug operations which took place between skel open and scheduler init can still trigger restart.	2024-06-03 12:25:41 -10:00
Tejun Heo	a2d5310cb6	Bump versions for a release	2024-06-03 08:35:21 -10:00
Andrea Righi	4503c7080a	scx_rustland_core: fix build error with musl As reported in #319, we may get a build failure in presence of musl, that requires additional parameters in sched_param. Fix by adding a proper conditional to support both gnu libc and musl libc. This fixes #319. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-26 22:30:04 +02:00
Yiwei Lin	8c2236770d	Reduce MAX_ENQUEUED_TASKS to fit percpu allocator The setting of ops->dispatch_max_batch leads to an allocation that is too large for the percpu allocator, which will be unhappy if we want a size larger than PCPU_MIN_UNIT_SIZE. Reduce MAX_ENQUEUED_TASKS to fix this.	2024-05-23 22:33:02 +08:00
Andrea Righi	4791d862f5	scx_rustland_core: second chance CPU migration Implement a second-chance migration in select_cpu(): after a task has been dispatched directly do not try to migrate it immediately on a different CPU, but force it to stay on prev_cpu for another round. This seems quite effective on certain architectures (such as on a system with 11th Gen Intel(R) Core(TM) i7-1195G7 @ 2.90GHz), and it can provide noticeable benefits with gaming or WebGL applications (such as https://webglsamples.org/aquarium/aquarium.html) under regular workload conditions (around +5% fps). Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-22 21:38:34 +02:00
Andrea Righi	6c129e9b4b	scx_rustland_core: consume from the shared DSQ before local DSQ The shared DSQ is typically used to prioritize tasks and dispatch them on the first CPU available, so consume from the shared DSQ before the local CPU DSQ. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-22 12:12:55 +02:00
Andrea Righi	9e4bea4a1c	scx_rustland_core: switch to FIFO when system is underutilized Provide a knob in scx_rustland_core to automatically turn the scheduler into a simple FIFO when the system is underutilized. This choice is based on the assumption that, in the case of system underutilization (fewer tasks running than available CPUs), the best scheduling policy is FIFO. With this option enabled the scheduler starts in FIFO mode. If most of the CPUs are busy (nr_running >= num_cpus - 1), the scheduler immediately exits from FIFO mode and starts to apply the logic implemented by the user-space component. Then the scheduler can switch back to FIFO if there are no tasks waiting to be scheduled (evaluated using a moving average). This option can be enabled/disabled by the user-space scheduler using the fifo_sched parameter in BpfScheduler: if set, the BPF component will periodically check for system utilization and switch back and forth to FIFO mode based on that. This improves the performance of workloads that use only a small number of the available CPUs in the system, while still maintaining the same good level of performance for interactive tasks when the system is overcommitted. In certain video games, such as Baldur's Gate 3 or Counter-Strike 2, running in "normal" system conditions, we can experience a boost in fps of approximately 4-8% with this change applied. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-22 09:02:02 +02:00
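The mode switch described in the commit above can be sketched as a small state machine. This is only an illustration under stated assumptions, not the BPF code in scx_rustland_core: the `FifoSwitch` type, its fields, and the 3/4 to 1/4 integer moving average are all hypothetical choices, since the commit message does not say how the average is computed.

```rust
/// Hypothetical tracker for the FIFO / user-space mode switch.
struct FifoSwitch {
    /// True while the scheduler behaves as a plain FIFO.
    fifo: bool,
    /// Integer moving average of the number of tasks waiting to be scheduled.
    avg_waiting: u64,
}

impl FifoSwitch {
    /// The scheduler starts in FIFO mode.
    fn new() -> Self {
        Self { fifo: true, avg_waiting: 0 }
    }

    /// Feed one periodic sample and return whether FIFO mode is active.
    fn update(&mut self, nr_running: u64, nr_waiting: u64, num_cpus: u64) -> bool {
        // Assumed smoothing: 3/4 old value, 1/4 new sample, rounded down.
        self.avg_waiting = (self.avg_waiting * 3 + nr_waiting) / 4;

        if nr_running >= num_cpus.saturating_sub(1) {
            // Most CPUs are busy: leave FIFO mode immediately.
            self.fifo = false;
        } else if self.avg_waiting == 0 {
            // Nothing has been waiting for a while: fall back to FIFO.
            self.fifo = true;
        }
        self.fifo
    }
}
```

Because the average is rounded down, a single briefly waiting task does not keep the sketch out of FIFO mode; a real implementation may well weigh samples differently.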
Andrea Righi	912bde6a52	scx_rustland_core: simplify CPU selection logic Simplify the CPU idle selection logic relying on the built-in logic. If something can be improved in this logic it should be done in the backend, changing the default idle selection logic, rustland doesn't need to do anything special here for now. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-22 09:02:02 +02:00
Andrea Righi	0d75c80587	Revert "Merge pull request #305 from sched-ext/rustland-fifo-mode" This merge included additional commits that were supposed to be included in a separate pull request and have nothing to do with the fifo-mode changes. Therefore, revert the whole pull request and create a separate one with the correct list of commits required to implement this feature. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-22 09:00:25 +02:00
Andrea Righi	f27e67dfb8	scx_rustland_core: second chance CPU migration Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-21 18:08:58 +02:00
Andrea Righi	778ee1406f	scx_rustland_core: consume from the shared DSQ before local DSQ The shared DSQ is typically used to prioritize tasks and dispatch them on the first CPU available, so consume from the shared DSQ before the local CPU DSQ. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-21 18:08:19 +02:00
Andrea Righi	d25675ff44	scx_rustland_core: switch to FIFO when system is underutilized Provide a knob in scx_rustland_core to automatically turn the scheduler into a simple FIFO when the system is underutilized. This choice is based on the assumption that, in the case of system underutilization (fewer tasks running than available CPUs), the best scheduling policy is FIFO. With this option enabled the scheduler starts in FIFO mode. If most of the CPUs are busy (nr_running >= num_cpus - 1), the scheduler immediately exits from FIFO mode and starts to apply the logic implemented by the user-space component. Then the scheduler can switch back to FIFO if there are no tasks waiting to be scheduled (evaluated using a moving average). This option can be enabled/disabled by the user-space scheduler using the fifo_sched parameter in BpfScheduler: if set, the BPF component will periodically check for system utilization and switch back and forth to FIFO mode based on that. This improves the performance of workloads that use only a small number of the available CPUs in the system, while still maintaining the same good level of performance for interactive tasks when the system is overcommitted. In certain video games, such as Baldur's Gate 3 or Counter-Strike 2, running in "normal" system conditions, we can experience a boost in fps of approximately 4-8% with this change applied. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-21 17:39:11 +02:00
Andrea Righi	3ad634c293	scx_rustland_core: simplify CPU selection logic Simplify the CPU idle selection logic relying on the built-in logic. If something can be improved in this logic it should be done in the backend, changing the default idle selection logic, rustland doesn't need to do anything special here for now. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-21 17:08:06 +02:00
David Vernet	ee940bd8b5	rustland: Mark get_cpu_owner() as __maybe_unused scx_rustland has a function called get_cpu_owner() in BPF which currently has no callers. There's nothing wrong with the function, but it causes a warning due to an unused function. Let's just annotate it with __maybe_unused to tell the compiler that it's not a problem. Signed-off-by: David Vernet <void@manifault.com>	2024-05-18 07:51:20 -05:00
Tejun Heo	ab25992416	Add missing skel.attach() calls C SCX_OPS_ATTACH() and rust scx_ops_attach() macros were not calling .attach() and were only attaching the struct_ops. This meant that all non-struct_ops BPF programs contained in the skels were never attached which breaks e.g. scx_layered. Let's fix it by adding .attach() invocation to the attach macros.	2024-05-17 14:33:04 -10:00
David Vernet	c1f1411c7a	Merge pull request #289 from sched-ext/rusty_hot_plug Add remaining hotplug pieces	2024-05-16 13:42:11 -06:00
David Vernet	4edd8d75e6	compat: Add scx_ops_open!() macro In order to make it easy for schedulers to use the hotplug_seq feature that's available in recent kernels, we'll need to provide a macro wrapper so that we can support the feature with backwards compatibility. This adds scx_ops_open!() to abstract that. Any scheduler that uses scx_ops_open!() will be exited if a hotplug event happens between opening the skeleton and loading it. Signed-off-by: David Vernet <void@manifault.com>	2024-05-15 16:42:57 -05:00
David Vernet	34818de54d	rusty: Use built-in exit code for restarting Now that the kernel exports the SCX_ECODE_ACT_RESTART exit code, we can remove the custom hotplug logic from scx_rusty, and instead rely on the built-in logic from the kernel. There's still a corner case that we're not honoring: when a hotplug event happens on the init path. A future change will address this as well. Signed-off-by: David Vernet <void@manifault.com>	2024-05-15 16:31:56 -05:00
Andrea Righi	e9ac6105c7	scx_rustland_core: introduce low-power mode Introduce a low-power mode to force the scheduler to operate in a very non-work conserving way, causing a significant saving in terms of power consumption, while still providing a good level of responsiveness in the system. This option can be enabled in scx_rustland via the --low_power / -l option. The idea is to not immediately re-kick a CPU when it enters an idle state, but do that only if there are no other tasks running in the system. In this way, latency-critical tasks can be still dispatched immediately on the other active CPUs, while CPU-bound tasks will be forced to spend more time waiting to be scheduled, basically enforcing a special CPU throttling mechanism that affects only the tasks that are not latency critical. The consequence is a reduction in the overall system throughput, but also a significant reduction of power consumption, that can be useful for mobile / battery-powered devices. Test case (using `scx_rustland -l`): - play a video game (Terraria) while recompiling the kernel - measure game performance (fps) and core power consumption (W) - compare the result of normal mode vs low-power mode Result:
               | Game performance | Power consumption |
---------------+------------------+-------------------+
normal mode    | 60 fps           | 6W                |
low-power mode | 60 fps           | 3W                |
As we can see from the result the reduction of power consumption is quite significant (50%), while the responsiveness of the game (fps) remains the same, that means battery life can be potentially doubled without significantly affecting system responsiveness. The overall throughput of the system is, of course, affected in a negative way (kernel build is approximately 50% slower during this test), but the goal here is to save power while still maintaining a good level of responsiveness in the system.
For this reason the low-power mode should be considered only in emergency conditions, for example when the system is close to completely running out of power or simply to extend the battery life of a mobile device without compromising its responsiveness. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-15 20:32:05 +02:00
David Vernet	c6cc59e8b4	uei: Expose exit code enums from user_exit_info.rs Schedulers and the kernel can include an exit code when exiting a scheduler. There are some built-in codes that can be specified: SCX_ECODE_RSN_HOTPLUG, and SCX_ECODE_ACT_RESTART. Some schedulers may want to check the exit code against these values, so let's export them from user_exit_info.rs. We use lazy_static so that we can read the values for the enum for the currently-running kernel. Signed-off-by: David Vernet <void@manifault.com>	2024-05-15 13:31:07 -05:00
Andrea Righi	45a9b178cd	scx_rustland_core: expose nr_running metric Expose the new metric nr_running to keep track of the amount of currently running tasks. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-15 20:28:10 +02:00
Andrea Righi	5f9ce3bba6	Merge pull request #272 from sched-ext/rustland-reduce-scheduling-overhead rustland: reduce scheduling overhead	2024-05-11 10:17:01 +02:00
Andrea Righi	209c454149	scx_rustland_core: fix update_idle description The comment that describes rustland_update_idle() is still incorrectly reporting an old implementation detail. Update its description for better clarity. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-11 07:37:33 +02:00
Andrea Righi	311b7f861c	scx_rustland_core: refine built-in CPU idle selection logic Change the BPF CPU selection logic as follows: - if the previously used CPU is idle, keep using it - if the task is not coming from a wait state, try to stick as much as possible to the same CPU (for better cache usage) - if the task is waking up from a wait state rely on the sched_ext built-in idle selection logic This logic can be completely disabled when the full user-space mode is enabled. In this case tasks will always be assigned to the previously used CPU and the user-space scheduler should take care of distributing them among the available CPUs. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-11 07:37:31 +02:00
David Vernet	904a89117c	topology: Support CONFIG_NUMA=n in Topology crate Some users are running with NUMA disabled, which makes sense given that it's useless in a lot of contexts. Let's make the Topology crate assume a default node with ID 0 in such cases. Signed-off-by: David Vernet <void@manifault.com>	2024-05-10 16:46:15 -05:00
Andrea Righi	63feba9c2b	topology: TopologyMap: add nr_cpus_online() Add a method to TopologyMap to get the amount of online CPUs. Considering that most of the schedulers are not handling CPU hotplugging it can be useful to expose also this metric in addition to the amount of available CPUs in the system. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-10 17:24:20 +02:00
Andrea Righi	f052493005	scx_rustland_core: implement effective time slice on a per-task basis Drop the global effective time-slice and use the more fine-grained per-task time-slice to implement the dynamic time-slice capability. This allows to reduce the scheduler's overhead (dropping the global time slice volatile variable shared between user-space and BPF) and it provides a more fine-grained control on the per-task time slice. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-10 17:24:20 +02:00
Andrea Righi	887812197c	scx_utils: report an explicit error when another scheduler is running If another scheduler is already running, the Rust schedulers based on scx_utils are reporting an error like the following, that can be a bit difficult to understand: Error: Failed to attach struct ops Caused by: bpf call "libbpf_rs::map::Map::attach_struct_ops::{{closure}}" returned NULL Change the scx_ops_attach macro to check if another sched_ext scheduler is running and in that case report a more explicit error. With this applied: $ sudo scx_rustland Error: another sched_ext scheduler is already running Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-10 11:20:19 +02:00
Andrea Righi	5da4602ad7	scx_rustland_core: use a BPF_MAP_TYPE_USER_RINGBUF to dispatch tasks Replace the BPF_MAP_TYPE_QUEUE with a BPF_MAP_TYPE_USER_RINGBUF to store the tasks dispatched from the user-space scheduler to the BPF component. This eliminates the need for bpf() syscalls, significantly reducing the overhead of the user-space->kernel communication and delivering a notable performance boost in the overall system throughput. Based on experimental results, this change reduces the scheduling overhead by approximately 30-35% when the system is overcommitted. This improvement has the potential to make user-space schedulers based on scx_rustland_core viable options for real production systems. Link: https://github.com/libbpf/libbpf-rs/pull/776 Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-08 22:16:53 +02:00
Andrea Righi	fd68ce13a7	scx_rustland_core: bump up version to 0.4.0 Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-04-30 18:09:09 +02:00
Andrea Righi	be415d4c06	scx_rustland_core: relax compact unevictable memory constraint In order to prevent deadlock conditions user-space schedulers need to perform memory allocations from a pool of pre-allocated memory that will never be unmapped/reclaimed by the kernel (unevictable memory). However, there is a special kernel sysctl setting (vm.compact_unevictable_allowed) that allows kcompactd to reclaim unevictable memory. This behavior should be prevented by setting vm.compact_unevictable_allowed = 0, that is what scx_rustland_core does transparently when a scheduler is started, restoring the previous value when the scheduler is stopped. Unfortunately, this is not always doable, especially when running a scheduler inside a containerized environment or under certain security/privilege restrictions (e.g., AppArmor confinement). Therefore, just report a WARNING if we are unable to change this parameter, instead of considering it a hard-failure, to allow running scx_rustland_core schedulers also inside such limited environments. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-04-30 18:09:09 +02:00
Tejun Heo	cf66e58118	Sync from kernel (670bdab6073) And fix build breakage in scx_utils due to an enum type rename.	2024-04-29 09:58:19 -10:00
Andrea Righi	cabde30736	scx_utils: bump up version to 0.8.0 Bump up scx-utils version to provide the new scx_utils::TopologyMap. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-04-28 21:01:16 +02:00
Andrea Righi	fb2f5c240e	scx_rustland_core: bump up version to 0.3 Given that rustland_core now supports task preemption and it has been tested successfully, it's worthwhile to cut a new version of the crate. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-04-28 12:01:38 +02:00
Andrea Righi	cdf3729355	scx_rustland_core: make crate version accessible Export the version of scx_rustland_core to the users of the crate. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-04-28 12:01:38 +02:00
Andrea Righi	973aded5a8	Merge pull request #238 from sched-ext/rustland-reduce-topology-overhead scx_rustland: reduce overhead by caching host topology	2024-04-24 22:24:23 +02:00
David Vernet	46defa073d	compat: Add kfunc_exists() compat helper func It would be useful to be able to check whether a kfunc exists from rust user space. For example, now that we have support for adjusting cpufreq in scx_layered, we'll want to be able to test whether or not the scx_bpf_cpuperf_set() (and friends) kfuncs are present for backwards compat purposes. Let's add a kfunc_exists() function to compat.rs for this purpose. Signed-off-by: David Vernet <void@manifault.com>	2024-04-24 13:49:31 -05:00
Andrea Righi	d7bb5a7cba	topology: Add TopologyMap Introduce a TopologyMap object, represented as an array of arrays, where each inner array corresponds to a core containing its associated CPU IDs. This object can be used as a cache to facilitate efficient iteration over the entire host's topology. Example usage: let topo = Topology::new()?; let topo_map = TopologyMap::new(topo)?; for (core_id, core) in topo_map.iter().enumerate() { for cpu in core { println!("core={} cpu={}", core_id, cpu); } } Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-04-24 17:08:06 +02:00
David Vernet	c187c65702	topology: Don't allocate on calls to span() We're currently cloning cpumasks returned by calls to {Core, Cache, Node, Topology}::span(). If a caller needs to clone it, they can. Let's not penalize the callers that just want to query the underlying cpumask. Signed-off-by: David Vernet <void@manifault.com>	2024-04-23 22:59:42 -05:00
Andrea Righi	f02e9b072c	scx_rustland_core: use a separate field to store dispatch flags Do not encode dispatch flags in the cpu field, but simply use a separate "flags" field. This makes the code much simpler and it increases the size of dispatched_task_ctx from 24 to 32, that is probably better in terms of cacheline allocation / performance. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-04-23 16:10:56 +02:00
Andrea Righi	27c1f9c329	scx_rustland_core: introduce preemption Introduce the new dispatch flag RL_PREEMPT_CPU that can be used to dispatch tasks that can preempt others. Tasks with this flag set will be dispatched by the BPF part using SCX_ENQ_PREEMPT, so they can potentially preempt any other task running on the target CPU. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-04-23 07:13:30 +02:00
Andrea Righi	6d2aac1591	scx_rustland_core: introduce dispatch flags Reserve some bits of the `cpu` attribute of a task to store special dispatch flags. Initially, let's introduce just RL_CPU_ANY to replace the special value NO_CPU, indicating that the task can be dispatched on any CPU, specifically the first CPU that becomes available. This allows to keep the CPU value assigned by the builtin idle selection logic, that can potentially be used later for further optimizations. Moreover, having the possibility to specify dispatch flags gives more flexibility and it allows to map new scheduling features to such flags. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-04-23 07:13:30 +02:00
David Vernet	9a8ed8ab44	Merge pull request #218 from sched-ext/rusty_hotplug Gracefully handle hotplug in scx_rusty	2024-04-04 16:03:59 -05:00
Andrea Righi	17a30bddc9	scx_rustland_core: bump up version to 0.2 Bump up the version of the crate and update dependencies. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-04-04 22:44:55 +02:00
David Vernet	052ce428a3	uei: Export exit_code from UserExitInfo Newer kernels also support exiting gracefully with an exit code. Let's update the UserExitInfo struct to also read and export this value. Signed-off-by: David Vernet <void@manifault.com>	2024-04-04 14:52:45 -05:00
Andrea Righi	85a32a7b51	scx_rustland_core: separate crate source code from assets scx_rustland_core needs to ship both a binary part and a source code part, which will be used to build schedulers based on it. To effectively publish the scx_rustland_core crate on crates.io we need to properly separate the source code assets from the crate's main source code. To achieve this, move the assets into a separate directory and declare them inside a [lib] section in Cargo.toml. This allows to publish the crate on crates.io, providing also a clear separation between source code and assets. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-04-04 21:31:02 +02:00
Andrea Righi	fbccb161a3	scx_rustland_core: re-introduce consume_raw() libbpf-rs API Now that libbpf-rs 0.23 has been officially released with the new consume_raw() API (https://github.com/libbpf/libbpf-rs/pull/680) we can re-introduce the change in rustland-core that allows to use this API to improve the quality of the code and make it slightly more efficient when consuming tasks from BPF to user-space. Fixes: `bd2c18a` ("Revert "scx_rustland_core: use new consume_raw() libbpf-rs API"") Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-04-04 17:34:57 +02:00
Tejun Heo	b925bdf94d	Cargo.toml: Update libbpf-rs/cargo dependencies to 0.23 and drop patch.crates-io sections New versions of libbpf-rs and libbpf-cargo are now available with all the needed features. Update the dependencies and drop the patch sections.	2024-04-02 11:19:39 -10:00
Tejun Heo	6f81409df4	Bump versions - scx_utils bumped from 0.6.0 to 0.7.0. - Repo and rust schedulers get a PATCH level bump.	2024-04-02 10:58:50 -10:00
Tejun Heo	b7b402836d	scx_utils: Sync libbpf SHA1s with the rest	2024-04-02 10:35:21 -10:00
Tejun Heo	f3e20ae9b3	scx_rustland: Apply API updates and add --exit-dump-len option to scx_rustland	2024-04-02 10:30:56 -10:00
Tejun Heo	59bbd800c1	compat: Implement scx_utils::compat and fix up scx_layered Implement scx_utils::compat to match C's scx/compat.h and update scx_layered. Other rust scheds are still broken.	2024-04-02 07:08:56 -10:00
David Vernet	1bd990fb87	topology: Fall back to cache 0 As described in https://github.com/sched-ext/scx/issues/195, apparently some chips don't export information about their cache topology. There's not much we can do if we don't have that information, so let's just assume a unified cache per node if that happens. Andrea suggested this patch -- I'm applying exactly what he proposed, with a slightly modified comment. Suggested-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: David Vernet <void@manifault.com>	2024-03-21 11:10:55 -05:00
David Vernet	3ad0fff855	Merge pull request #188 from sched-ext/topology-fix-single-cpu topology: support single CPU systems	2024-03-14 13:16:59 -05:00
David Vernet	4520514fe8	rusty: Account for disabled but offline CPUs As described in https://bugzilla.kernel.org/show_bug.cgi?id=218109, https://github.com/sched-ext/scx/issues/147 and https://github.com/sched-ext/sched_ext/issues/69, AMD chips can sometimes report fully disabled CPUs as offline, which causes us to count them when looking at /sys/devices/system/cpu/possible. Additionally, systems can have holes in their active CPU maps. For example, a system with CPUs 0, 1, 2, 3 possible, may have only 0 and 2 active. To address this, we need to do a few things: 1. Update topology.rs to be clear that it's returning the number of _possible_ CPUs in the system. Also update Topology to only record online CPUs when creating its span and iterating over sysfs when creating domains. It was previously trying to record when a CPU was online, but this was actually broken as the topology directory isn't present in sysfs when the CPU is offline. 2. Schedulers should not be relying on nr_possible_cpus for anything other than interacting with per-CPU data (e.g. for stats extraction), or e.g. verifying maximum sizes of statically sized arrays in BPF. It should _not_ be used for e.g. performing load calculations, etc. With that said, we'll also need to update schedulers to not rely on the nr_possible_cpus figure being exported by the topology crate. We do that for rusty in this patch, but don't fix any of the others other than updating how they call topology.rs. 3. Account for the fact that LLC IDs may be non-contiguous. For example, if there is a single core in an LLC, then if we assign LLC IDs to domains, then the domain IDs won't be contiguous. This doesn't fit our current model which is used by e.g. infeasible_weights.rs. We'll update some of the code in rusty to accommodate this, but we'll need to do more. 4. Update schedulers to properly reset themselves in the event of a hotplug event. We'll take care of that in a follow-on change. 
Signed-off-by: David Vernet <void@manifault.com>	2024-03-14 11:15:28 -05:00
David Vernet	bc0336d727	cpumask: Add bitwise ops for cpumask We implement functions or(), and(), and xor() for cpumasks, but we should also implement the bitwise ops for those operations in case people prefer that syntax. Signed-off-by: David Vernet <void@manifault.com>	2024-03-14 11:02:01 -05:00
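To illustrate what the cpumask commit above describes, the sketch below adds operator syntax on top of existing `or()`, `and()` and `xor()` methods. The `Mask` type is a hypothetical stand-in backed by a single `u64`; it is not the scx_utils `Cpumask` type, whose internals and method signatures may differ.

```rust
use std::ops::{BitAnd, BitOr, BitXor};

/// Toy CPU mask backed by a single u64 (so at most 64 CPUs).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Mask(u64);

impl Mask {
    /// Method-style operations, analogous to the ones named in the commit.
    fn or(&self, other: &Mask) -> Mask {
        Mask(self.0 | other.0)
    }
    fn and(&self, other: &Mask) -> Mask {
        Mask(self.0 & other.0)
    }
    fn xor(&self, other: &Mask) -> Mask {
        Mask(self.0 ^ other.0)
    }
}

// Operator syntax simply forwards to the methods above.
impl BitOr for Mask {
    type Output = Mask;
    fn bitor(self, rhs: Mask) -> Mask {
        self.or(&rhs)
    }
}

impl BitAnd for Mask {
    type Output = Mask;
    fn bitand(self, rhs: Mask) -> Mask {
        self.and(&rhs)
    }
}

impl BitXor for Mask {
    type Output = Mask;
    fn bitxor(self, rhs: Mask) -> Mask {
        self.xor(&rhs)
    }
}
```

With these impls, `a | b`, `a & b` and `a ^ b` give the same results as `a.or(&b)`, `a.and(&b)` and `a.xor(&b)`.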
David Vernet	84a202e2a0	topology: Skip offline CPUs Offline CPUs don't have a /sys/devices/system/cpu/cpuN/topology directory, so let's just skip them if they're not online. Schedulers are expected to detect hotplug, and handle gracefully restarting. Signed-off-by: David Vernet <void@manifault.com>	2024-03-14 11:02:01 -05:00
David Vernet	583696f940	topology: Include last CPU in online We're iterating from min..max cpu in cpus_online(), but that's not inclusive of the max CPU. Let's also include that so we don't think that last CPU is offline. Signed-off-by: David Vernet <void@manifault.com>	2024-03-14 11:01:52 -05:00
Andrea Righi	2cd3929475	scx_rustland: mitigate sub-optimal performance with offline CPUs Most of the schedulers assume that the amount of possible CPUs in the system represents the actual number of CPUs available. This is not always true: some CPUs may be offline or certain CPU models (AMD CPUs for example) may include unavailable CPUs in this number. This can lead to sub-optimal performance or even errors in the scheduler (see for example [1][2]). Ideally, we need to attack this issue in a more generic way, such as having a proper API provided by a C library, that can be used by all schedulers and the topology Rust module (scx_utils crate). But for now, let's try to mitigate most of the common sub-optimal cases separately inside each scheduler. For rustland we can apply some mitigations both in select_cpu() (for the BPF part) and in the user-space part: - the former is fixed in the sched-ext kernel by commit 94dc0c01b957 ("scx: Use cpu_online_mask when resetting idle masks"). However, adding an extra check `cpu < num_possible_cpus` in select_cpu(), allows to properly support AMD CPUs, even with kernels that don't have the cpu_online_mask fix yet (this doesn't always guarantee the validity of cpu, but it should be enough to mitigate the majority of the potential sub-optimal cases, without introducing any significant overhead) - the latter can be fixed relying on topology.span(), instead of topology.nr_cpus(), to count the amount of available CPUs in the system. [1] https://github.com/sched-ext/sched_ext/issues/69 [2] https://github.com/sched-ext/scx/issues/147 Link: `94dc0c01b9` Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-03-14 10:19:31 +01:00
Andrea Righi	a1b05c5ab3	topology: support single CPU systems We are failing to parse /sys/devices/system/cpu/online in systems with just one CPU, for example: $ vng -r --cpus 1 -- scx_rusty Error: Failed to parse online cpus 0 Correctly handle strings containing only a single CPU during parsing. Fixes: `c5a3b83b` ("topology: Add new topology crate") Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-03-14 07:46:20 +01:00
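The parsing problem fixed by the commit above (a CPU list containing only `0`), together with the inclusive upper bound mentioned in the "Include last CPU in online" commit, can be illustrated with a small standalone parser. This is a sketch under the assumption that the input follows the usual kernel CPU list syntax (`0`, `0-3`, `0-3,8`); `parse_cpu_list` is a hypothetical name and not the function used in topology.rs.

```rust
/// Parse a kernel CPU list string such as "0", "0-3" or "0-3,8,10-11"
/// into a sorted vector of CPU IDs. Ranges are inclusive on both ends.
fn parse_cpu_list(s: &str) -> Result<Vec<usize>, String> {
    let mut cpus = Vec::new();
    for part in s.trim().split(',') {
        let part = part.trim();
        if part.is_empty() {
            continue;
        }
        match part.split_once('-') {
            // A range such as "0-3": the last CPU is included.
            Some((lo, hi)) => {
                let lo: usize = lo.parse().map_err(|e| format!("bad CPU id {lo:?}: {e}"))?;
                let hi: usize = hi.parse().map_err(|e| format!("bad CPU id {hi:?}: {e}"))?;
                if lo > hi {
                    return Err(format!("invalid range {part:?}"));
                }
                cpus.extend(lo..=hi);
            }
            // A lone CPU such as "0", as found on single-CPU systems.
            None => {
                let cpu: usize = part.parse().map_err(|e| format!("bad CPU id {part:?}: {e}"))?;
                cpus.push(cpu);
            }
        }
    }
    cpus.sort_unstable();
    cpus.dedup();
    Ok(cpus)
}
```

Treating the single-number case as its own match arm is what makes an input of just `0` valid, and `lo..=hi` avoids silently dropping the last CPU of a range.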
David Vernet	91cb5ce8ab	Merge pull request #178 from sched-ext/multi_numa_rusty rusty: Implement NUMA-aware load balancing	2024-03-12 15:50:27 -05:00
Jordan Rome	ffc7b7dc4a	Fetch and build bpftool by default This pairs with the new default behavior to fetch and build libbpf and is mostly being used so we can use the latest bpftool and libbpf.	2024-03-11 10:00:01 -07:00
David Vernet	1c3168d2a4	topology: Don't assume unique core IDs The current topology.rs crate assumes that all cores have unique core IDs in a system. This need not be the case, such as in certain Intel Xeon processors which reuse core IDs in different NUMA nodes. Let's update the crate to assume unique core IDs only per socket. Signed-off-by: David Vernet <void@manifault.com>	2024-03-08 15:13:46 -06:00
David Vernet	12e0586fe9	cpumask: Update cpumask fmt function The cpumask print formatter doesn't look great in its current form, which uses the BitVec formatter under the hood: [INFO] NUMA[00 32:<[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]>] [INFO] DOM[00] 32:<[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]> [INFO] DOM[01] 32:<[0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]> Let's just iterate over the mask and manually format the string using the binary formatter over the slice of u64's, which renders like this: [INFO] NUMA[00] 0b11111111111111111111111111111111] [INFO] DOM[00] 0b00000000111111110000000011111111 [INFO] DOM[01] 0b11111111000000001111111100000000 Signed-off-by: David Vernet <void@manifault.com>	2024-03-08 15:11:17 -06:00
Andrea Righi	5cf113f058	scx_rustland_core: provide DispatchedTask API methods Provide distinct methods to set the target CPU and the per-task time slice to dispatched tasks. Moreover, also provide a constructor to create a DispatchedTask from a QueuedTask (this allows to automatically bounce a task from the scheduler to the BPF dispatcher without having to take care of setting the individual task's attributes). This also allows to make most of the attributes of DispatchedTask private, especially it allows to hide cpumask_cnt, that should be only used internally between the BPF and the user-space component. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-03-03 15:49:37 +01:00
Andrea Righi	e10f8a2d8e	scx_rustland_core: introduce per-task time slice Provide a way to set a different time slice per-task, by adding a new attribute slice_ns to the DispatchedTask struct. This attribute determines the time slice assigned to the task, if it is set to 0 then the global time slice (either the default one or the effective one, if set) will be used. At the same time, remove the payload attribute, that is basically unused (scx_rustland uses it to send the task's vruntime to the BPF dispatcher for debugging purposes, but it's not very useful anymore at this point). In the future we may introduce a proper interface to attach a custom payload to each task. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-03-03 15:06:56 +01:00
Jordan Rome	499924ead8	Add libbpf as a submodule This is to potentially reduce issues with folks using different versions of libbpf at runtime. This also: - makes static linking of libbpf the default - adds steps in `meson setup` to fetch libbpf and make it	2024-03-01 12:39:35 -08:00
Andrea Righi	0d1c6555a4	scx_rustland_core: generate source files in-tree There is no need to generate source code in a temporary directory with RustLandBuilder(), we can simply generate code in-tree and exclude the generated source files from .gitignore. Having the generated source files in-tree can help to debug potential build issues (and it also allows to drop the tempfile crate dependency). Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-02-28 17:49:44 +01:00
Andrea Righi	06d8170f9f	scx_utils: introduce Builder() Introduce a Builder() class in scx_utils that can be used by other scx crates (such as scx_rustland_core) to prevent code duplication. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-02-28 17:49:44 +01:00
Andrea Righi	2ac1a5924f	scx_rustland_core: introduce RustLandBuilder() Introduce a wrapper to scx_utils::BpfBuilder that can be used to build the BPF component provided by scx_rustland_core. The source of the BPF components (main.bpf.c) is included in the crate as an array of bytes, the content is then unpacked in a temporary file to perform the build. The RustLandBuilder() helper is also used to generate bpf.rs (that implements the low-level user-space Rust connector to the BPF component). Schedulers based on scx_rustland_core can simply use RustLandBuilder(), to build the backend provided by scx_rustland_core. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-02-28 17:49:44 +01:00
Andrea Righi	e23426e299	scx_rustland_core: introduce method bpf.update_tasks() Introduce a helper function to update the counter of queued and scheduled tasks (used to notify the BPF component if the user-space scheduler has still some pending work to do). Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-02-28 17:49:44 +01:00
Andrea Righi	00e25530bc	scx_rlfifo: simple user-space FIFO scheduler written in Rust Implement a FIFO scheduler as an example usage of scx_rustland_core. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-02-28 17:49:44 +01:00
Andrea Righi	871a6c10f9	scx_rustland_core: include scx_rustland backend Move the BPF component of scx_rustland to scx_rustland_core and make it available to other user-space schedulers. NOTE: main.bpf.c and bpf.rs are not pre-compiled in the scx_rustland_core crate, they need to be included in the user-space scheduler's source code in order to be compiled/linked properly. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-02-28 17:49:44 +01:00
Andrea Righi	416d6a940f	rust: introduce scx_rustland_core crate Introduce a separate crate (scx_rustland_core) that can be used to implement sched-ext schedulers in Rust that run in user-space. This commit only provides the basic layout for the new crate and the abstraction to the custom allocator. In general, any scheduler that has a user-space component needs to use the custom allocator to prevent potential deadlock conditions, caused by page faults (a kthread needs to run to resolve the page fault, but the scheduler is blocked waiting for the user-space page fault to be resolved => deadlock). However, we don't want to necessarily enforce this constraint to all the existing Rust schedulers, some of them may do all user-space allocations in safe paths, hence the separate scx_rustland_core crate. Merging this code in scx_utils would force all the Rust schedulers to use the custom allocator. In a future commit the scx_rustland backend will be moved to scx_rustland_core, making it a totally generic BPF scheduler framework that can be used to implement user-space schedulers in Rust. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-02-28 17:49:44 +01:00
David Vernet	b1dc37889f	infeasible: Add a new infeasible crate for load balancing We want to avoid every scheduler implementation from having to implement the solution to the infeasible weights problem, but we also want to enable sufficient flexibility where not every program has to have the same partition of scheduling domains, etc. To enable this, a new infeasible crate is added which encapsulates all of the logic for being given duty cycle and weight, and performing the necessary math to adjust for infeasibility. Signed-off-by: David Vernet <void@manifault.com>	2024-02-26 10:51:52 -06:00
David Vernet	f85cde7038	topology: Add maps to cores and cpus from root Topology object For convenience, let's provide callers with a way to easily look up cores and CPUs from the root topology object. Signed-off-by: David Vernet <void@manifault.com>	2024-02-23 13:09:02 -06:00
David Vernet	43624a87ce	rusty: Use new topology crate Now that we have this new Topology crate, let's update Rusty to use it instead of using the old one. Signed-off-by: David Vernet <void@manifault.com>	2024-02-23 10:39:55 -06:00
David Vernet	c5a3b83bbd	topology: Add new topology crate The topology.rs crate is insufficiently generic, and reflects implementation details of scx_rusty more than it provides generic use cases for modeling a host's topology. This adds a new topology2.rs crate that will replace topology.rs. We have this as an intermediate commit so that we don't bundle updating scx_rusty with adding this crate. Signed-off-by: David Vernet <void@manifault.com>	2024-02-23 10:39:00 -06:00
David Vernet	a2f531e429	topology: Refactor topology.rs to use cpumask crate Now that we have cpumask.rs, we can remove some logic from topology.rs and have it create and use Cpumasks. Signed-off-by: David Vernet <void@manifault.com>	2024-02-22 17:25:44 -06:00
David Vernet	bd15eb8e41	cpumask: Add new Cpumask crate Let's add a Cpumask trait that schedulers can use to avoid all having to deal directly with BitVec and the like. Signed-off-by: David Vernet <void@manifault.com>	2024-02-22 17:24:26 -06:00
David Vernet	49065de8df	scx: Demote panic! to warn! in topology crate We currently panic! if we're building a Topology that detects more than two siblings on a physical core. This can and will likely happen on multi-socket machines. Given that we're planning to add support for detecting NUMA nodes soon, let's just demote the panic! to a warn!. Signed-off-by: David Vernet <void@manifault.com>	2024-02-20 21:01:17 -06:00
Jordan Rome	7c32acece0	Add libbpf logging to the rust schedulers This is to get better logs when failing to load, attach, etc.	2024-02-20 15:17:10 -08:00
David Vernet	ef8aa9ea31	add documentation Signed-off-by: David Vernet <void@manifault.com>	2024-02-20 14:57:09 -06:00
David Vernet	8aba090d4f	rust: Add topology module to utils crate scx_rusty has logic in the scheduler to inspect the host to automatically build scheduling domains across every L3 cache. This would be generically useful for many different types of schedulers, so let's add it to the scx_utils crate so it can be used by others. Signed-off-by: David Vernet <void@manifault.com>	2024-02-20 14:57:09 -06:00
Andrea Righi	f5a21198ad	scx_utils: use c_char to prevent build failures Use c_char to convert C strings, that is more portable across different architectures. This prevents a build failure on arm64 and ppc64el. Fixes: `d57a23f` ("rust/scx_utils: Add user_exit_info support") Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-02-11 21:42:52 +01:00
Tejun Heo	d57a23f481	rust/scx_utils: Add user_exit_info support So that rust scheds can use the same unified exit handling as C scheds. Signed-off-by: Tejun Heo <tj@kernel.org>	2024-01-31 10:54:15 -10:00
Tejun Heo	988b7d13c1	Bump versions scx_exit_info change doesn't require code to be updated but breaks binary compatibility. Bump versions and cut a new release.	2024-01-25 09:01:23 -10:00
Tejun Heo	942b0269b8	Bump versions After updates to reflect the updated init and direct dispatch API, the schedulers aren't compatible with older kernels. Bump versions and publish releases.	2024-01-08 18:49:54 -10:00
Jordan Rome	6caf6c5c99	Add new archs for bpf_builder This is to fix fedora build failures for these archs: s390x and ppc64le Error: ``` ---- bpf_builder::tests::test_bpf_builder_new stdout ---- thread 'bpf_builder::tests::test_bpf_builder_new' panicked at src/bpf_builder.rs:592:9: Failed to create BpfBuilder (Err(CPU arch "s390x" not found in ARCH_MAP)) ``` https://koji.fedoraproject.org/koji/taskinfo?taskID=111114326	2024-01-03 10:50:33 -08:00
Tejun Heo	98773131df	Bump versions to publish scx_utils fedora compat change	2023-12-29 06:58:45 +09:00
Tejun Heo	c47a4b6716	scx_utils: Explain what's going on with bindgen version and suppress deprecation warning This is a followup to https://github.com/sched-ext/scx/pull/50. See the comment in BpfBuilder::bindgen_bpf_intf() for details.	2023-12-29 06:56:07 +09:00
Jordan Rome	c8a721b033	Downgrade bindgen to 0.68 This is so we can package scx_utils into fedora without having to upgrade rust-bindgen (https://bodhi.fedoraproject.org/updates/FEDORA-2023-18e7f124e1). To make this happen we need to stop using the `CargoCallbacks::new` constructor which was added in 0.69. Old way seems legit according to the docs: https://rust-lang.github.io/rust-bindgen/non-system-libraries.html	2023-12-27 12:19:28 -08:00
Jordan Rome	e9a9d32ab6	Restructure scheds folder names - combine c and kernel-examples as it's confusing to have both - rename 'rust-user' and 'c-user' to just 'rust' and 'c', which is simpler - update and fix sync-to-kernel.sh	2023-12-17 13:14:31 -08:00
Daniel Müller	fed1dae9da	rust: Update libbpf-rs & libbpf-cargo to 0.22 This is a follow on to #32, which got reverted. I wrongly assumed that scx_rusty resides in the sched_ext tree and consumes published version of scx_utils. With this change we update the other in-tree dependencies. I built scx_layered & scx_rusty. I bumped scx-utils to 0.4, because the libbpf-cargo seems to be part of the public API surface and libbpf-cargo 0.21 and 0.22 are not compatible with each other. Signed-off-by: Daniel Müller <deso@posteo.net>	2023-12-14 14:33:58 -08:00
Tejun Heo	995520e415	Revert "scx_utils: Update libbpf-cargo to 0.22" This reverts commit `6d8ba3e3d7`. Piotr Gorski reports build breakage after the commit: https://paste.cachyos.org/p/aa14709.txt	2023-12-14 10:32:25 -10:00
Daniel Müller	6d8ba3e3d7	scx_utils: Update libbpf-cargo to 0.22 It's the latest and greatest. Given the repository setup, this is likely a breaking change from a semver perspective.	2023-12-14 11:18:27 -08:00
Tejun Heo	8a07bcc31b	Bump versions and add LICENSE symlinks for scx_layered and scx_rusty	2023-12-12 11:21:08 -10:00
Davide Cavalca	21e468a491	rust: clarify license and include text	2023-12-12 13:02:13 -08:00
Tejun Heo	3acdef8f2b	scx_utils: Bump version to 0.3.2 - Build fix for ubuntu clang.	2023-12-07 13:08:21 -10:00
Andrea Righi	7ceccf6516	scx_utils::BpfBuilder: properly detect clang version in Ubuntu Apply the same logic of commit `00cd15a` ("build: properly detect clang version in Ubuntu") in scx_utils as well. This allows to build scx_utils properly in Ubuntu. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2023-12-07 23:26:21 +01:00
Tejun Heo	fa532a87f5	scx_utils: Bump version & publish So that it doesn't cause the missing lifetime parameter warnings everywhere.	2023-12-05 14:58:51 -10:00
Tejun Heo	5422584baa	scx_utils::BpfBuilder: Add 'static to BPF_H_TAR definition Newer rustc is unhappy about lack of explicit lifetime parameter.	2023-12-05 14:15:06 -10:00
Tejun Heo	fcab460386	scx_utils: Bump version and publish include directory structure has changed (a breaking change) and the doc had a misleading error. Let's cut a new release.	2023-12-03 12:51:16 -10:00
Tejun Heo	d0ed7913b4	scheds: Rearrange include files to match kernel/tools/sched_ext/include Build scripts are updated accordingly.	2023-12-03 12:47:23 -10:00
Tejun Heo	44b811831a	doc: README.md and OVERVIEW.md added and other minor updates	2023-12-03 11:48:55 -10:00
Tejun Heo	12a90335b3	scx_utils: Bump version to 0.2.1	2023-12-02 23:56:51 -10:00
Tejun Heo	b38f7574ac	scx_utils: Documentation and other minor updates	2023-12-02 23:56:36 -10:00
Tejun Heo	41a4f6407e	build: Add dummy gnu/stubs.h to fix build on systems without 32bit glibc-dev See comment in sched/include/bpf-compat/gnu/stubs.h for details.	2023-12-02 13:25:02 -10:00
Tejun Heo	2af20dead9	scx_utils::BpfBuilder: Add '-target bpf' to clang_args when calling bindgen bpf_cflags doesn't contain '-target bpf' because SkeletonBuilder does so internally. However, bindgen doesn't and we end up specifying options which are specific to bpf target without actually specifying the target, this leads to warnings like the following: [scx_layered 0.0.1] warning: unknown warning option '-Wno-compare-distinct-pointer-types''; did you mean '-Wno-compare-distinct-pointer-types'? [-Wunknown-warning-option] [scx_layered 0.0.1] clang diag: warning: argument unused during compilation: '-mcpu=v3' [-Wunused-command-line-argument] [scx_layered 0.0.1] clang diag: warning: unknown warning option '-Wno-compare-distinct-pointer-types''; did you mean '-Wno-compare-distinct-pointer-types'? [-Wunknown-warning-option] Fix it by adding '-target bpf' when invoking bindgen.	2023-12-02 07:23:27 -10:00
Tejun Heo	0f1ed894bd	build: rust projects now link against libbpf.a if provided	2023-12-02 06:41:26 -10:00
Tejun Heo	30f6a38573	build: Pipe down build options to scx_utils::BuildHelpers using env vars	2023-12-01 14:49:32 -10:00
Tejun Heo	a55fc6893b	build: Trigger cargo build on rust sub-projects from meson.build	2023-12-01 11:58:56 -10:00
Tejun Heo	6ec509b3b6	build, scx_utils: Misc improvements - build: Check clang version like scx_utils does. - scx_utils: Generate rerun-if-env-changed directives.	2023-12-01 10:20:06 -10:00
Tejun Heo	01d8351616	rustu: Import scx_rusty and scx_layered from kernel tree	2023-11-30 13:13:41 -10:00
Tejun Heo	30b40ac4d3	bpf_builder: It can now compile scx_rusty and scx_layered both within and outside kernel tree	2023-11-30 13:05:29 -10:00
Tejun Heo	5e93a0f8ff	scx_utils: bpf_builder: Rename and restructure	2023-11-29 21:07:33 -10:00
Tejun Heo	93f571f64b	scx_utils: Merge bpf_h.rs into build_helpers.rs	2023-11-29 10:37:19 -10:00
Tejun Heo	e1600a70ba	scx_utils: Add build_helpers	2023-11-29 08:22:54 -10:00
Tejun Heo	f032cfc679	scx_utils: Add mod bpf_h	2023-11-28 15:20:01 -10:00
Tejun Heo	03aae04f1a	scx_utils: Initial repo creation with ravg_read	2023-11-28 08:48:02 -10:00


226 Commits