scx-upstream

mirror of https://github.com/sched-ext/scx.git synced 2024-11-25 12:10:24 +00:00

Author	SHA1	Message	Date
Andrea Righi	aae4ed5b46	scx_rustland: fix coding style Small coding style changes found by rustfmt (no functional change). Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-20 05:20:46 +02:00
Andrea Righi	b1ab9c7418	scx_rustland: get rid of the dynamic slice boost The dynamic slice boost is not used anymore in the code, so there is no reason to keep evaluating it. Moreover, using it instead of the static slice boost seems to make things worse, so let's just get rid of it. Fixes: `0b3c399` ("scx_rustland: introduce dynamic slice boost") Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-19 07:51:26 +02:00
David Vernet	17c0c10b4e	Merge pull request #294 from sched-ext/fix_warnings Fix warnings	2024-05-18 10:47:54 -05:00
Changwoo Min	4cba06dc33	scx_lavd: fix inconsistent indentation in main.bpf.c Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-05-18 22:22:16 +09:00
David Vernet	a1c60ce589	lavd: Remove unused variables from scx_lavd Fix unused variable warnings. Signed-off-by: David Vernet <void@manifault.com>	2024-05-18 07:51:20 -05:00
David Vernet	ee940bd8b5	rustland: Mark get_cpu_owner() as __maybe_unused scx_rustland has a function called get_cpu_owner() in BPF which currently has no callers. There's nothing wrong with the function, but it causes a warning due to an unused function. Let's just annotate it with __maybe_unused to tell the compiler that it's not a problem. Signed-off-by: David Vernet <void@manifault.com>	2024-05-18 07:51:20 -05:00
David Vernet	df42589a76	rusty: Fix bugs in rusty When building with warnings enabled, a few obvious bugs are pointed out: - We're not correctly calculating waker frequency - We're not taking the min of avg_run_raw compared to max latency - We're missing an element from sched_prio_to_weight Fix these. With these changes, interactivity is seemingly improved. We go from ~12 sec / turn -> 11 seconds / turn in the Civ 6 AI benchmark with a 4 x nproc CPU hogging workload in the background. It's clear, however, that we really need preemption. Signed-off-by: David Vernet <void@manifault.com>	2024-05-18 07:51:20 -05:00
David Vernet	61cbfdf912	layered: Remove unused variables There are some unused variables in scx_layered. Remove them. Signed-off-by: David Vernet <void@manifault.com>	2024-05-18 07:51:20 -05:00
David Vernet	b421cee59e	Merge pull request #291 from sched-ext/htejun/sync-kernel Sync from kernel (73f4013eb1eb)	2024-05-17 20:43:00 -05:00
Tejun Heo	ab25992416	Add missing skel.attach() calls C SCX_OPS_ATTACH() and rust scx_ops_attach() macros were not calling .attach() and were only attaching the struct_ops. This meant that all non-struct_ops BPF programs contained in the skels were never attached which breaks e.g. scx_layered. Let's fix it by adding .attach() invocation to the attach macros.	2024-05-17 14:33:04 -10:00
Tejun Heo	e26fba9255	Sync from kernel (73f4013eb1eb) This pulls in the support for dump ops.	2024-05-17 01:57:36 -10:00
David Vernet	c1f1411c7a	Merge pull request #289 from sched-ext/rusty_hot_plug Add remaining hotplug pieces	2024-05-16 13:42:11 -06:00
Andrea Righi	42cee1c2dd	Merge pull request #286 from sched-ext/rustland-low-power-mode scx_rustland: introduce low power mode	2024-05-16 08:28:32 +02:00
I Hsin Cheng	6cce01c66b	Avoid redundant subtraction in rsigmoid_u64 Originally the implementation of function rsigmoid_u64 performs a subtraction even when the value of "v" equals the value of "max", in which case the result is certainly zero. We can avoid this redundant subtraction by changing the condition from ">" to ">=", since when "v" and "max" are equal we can return 0 without any subtraction.	2024-05-16 11:58:39 +08:00
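
The condition change described in commit 6cce01c66b can be illustrated with a minimal sketch. The upstream helper is C code in scx_lavd; the Rust transliteration below is an assumption made for illustration, not the upstream source.

```rust
/// Integer approximation of a reverse sigmoid: values below `max` map to
/// `max - v`, and values at or above `max` map to 0.
/// (Illustrative Rust transliteration; the upstream helper is written in C.)
pub fn rsigmoid_u64(v: u64, max: u64) -> u64 {
    // With ">=" the v == max case returns 0 directly instead of
    // computing max - v, which would also be 0.
    if v >= max {
        0
    } else {
        max - v
    }
}
```

The result is unchanged for every input; only the `v == max` case skips the subtraction.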
David Vernet	27d2490b1e	rusty: Use scx_ops_open!() in scx_rusty Now that the scx_ops_open!() macro is available, let's use it in scx_rusty to cover all cases of when hotplug can happen. Signed-off-by: David Vernet <void@manifault.com>	2024-05-15 16:42:59 -05:00
David Vernet	34818de54d	rusty: Use built-in exit code for restarting Now that the kernel exports the SCX_ECODE_ACT_RESTART exit code, we can remove the custom hotplug logic from scx_rusty, and instead rely on the built-in logic from the kernel. There's still a corner case that we're not honoring: when a hotplug event happens on the init path. A future change will address this as well. Signed-off-by: David Vernet <void@manifault.com>	2024-05-15 16:31:56 -05:00
Andrea Righi	e9ac6105c7	scx_rustland_core: introduce low-power mode Introduce a low-power mode to force the scheduler to operate in a very non-work conserving way, causing a significant saving in terms of power consumption, while still providing a good level of responsiveness in the system. This option can be enabled in scx_rustland via the --low_power / -l option. The idea is to not immediately re-kick a CPU when it enters an idle state, but do that only if there are no other tasks running in the system. In this way, latency-critical tasks can be still dispatched immediately on the other active CPUs, while CPU-bound tasks will be forced to spend more time waiting to be scheduled, basically enforcing a special CPU throttling mechanism that affects only the tasks that are not latency critical. The consequence is a reduction in the overall system throughput, but also a significant reduction of power consumption, that can be useful for mobile / battery-powered devices. Test case (using `scx_rustland -l`): - play a video game (Terraria) while recompiling the kernel - measure game performance (fps) and core power consumption (W) - compare the result of normal mode vs low-power mode Result (game performance, power consumption): normal mode: 60 fps, 6W; low-power mode: 60 fps, 3W. As we can see from the result the reduction of power consumption is quite significant (50%), while the responsiveness of the game (fps) remains the same, that means battery life can be potentially doubled without significantly affecting system responsiveness. The overall throughput of the system is, of course, affected in a negative way (kernel build is approximately 50% slower during this test), but the goal here is to save power while still maintaining a good level of responsiveness in the system. 
For this reason the low-power mode should be considered only in emergency conditions, for example when the system is close to completely running out of power or simply to extend the battery life of a mobile device without compromising its responsiveness. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-15 20:32:05 +02:00
vax-r	f293995b59	Fix typo Fix the usage of "scheduler" in the comment of main.bpf.c; it should be a verb, which is "schedule".	2024-05-15 23:02:35 +08:00
Changwoo Min	08e7e23cbe	scx_lavd: print out the current limitation of scx_lavd for users Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-05-15 12:04:09 +09:00
Changwoo Min	a4560c7f7f	scx_lavd: add comments describing the idea of preemption Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-05-15 12:04:03 +09:00
Andrea Righi	2a7b1cc3c4	scx_rustland: properly support offline CPUs During the initialization phase the scheduler needs to be aware of all the available CPUs in the system (also those that are offline), in order to create a proper per-CPU DSQ for all of them. Otherwise, if some cores are offline, we may get errors like the following: swapper/7[0] triggered exit kind 1024: runtime error (invalid DSQ ID 0x0000000000000007) Backtrace: scx_bpf_consume+0xaa/0xd0 bpf_prog_42ff1b9d1ac5b184_rustland_dispatch+0x12b/0x187 Change the code to configure the BpfScheduler object with the total amount of CPUs available in the system and prevent such failure. This fixes #280. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-12 08:42:46 +02:00
Andrea Righi	a31bcc6847	scx_rustland: maximize CPU utilization Always dispatch at least one task, even if all the CPUs are busy. This small overcommitment maximizes CPU utilization without introducing bubbles in the scheduling and also without introducing regressions in terms of responsiveness. Before this change the average CPU utilization of a `stress-ng -c 8` on an 8-core system is around 95%. With this change applied the CPU utilization goes up to a consistent 100%. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-11 16:23:12 +02:00
Andrea Righi	63feba9c2b	topology: TopologyMap: add nr_cpus_online() Add a method to TopologyMap to get the amount of online CPUs. Considering that most of the schedulers are not handling CPU hotplugging it can be useful to expose also this metric in addition to the amount of available CPUs in the system. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-10 17:24:20 +02:00
Andrea Righi	f052493005	scx_rustland_core: implement effective time slice on a per-task basis Drop the global effective time-slice and use the more fine-grained per-task time-slice to implement the dynamic time-slice capability. This allows to reduce the scheduler's overhead (dropping the global time slice volatile variable shared between user-space and BPF) and it provides a more fine-grained control on the per-task time slice. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-10 17:24:20 +02:00
Changwoo Min	01faf9408b	Merge pull request #274 from multics69/scx-lavd-preemption02 scx_lavd: support yield-based preemption	2024-05-10 11:32:29 +09:00
Changwoo Min	446de3ef3c	scx_lavd: minor style changes Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-05-10 11:07:32 +09:00
Changwoo Min	7fcc6e4576	scx_lavd: support yield-based preemption If there is a higher priority task when running ops.tick(), ops.select_cpu(), and ops.enqueue() callbacks, the currently running task yields its CPU by shrinking its time slice to zero, and a higher priority task can run on the current CPU. As low-cost, fine-grained preemption becomes available, default parameters are adjusted as follows: - Raise the bar for remote CPU preemption to avoid IPIs. - Increase the maximum time slice. - Gradually enforce the fair use of CPU time (i.e., ineligible duration) Lastly, using CAS, we ensure that a remote CPU is preempted by only one CPU. This removes unnecessary remote preemptions (and IPIs). Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-05-10 00:54:41 +09:00
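
The "preempted by only one CPU" guarantee in commit 7fcc6e4576 relies on a compare-and-swap. A minimal sketch of that pattern follows, assuming a hypothetical per-CPU `CpuState` with a `preemptor` slot; neither name comes from scx_lavd, whose BPF code is written in C.

```rust
use std::sync::atomic::{AtomicI32, Ordering};

/// Sentinel meaning "no CPU has claimed this victim" (hypothetical).
pub const NO_PREEMPTOR: i32 = -1;

/// Hypothetical per-CPU state used only for this illustration.
pub struct CpuState {
    preemptor: AtomicI32,
}

impl CpuState {
    pub fn new() -> Self {
        CpuState {
            preemptor: AtomicI32::new(NO_PREEMPTOR),
        }
    }

    /// Try to become the single preemptor of this CPU. The CAS succeeds
    /// for exactly one caller until `release()` is called, so concurrent
    /// callers that lose the race can skip sending an IPI.
    pub fn try_claim(&self, my_cpu: i32) -> bool {
        self.preemptor
            .compare_exchange(NO_PREEMPTOR, my_cpu, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }

    /// Clear the claim once the preemption has been handled.
    pub fn release(&self) {
        self.preemptor.store(NO_PREEMPTOR, Ordering::Release);
    }
}
```

A caller would kick the remote CPU only when `try_claim()` returns `true`.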
Andrea Righi	7bc62d8db8	Merge pull request #270 from sched-ext/rustland-user-ringbuffer scx_rustland_core: use a BPF_MAP_TYPE_USER_RINGBUF to dispatch tasks	2024-05-09 06:50:19 +02:00
vax-r	093a08356e	Fix typo Fix "expermentation" to "experimentation".	2024-05-09 12:10:55 +08:00
Andrea Righi	5da4602ad7	scx_rustland_core: use a BPF_MAP_TYPE_USER_RINGBUF to dispatch tasks Replace the BPF_MAP_TYPE_QUEUE with a BPF_MAP_TYPE_USER_RINGBUF to store the tasks dispatched from the user-space scheduler to the BPF component. This eliminates the need for bpf() syscalls, significantly reducing the overhead of the user-space->kernel communication and delivering a notable performance boost in the overall system throughput. Based on experimental results, this change reduces the scheduling overhead by approximately 30-35% when the system is overcommitted. This improvement has the potential to make user-space schedulers based on scx_rustland_core viable options for real production systems. Link: https://github.com/libbpf/libbpf-rs/pull/776 Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-08 22:16:53 +02:00
David Vernet	b9b9875aa7	rusty: Remove task offline tracking scx_rusty's intention is to support hotplug by automatically restarting whenever a hotplug event is encountered. Now that we're not trying to consume a bogus DSQ in the rusty_dispatch() on a newly hotplugged CPU, let's just remove offline tracking. It's really just there as a sanity check, but it triggers if an offline task is made runnable during a hotplug event before the ops.hotplug() callback has been invoked. Signed-off-by: David Vernet <void@manifault.com>	2024-05-04 21:33:55 -05:00
David Vernet	6f1dc6067a	rusty: Check for offline CPU in rusty_dispatch() There's currently a slight issue on existing kernels on the hotplug path wherein we can start to receive scheduling callbacks on a CPU before that CPU has received hotplug events. For CPUs going online, this can possibly confuse a scheduler because it may not be expecting anything to ever happen on that CPU, and therefore may do things that could cause the scheduler to crash. For example, without this patch in scx_rusty, we try to consume from a bogus DSQ that doesn't exist, which causes ext.c to boot out the scheduler. Though this issue will soon be fixed in ext.c, let's explicitly avoid dispatching from an onlining CPU in rusty so that we properly support hotplug on older kernels as well. Signed-off-by: David Vernet <void@manifault.com>	2024-05-04 21:33:54 -05:00
David Vernet	0d6b00238f	common: Add likely/unlikely macros We can hint to the compiler about paths we'll take in a scheduler. This is a common pattern, so let's provide convenience macros. Signed-off-by: David Vernet <void@manifault.com>	2024-05-04 21:33:53 -05:00
David Vernet	4b16f5117a	rusty: Fix alignment Found a misaligned conditional in main.rs. Fix it. Signed-off-by: David Vernet <void@manifault.com>	2024-05-04 21:33:19 -05:00
Changwoo Min	01e5a46371	Merge pull request #263 from multics69/scx_lavd-power01 scx_lavd: support CPU frequency scaling	2024-05-05 10:16:00 +09:00
Changwoo Min	a24e1d7adf	scx_lavd: more comments about CPU frequency scaling Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-05-04 10:41:13 +09:00
David Vernet	9bb8e9a548	common: Pull bpf_log2l() into helper function header scx_lavd implemented 32 and 64 bit versions of a base-2 logarithm function. This is now also used in rusty. To avoid code duplication, let's pull it into a shared header. Note that there is technically a functional change here as we remove the always inline compiler directive. We instead assume that the compiler will know best whether or not to inline the function. Signed-off-by: David Vernet <void@manifault.com>	2024-05-03 14:50:24 -05:00
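
For readers unfamiliar with the kind of helper that commit 9bb8e9a548 moves into a shared header, here is a minimal sketch of branch-light 32-bit and 64-bit base-2 logarithms. The names `log2_u32` and `log2_u64` are hypothetical, the code is a Rust illustration rather than the upstream C, and it returns floor(log2(v)), which may differ from the exact return convention of the upstream helper.

```rust
/// floor(log2(v)) for a 32-bit value; returns 0 for v == 0.
/// Each step checks whether the remaining value still exceeds a
/// threshold and, if so, shifts it down and records the shift in `r`.
pub fn log2_u32(mut v: u32) -> u32 {
    let mut r = ((v > 0xFFFF) as u32) << 4;
    v >>= r;
    let mut shift = ((v > 0xFF) as u32) << 3;
    v >>= shift;
    r |= shift;
    shift = ((v > 0xF) as u32) << 2;
    v >>= shift;
    r |= shift;
    shift = ((v > 0x3) as u32) << 1;
    v >>= shift;
    r |= shift;
    // At this point v is in 0..=3, so v >> 1 supplies the last bit.
    r | (v >> 1)
}

/// floor(log2(v)) for a 64-bit value, built on the 32-bit version.
pub fn log2_u64(v: u64) -> u32 {
    let hi = (v >> 32) as u32;
    if hi != 0 {
        log2_u32(hi) + 32
    } else {
        log2_u32(v as u32)
    }
}
```

Keeping a single shared definition avoids the per-scheduler copies the commit message mentions.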
David Vernet	2403f60631	rusty: Dynamically scale slice according to system util In user space in rusty, the tuner detects system utilization, and uses it to inform how we do load balancing, our greedy / direct cpumasks, etc. Something else we could be doing but currently aren't, is using system utilization to inform how we dispatch tasks. We currently have a static, unchanging slice length for the runtime of the program, but this is inefficient for all scenarios. Giving a task a long slice length does have advantages, such as decreasing the number of involuntary context switches, decreasing the overhead of preemption by doing it less frequently, possibly getting better cache locality due to a task running on a CPU for a longer amount of time, etc. On the other hand, long slices can be problematic as well. When a system is highly utilized, a CPU-hogging task running for too long can harm interactive tasks. When the system is under-utilized, those interactive tasks can likely find an idle, or under-utilized core to run on. When the system is over-utilized, however, they're likely to have to park in a runqueue. Thus, in order to better accommodate such scenarios, this patch implements a rudimentary slice scaling mechanism in scx_rusty. Rather than having one global, static slice length, we instead have a dynamic, global slice length that can be changed depending on system utilization. When over-utilized, we go with a shorter slice length, and vice versa for when the system is under-utilized. With Terraria, this results in roughly a 50% improvement in mean FPS when playing on an AMD Ryzen 9 7950X, while running Spotify, and stress-ng -c $((4 * $(nproc))). Signed-off-by: David Vernet <void@manifault.com>	2024-05-03 14:17:58 -05:00
David Vernet	76618989f8	rusty: Implement basic eligible deadline scheduling in rusty scx_rusty doesn't do terribly well with interactive workloads. In order to improve the situation, this patch adds support for basic deadline scheduling in rusty. This approach doesn't incorporate eligibility, and simply uses a crude avg_runtime tracking approach to scaling a task's deadline. In a series of follow-on changes, we'll update the scheduler to use more indicators for interactivity that affect both slice length, and deadline calculation. Signed-off-by: David Vernet <void@manifault.com>	2024-05-03 14:17:56 -05:00
Changwoo Min	6892898469	scx_lavd: support CPU frequency scaling To know the required CPU performance (e.g., frequency) demand, we keep track of 1) utilization of each CPU and 2) _performance criticality_ of each task. The performance criticality of a task denotes how critical it is to CPU performance (frequency). Like the notion of latency criticality, we use three factors: the task's average runtime, wake-up frequency, and waken-up frequency. The longer a task's runtime and the higher its two frequencies, the more performance-critical the task is, because it would be a bottleneck in the middle of the task chain. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-05-04 00:30:25 +09:00
David Vernet	925a69b156	rusty: Use helper to lookup domain context Let's remove the extraneous copy pasting and use a lookup helper like we do for task and pcpu context. Signed-off-by: David Vernet <void@manifault.com>	2024-05-02 13:56:46 -05:00
Daniel Jordan	de2773d621	scx_rusty: compare abs values in xfer_between() A LoadEntity gets the load to transfer between two entities by taking the minimum of their imbalances and reducing its abs value by xfer_ratio. In practice self.imbal(), the push node or domain, always has positive imbalance and other.imbal(), the pull node or domain, always has negative imbalance, so other.imbal() is always the minimum even though the abs value of its imbalance might be greater than the abs value of self.imbal(). It seems like the intent is to take the minimum of the two absolute values instead to avoid overbalancing at the puller, so make both values abs. Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>	2024-05-02 11:54:13 -04:00
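
The effect of commit de2773d621 can be seen in a small sketch that compares the two ways of computing the transfer amount. The free functions below are hypothetical stand-ins for the `LoadEntity` method described in the message, and "reducing by xfer_ratio" is assumed here to mean multiplying by the ratio.

```rust
/// Old behavior as described in the commit message: take the minimum of
/// the signed imbalances, then the absolute value. Because the puller's
/// imbalance is negative, it always wins the min().
pub fn xfer_amount_old(self_imbal: f64, other_imbal: f64, xfer_ratio: f64) -> f64 {
    self_imbal.min(other_imbal).abs() * xfer_ratio
}

/// Fixed behavior: compare absolute values, so the transfer is bounded
/// by the smaller of the two imbalances.
pub fn xfer_amount(self_imbal: f64, other_imbal: f64, xfer_ratio: f64) -> f64 {
    self_imbal.abs().min(other_imbal.abs()) * xfer_ratio
}
```

With a pusher imbalance of 10.0, a puller imbalance of -30.0 and a ratio of 0.5, the old form yields 15.0 (more than the pusher's whole imbalance), while the fixed form yields 5.0.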
Daniel Jordan	1652791e5d	scx_rusty: make per-task loads sensitive to lb_apply_weight Rusty's load balancer calculates load differently based on average system CPU utilization in create_domain_hierarchy(). At >= 99.999% utilization, load is the product of a task's weight and duty cycle; below that, load is the same as the task's duty cycle. populate_tasks_by_load(), however, always uses the product when calculating per-task load so that in the sub-99.999% util case, load is inflated, typically by a factor of 100 with a normal priority task. Tasks look too heavy to migrate as a result because a single task would transfer more load than the domain imbalance allows, leading to significant imbalance in some cases. Make populate_tasks_by_load() calculate task load the same way as domain load, checking lb_apply_weight. Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>	2024-05-02 11:54:05 -04:00
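
The rule that commit 1652791e5d applies to per-task load can be sketched as a single function. The name `task_load` and the use of `f64` are assumptions for illustration; only `lb_apply_weight`, the weight and the duty cycle come from the commit message.

```rust
/// Per-task load computed the same way as domain load:
/// weight * duty cycle when weights apply (the >= 99.999% utilization
/// case), and the bare duty cycle otherwise.
pub fn task_load(weight: f64, duty_cycle: f64, lb_apply_weight: bool) -> f64 {
    if lb_apply_weight {
        weight * duty_cycle
    } else {
        duty_cycle
    }
}
```

For a normal-priority task with weight 100 and a duty cycle of 0.5, this gives 50.0 when weights apply and 0.5 otherwise, which is the factor-of-100 gap the message describes.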
Andrea Righi	11f100f043	scx_rustland: bump up version to 0.0.6 Bump up scx_rustland version to use the new scx_rustland_core crate. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-04-30 18:32:21 +02:00
Andrea Righi	fd68ce13a7	scx_rustland_core: bump up version to 0.4.0 Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-04-30 18:09:09 +02:00
Tejun Heo	c77d101655	scheds/c: Sync to the new conventions Sync with the in-kernel-tree example schedulers.	2024-04-29 10:13:46 -10:00
Tejun Heo	71d5e60093	scheds/rust: Use __COMPAT helpers instead of open coding feature tests	2024-04-29 09:58:34 -10:00
Tejun Heo	cf66e58118	Sync from kernel (670bdab6073) And fix build breakage in scx_utils due to an enum type rename.	2024-04-29 09:58:19 -10:00
Tejun Heo	e5e88b7e18	Bump versions to prepare for a release	2024-04-29 09:07:27 -10:00
Tejun Heo	3e7ef35649	Merge pull request #250 from multics69/lavd-issue-234 scx_lavd: replenish time slice at ops.running() only when necessary	2024-04-29 09:01:04 -10:00

409 Commits