scx-upstream

mirror of https://github.com/sched-ext/scx.git synced 2024-12-13 03:57:19 +00:00

Author	SHA1	Message	Date
Changwoo Min	8d8d8f9f61	scx_lavd: consider waker's CPU when ops.select_cpu() In case of sync wake-up, consider waker's CPU also to improve cache locality. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-09-22 01:57:49 +09:00
Daniel Hodges	a3cc4c223f	Merge pull request #664 from vax-r/layered_fix_cpumask scx_layered: Refactor match_layer() and implement helper function to access cpumask within bpf_cpumask	2024-09-20 15:20:35 +02:00
I Hsin Cheng	7799b94f07	scx_layered: Add helper function to access cpumask within bpf_cpumask Before passing "nodec->cpumask" and "cachec->cpumask" into "bpf_cpumask_test_cpu()", type conversion should be done first. Implement "cast_mask()" to convert "struct bpf_cpumask *" into "const struct cpumask *". Reference from https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/progs/cpumask_common.h#n63 Signed-off-by: I Hsin Cheng <richard120310@gmail.com>	2024-09-20 20:52:03 +08:00
Andrea Righi	401c9392ed	Merge pull request #665 from vax-r/rustland_core_fix scx_rustland_core: Access the returned value of saturating_sub()	2024-09-20 07:38:43 +02:00
I Hsin Cheng	9f64db7cbc	scx_rustland_core: Access the returned value of saturating_sub() Use an "_" variable to access the returned value of "saturating_sub()" to mute the compilation warnings. Signed-off-by: I Hsin Cheng <richard120310@gmail.com>	2024-09-19 23:01:17 +08:00
I Hsin Cheng	e4bb99efc5	scx_layered: Refactor match_layer() Refactor match_layer() to prevent the compile error caused by using the variable "nr_match_ors" before it is initialized. Move the check of "nr_match_ors" after it accesses the value within "layer->nr_match_ors" to make sure it is initialized successfully. Signed-off-by: I Hsin Cheng <richard120310@gmail.com>	2024-09-19 22:20:03 +08:00
Andrea Righi	488f209c28	Merge pull request #662 from sched-ext/rustland-prevent-ci-failures scx_rustland_core: prevent CI failures	2024-09-19 14:37:20 +02:00
Andrea Righi	809d39aa7f	scx_rustland_core: dispatch all kthreads directly from BPF Dispatching kthreads via user-space can still lead to deadlocks in certain cases (for example we can still trigger stalls by running the fork stressor via stress-ng). To prevent such stalls, simply dispatch kthreads directly from BPF for now. In the future we may consider providing an API to restrict the selection of tasks directly dispatched (for example passing a mask of PF_* flags to "whitelist" the tasks that are allowed to bypass the user-space scheduler). Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-09-19 09:12:13 +02:00
Andrea Righi	e78ee41a2e	scx_rustand_core: prevent nr_queued underflow Updating nr_queued in a non-atomic way when a queued task is consumed can lead to underflows. We don't really care about being 100% accurate here, since nr_queued should be considered more of a statistic than an accurate value. Therefore, just accept the fact that nr_queued can be inaccurate and handle potential underflows. Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-09-19 09:09:24 +02:00
Andrea Righi	3f8db5783b	Merge pull request #658 from sched-ext/rustland-core-improve-cpu-selection scx_rustland_core: improve idle CPU selection API and logic	2024-09-17 22:38:15 +02:00
Andrea Righi	86db45f855	scx_rustland_core: prevent deadlock with per-CPU DSQs and CPU affinity If a task that is executing sched_setaffinity() is dispatched on a per-CPU DSQ it may stall the DSQ completely, since the task won't be able to be consumed from the corresponding CPU. This can be easily triggered running the following stress test: $ stress-ng --aggressive -c (nproc) -f (nproc) From the stall trace we can see something like the following: R stress-ng[2648662] -6880ms scx_state/flags=3/0x9 dsq_flags=0x1 ops_state/qseq=0/0 sticky/holding_cpu=-1/-1 dsq_id=0x5 dsq_vtime=0 cpus=ff __set_cpus_allowed_ptr+0x1c8/0x260 __sched_setaffinity+0x105/0x1c0 sched_setaffinity+0x1ed/0x2d0 __x64_sys_sched_setaffinity+0xa5/0x100 do_syscall_64+0x82/0x190 entry_SYSCALL_64_after_hwframe+0x76/0x7e This should probably be addressed in the core sched_ext, but for now prevent this deadlock by tracking when a task is executing sched_setaffinity() and automatically bounce those tasks to the shared DSQ (that can be consumed from any CPU). This should solve all the recent CI failures with the scx_rustland_core schedulers. Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-09-17 07:42:37 +02:00
Andrea Righi	e6b624a97c	scx_rustland_core: improve idle CPU selection API and logic Pass enqueue flags to user-space: flags will be passed via QueuedTask.flags and can be forwarded back to BPF via DispatchedTask.flags. These flags can also be passed to BpfScheduler.select_cpu() to apply a more refined CPU selection policy. Moreover, avoid prioritizing the user-space scheduler too much and dispatch it only if there are no other tasks that need to be dispatched in ops.dispatch(). This improves CPU utilization and enhances the fairness, robustness, and resilience of schedulers based on scx_rustland_core, particularly under stress test conditions. Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-09-16 22:12:38 +02:00
Jake Hillion	23acd6ebe9	scxstats_to_openmetrics: fix format string On Python versions that perform validation of this line it fails because of a square bracket mismatch. This is due to the single quotes being parsed first. Fix by changing the outer string to double quotes.	2024-09-16 18:16:28 +01:00
Daniel Hodges	4f98de333d	Merge pull request #652 from JakeHillion/layer-growth-rr scx_layered: add round robin growth strategy	2024-09-16 17:34:48 +02:00
Andrea Righi	8656157ee4	Merge pull request #655 from sched-ext/bpfland-refine-wake-sync scx_bpfland: refine idle CPU selection logic	2024-09-15 15:51:51 +02:00
Andrea Righi	00eebaf905	scx_bpfland: refine task wakeup logic On WAKE_SYNC attempt to migrate the wakee to the same CPU as the waker if the waker is not exiting, the wakee can use the waker's CPU, the waker's L3 domain is not saturated and there are no other tasks queued to the local DSQ of the waker's CPU. This is the same logic used in scx_rusty. Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-09-15 14:50:14 +02:00
Andrea Righi	079a53c689	scx_bpfland: get rid of preferred domain Using the turbo boosted CPUs as the preferred scheduling domain seems to be beneficial only in a very few corner cases, for example on battery-powered devices with an aggressive cpufreq governor that constantly tries to scale down the frequency (and even in this case it's probably better to not force the tasks to run on the fast CPUs, to save power). In practice the preferred domain seems to introduce more overhead than benefits overall, so let's get rid of it. This can be improved in the future by adding multiple user-configurable scheduling domains. Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-09-15 14:50:14 +02:00
Changwoo Min	4fb2b09a6e	Merge pull request #654 from multics69/main scx_lavd: boost the latency critility of kernel threads	2024-09-14 10:44:31 +09:00
Changwoo Min	95e2f4dabe	scx_lavd: boost the latency critility of kernel threads Many kernel threads perform latency-critical tasks (e.g., net, gpu). In particular, the AMD GPU driver runs for the most part in kernel space using kworker. Hence, treat kernel threads as if they were woken-up tasks. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-09-14 00:41:02 +09:00
Changwoo Min	10f0378e9d	Merge pull request #653 from multics69/lavd-opt scx_lavd: add a short circuit for the case of no turbo core	2024-09-14 00:34:26 +09:00
Changwoo Min	4b4f42fce1	scx_lavd: add a short circuit for the case of no turbo core Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-09-13 16:02:07 +09:00
Jake Hillion	3848d87895	scx_layered: add round robin growth strategy	2024-09-12 23:27:21 +01:00
Daniel Hodges	632fcfe4ae	Merge pull request #648 from hodgesds/layered-llc-stats scx_layered: Add stats for XNUMA/XLLC migrations	2024-09-12 13:23:23 -04:00
Daniel Hodges	ec7f75619a	Merge pull request #649 from hodgesds/layered-topo-grow scx_layered: Add topology aware core growth selection	2024-09-12 13:20:20 -04:00
Daniel Hodges	dde6e0c7f9	scx_utils: Add node/llc id to core topology Add ids for node/llc in the Core topology struct.	2024-09-12 10:05:02 -07:00
Daniel Hodges	aee19dd9a1	scx_layered: Add topology aware core growth selection Add topology aware core growth selection. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-09-12 06:48:51 -07:00
Daniel Hodges	be1020f517	Merge pull request #650 from frelon/update-tumbleweed-docs update Tumbleweed installation notes	2024-09-12 09:05:13 -04:00
Daniel Hodges	e9a7d5ce16	Merge pull request #651 from hodgesds/layered-random scx_layered: Add random layer growth algo	2024-09-12 08:56:45 -04:00
Daniel Hodges	14a19dc3ca	scx_layered: Add random layer growth algo Add a random layer growth algo. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-09-12 05:35:54 -07:00
Fredrik Lönnegren	7d7bf94bb0	update Tumbleweed installation notes Kernel package name was changed to kernel-default. Also add link to documentation to README.md Signed-off-by: Fredrik Lönnegren <fredrik@frelon.se>	2024-09-12 10:28:03 +02:00
Daniel Hodges	ae57f8d1f9	scx_rusty: Initialize node cpumask Initialize the node cpumask, which was previously uninitialized causing metric calculations to be wrong when attempting to lookup CPUs in the node cpumask. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-09-11 13:14:44 -07:00
Jake Hillion	8ca45cfa37	lint: enable cargo fmt (#643) Use `cargo fmt` with a specific nightly branch in the CI to enforce formatting. Globally format these files while the diff is still small so we can stay on top of it. Test plan: - CI lint check passes.	2024-09-11 10:03:20 +01:00
Daniel Hodges	43ec8bfe82	scx_layered: Add stats for XNUMA/XLLC migrations Add stats for XNUMA/XLLC migrations. An example of the output is shown: ``` hodgesd : util/frac= 5.4/ 0.1 load/frac= 301.0/ 0.3 tasks= 476 tot= 3168 local=97.82 wake/exp/reenq= 2.18/ 0.00/ 0.00 keep/max/busy= 0.03/ 0.00/ 0.03 kick= 0.00 yield/ign= 0.09/ 0 open_idle= 0.00 mig= 6.82 xnuma_mig= 6.82 xllc_mig= 4.86 affn_viol= 0.00 preempt/first/idle/fail= 0.00/ 0.00/ 0.00/ 0.00 min_exec= 0.00/ 0.00ms cpus= 2 [ 2, 4] 00000000 00000010 00001000 normal : util/frac= 28.7/ 0.7 load/frac= 101704.7/ 95.8 tasks= 2450 tot= 4660 local=99.06 wake/exp/reenq= 0.88/ 0.06/ 0.00 keep/max/busy= 1.03/ 0.00/ 0.00 kick= 0.06 yield/ign= 0.04/ 400 open_idle=15.73 mig=23.45 xnuma_mig=23.45 xllc_mig= 3.07 affn_viol= 0.00 preempt/first/idle/fail= 0.00/ 0.00/ 0.00/ 0.88 min_exec= 0.00/ 0.00ms cpus= 2 [ 2, 2] 00000001 00000100 00000000 excl_coll=12.55 excl_preempt= 0.00 random : util/frac= 0.0/ 0.0 load/frac= 0.0/ 0.0 tasks= 0 tot= 0 local= 0.00 wake/exp/reenq= 0.00/ 0.00/ 0.00 keep/max/busy= 0.00/ 0.00/ 0.00 kick= 0.00 yield/ign= 0.00/ 0 open_idle= 0.00 mig= 0.00 xnuma_mig= 0.00 xllc_mig= 0.00 affn_viol= 0.00 preempt/first/idle/fail= 0.00/ 0.00/ 0.00/ 0.00 min_exec= 0.00/ 0.00ms cpus= 0 [ 0, 0] 00000000 00000000 00000000 excl_coll= 0.00 excl_preempt= 0.00 stress-ng: util/frac= 4189.1/ 99.2 load/frac= 4200.0/ 4.0 tasks= 43 tot= 62 local= 0.00 wake/exp/reenq= 0.00/100.0/ 0.00 keep/max/busy=2433.9/177.4/ 0.00 kick=100.0 yield/ign= 3.23/ 0 open_idle= 0.00 mig=54.84 xnuma_mig=54.84 xllc_mig=35.48 affn_viol= 0.00 preempt/first/idle/fail= 0.00/ 0.00/ 0.00/ 0.00 min_exec= 0.00/ 0.00ms cpus= 4 [ 4, 4] 00000300 00030000 00000000 excl_coll= 0.00 excl_preempt= 0.00 ``` Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-09-10 19:53:28 -07:00
Tejun Heo	8f0cc89ee8	Merge pull request #645 from frelon/rusty-init-dom scx_rusty: init domains when calculating averages	2024-09-10 12:25:51 -10:00
Andrea Righi	e6e3579a92	Merge pull request #634 from anh0516/main scx_bpfland: Documentation consistency fix	2024-09-10 23:25:55 +02:00
Fredrik Lönnegren	f155966b77	scx_rusty: init domains when calculating averages The domains are added to the aggregator when load is added (and duty_cycle is not 0.0f64). This commit makes sure that all domains are added to the aggregator even when the calculated duty_cycle is 0. Signed-off-by: Fredrik Lönnegren <fredrik@frelon.se>	2024-09-10 21:51:41 +02:00
likewhatevs	85863d0e1c	Merge pull request #644 from hodgesds/layered-topo-order scx_layered: Pass layer spec for core growth algo	2024-09-10 14:49:37 -04:00
Daniel Hodges	5fdd257862	scx_layered: Pass layer spec for core growth algo Pass in the layer spec when determining the layer core growth algo. This should make it easier to implement layer growth algos that are spec specific. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-09-10 10:27:08 -07:00
Daniel Hodges	8f8fe1b2c1	Merge pull request #642 from samuelnair/main scx_layered: Fix typo in stats	2024-09-10 13:20:01 -04:00
likewhatevs	ffe8ca31e5	Merge pull request #641 from likewhatevs/migrate-ci-24.04 migrate ci vm to 24.04	2024-09-10 11:46:43 -04:00
Samuel Nair	c6af1aa1c8	scx_layered: Fix typo in stats	2024-09-10 08:44:57 -07:00
patso	28319f3205	migrate ci vm to ubuntu 24.04 migrate ci vm to ubuntu 24.04	2024-09-10 09:53:40 -04:00
likewhatevs	c4c3659b6d	Merge pull request #638 from likewhatevs/remove-rlimit-dep remove dependency on rlimit.rs	2024-09-10 03:14:12 -04:00
Andrea Righi	8efe786799	Merge pull request #640 from sched-ext/bpfland-used-time-slice scx_bpfland: use sum_exec_runtime to evaluate task's used time slice	2024-09-10 09:10:52 +02:00
Andrea Righi	655ed5b4c6	scx_bpfland: use sum_exec_runtime to evaluate task's used time slice Using p->scx.slice to evaluate the consumed time slice can be a bit imprecise, because the sched_ext core implements yielding by setting p->scx.slice to 0. When the task's vruntime is evaluated this is considered as the task has exhausted its entire allocated time slice, even though it voluntarily released the CPU before the slice fully expired. To avoid this inaccuracy and prevent penalizing tasks that voluntarily release the CPU, always evaluate the used time slice based on the difference in the task's total execution time (p->se.sum_exec_runtime). This method provides a more precise calculation of vruntime and results in a fairer task's deadline evaluation. Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-09-10 08:03:35 +02:00
patso	c1df85914b	remove dependency on rlimit.rs the rlimit crate is the only dependency crate with a build.rs. build.rs files complicate portability. this removes the need for rlimit.rs	2024-09-10 01:16:53 -04:00
Daniel Hodges	f3b7016a46	Merge pull request #633 from likewhatevs/add-pages enable docs generation and upload	2024-09-09 19:12:23 -04:00
patso	2e46f71780	generate docs for scx and kernel generate docs for scx and kernel and push to gh page this "adds" kernel scheduler and bpf docs to the generated scx rust docs.	2024-09-09 15:37:59 -04:00
Tejun Heo	249121f15f	Merge pull request #635 from sched-ext/htejun/build build: Use a single top-level rust workspace	2024-09-08 21:57:27 -10:00
Tejun Heo	b2a71b166e	build: Remove unused rust/meson.build The previous commit makes this file unused but forgot to remove it. Remove it.	2024-09-08 20:00:35 -10:00
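
Commits e78ee41a2e and 9f64db7cbc above both concern underflow-safe counter updates. The following is a minimal Rust sketch of that idea, not the code from either commit: `QueueStats`, `on_enqueue`, and `on_consume` are hypothetical names introduced only for illustration, and the sketch assumes the counter is a best-effort statistic that is allowed to be inaccurate. It shows that `saturating_sub()` clamps at zero instead of wrapping, and that it returns the result rather than modifying the value in place, which is why an ignored return value produces a compiler warning.

```rust
/// Hypothetical stand-in for a best-effort statistic such as nr_queued.
#[derive(Debug, Default)]
struct QueueStats {
    nr_queued: u64,
}

impl QueueStats {
    /// Record that a task was queued.
    fn on_enqueue(&mut self) {
        self.nr_queued = self.nr_queued.saturating_add(1);
    }

    /// Record that a queued task was consumed.
    fn on_consume(&mut self) {
        // saturating_sub() returns the new value and leaves the operand
        // untouched, so the result must be assigned back. At zero it
        // stays at zero instead of wrapping around to u64::MAX.
        self.nr_queued = self.nr_queued.saturating_sub(1);
    }
}

fn main() {
    let mut stats = QueueStats::default();
    stats.on_enqueue();
    stats.on_consume();
    // A spurious extra decrement, as could happen when updates race.
    stats.on_consume();
    println!("nr_queued = {}", stats.nr_queued); // prints "nr_queued = 0"
}
```

The trade-off is the one the commit message describes: the counter may drift from the true queue depth, but it can never report a huge wrapped-around value.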

1709 Commits