scx-upstream

mirror of https://github.com/sched-ext/scx.git synced 2024-11-24 11:50:23 +00:00

Author	SHA1	Message	Date
Tejun Heo	77eec19792	Merge pull request #929 from sched-ext/htejun/layered-updates scx_layered: Perf improvements and a bug fix	2024-11-18 17:41:40 +00:00
Andrea Righi	5b4b6df5e4	Merge branch 'main' into scx-fair	2024-11-18 07:42:09 +01:00
Tejun Heo	56e0dae81d	scx_layered: Fix linter disagreement	2024-11-17 06:03:30 -10:00
Tejun Heo	93a0bc9969	scx_layered: Fix consume_preempting() when --local-llc-iteration consume_preempting() wasn't testing layer->preempt when --local-llc-iterations is set, ending up treating all layers as preempting layers and often leading to HI fallback starvations under saturation. Fix it.	2024-11-17 05:54:03 -10:00
Tejun Heo	51d4945d69	scx_layered: Don't call scx_bpf_cpuperf_set() unnecessarily layered_running() is calling scx_bpf_cpuperf_set() whenever a task of a layer w/ cpuperf setting starts running which can be every task switch. There's no reason to repeatedly call with the same value. Remember the last value and call iff the new value is different. This reduces the bpftop reported CPU consumption of scx_bpf_cpuperf_set() from ~1.2% to ~0.7% while running rd-hashd at full CPU saturation on Ryzen 3900x.	2024-11-16 05:45:44 -10:00
Andrea Righi	678b10133d	scheds: introduce scx_flash Introduce scx_flash (Fair Latency-Aware ScHeduler), a scheduler that focuses on ensuring fairness among tasks and performance predictability. This scheduler is introduced as a replacement of the "lowlatency" mode in scx_bpfland, that has been dropped in commit `78101e4` ("scx_bpfland: drop lowlatency mode and the priority DSQ"). scx_flash operates based on an EDF (Earliest Deadline First) policy, where each task is assigned a latency weight. This weight is adjusted dynamically, influenced by the task's static weight and how often it releases the CPU before its full assigned time slice is used: tasks that release the CPU early receive a higher latency weight, granting them a higher priority over tasks that fully use their time slice. The combination of dynamic latency weights and EDF scheduling ensures responsive and stable performance, even in overcommitted systems, making the scheduler particularly well-suited for latency-sensitive workloads, such as multimedia or real-time audio processing. Tested-by: Peter Jung <ptr1337@cachyos.org> Tested-by: Piotr Gorski <piotrgorski@cachyos.org> Signed-off-by: Andrea Righi <arighi@nvidia.com>	2024-11-16 14:49:25 +01:00
Tejun Heo	75dd81e3e6	scx_layered: Improve topology aware select_cpu() - Cache llc and node masked cpumasks instead of calculating them each time. They're recalculated only when the task has migrated across the matching boundary and recalculation is necessary. - llc and node masks should be taken from the wakee's previous CPU, not the waker's CPU. - idle_smtmask is already considered by scx_bpf_pick_idle_cpu(). No need to AND it manually. - big_cpumask code updated to be simpler. This should also be converted to use cached cpumask. big_cpumask portion is not tested. This brings down CPU utilization of select_cpu() from ~2.7% to ~1.7% while running rd-hashd at saturation on Ryzen 3900x.	2024-11-15 16:29:47 -10:00
Tejun Heo	2b52d172d4	scx_layered: Encapsulate per-task layered cpumask caching and fix build warnings while at it. Maybe we should drop const from cast_mask().	2024-11-15 14:30:03 -10:00
Tejun Heo	1293ae21fc	scx_layered: Stat output format update Rearrange things a bit so that lines are not too long.	2024-11-15 13:38:56 -10:00
Jake Hillion	d35d5271f5	layered: split out common parts of LayerKind We duplicate the definition of most fields in every layer kind. This makes reading the config harder than it needs to be, and turns every simple read of a common field into a `match` statement that is largely redundant. Utilise `#[serde(flatten)]` to embed a common struct into each of the LayerKind variants. Rather than matching on the type this can be directly accessed with `.kind.common()` and `.kind.common_mut()`. Alternatively, you can extend existing matches to match out the common parts as demonstrated in this diff where necessary. There is some further code cleanup that can be done in the changed read sites, but I wanted to make it clear that this change doesn't change behaviour, so tried to make these changes in the least obtrusive way. Drive-by: fix the formatting of the lazy_static section in main.rs by using `lazy_static::lazy_static`. Test plan: ``` # main $ cargo build --release && target/release/scx_layered --example /tmp/test_old.json # this change $ cargo build --release && target/release/scx_layered --example /tmp/test_new.json $ diff /tmp/test_{old,new}.json # no diff ```	2024-11-15 21:57:22 +00:00
Daniel Hodges	1afb7d5835	scx_layered: Fix formatting Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-11-15 08:54:05 -08:00
Daniel Hodges	3a3a7d71ad	Merge branch 'main' into layered-dispatch-local	2024-11-14 16:10:12 -05:00
Daniel Hodges	4fc0509178	scx_layered: Add flag to control llc iteration on dispatch Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-11-13 12:43:45 -08:00
Daniel Hodges	0096c0632b	scx_layered: Fix cost accounting for dsqs Fix cost accounting for fallback DSQs on refresh so that DSQ budgets get refilled appropriately. Add helper functions for converting between a DSQ id and an LLC budget id. During preemption a layer should check if it is attempting to preempt from a layer that has more budget and only preempt if the preempting layer has more budget. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-11-13 07:23:53 -08:00
Daniel Hodges	72f21dba06	Merge pull request #922 from hodgesds/layered-cost-dump-fixes scx_layered: Fix dump format	2024-11-12 18:28:31 +00:00
Daniel Hodges	f7009f7960	scx_layered: Fix dump format Fix a small bug where incorrect per CPU costs were being dumped. The output format should now appropriately match the per-CPU costs. The following dump shows the correct format: HI_FALLBACK[1024] nr_queued=46 -25755ms HI_FALLBACK[1025] nr_queued=43 -25947ms LO_FALLBACK nr_queued=0 -0ms COST GLOBAL[0][random] budget=16791955959896739 capacity=16791955959896739 COST GLOBAL[1][hodgesd] budget=16791955959896739 capacity=16791955959896739 COST GLOBAL[2][stress-ng] budget=43243243243243243 capacity=43243243243243243 COST GLOBAL[3][normal] budget=33583911919793478 capacity=33583911919793478 COST FALLBACK[1024][0] budget=16791955959896739 capacity=16791955959896739 COST FALLBACK[1025][1] budget=16791955959896739 capacity=16791955959896739 COST CPU[0][0][random] budget=5405405405405405 capacity=5405405405405405 COST CPU[0][1][hodgesd] budget=2702702694605435 capacity=2702702702702702 COST CPU[0][2][stress-ng] budget=540514231324919 capacity=540540540540540 COST CPU[0][3][normal] budget=5405405342325615 capacity=5405405405405405 COST CPU[0]FALLBACK[0][1024] budget=0 capacity=5405405405405405 COST CPU[0]FALLBACK[1][1025] budget=1 capacity=2702702694605435 COST CPU[1][0][random] budget=5405405405405405 capacity=5405405405405405 COST CPU[1][1][hodgesd] budget=2702702675501951 capacity=2702702702702702 COST CPU[1][2][stress-ng] budget=540514250569731 capacity=540540540540540 Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-11-12 10:22:17 -08:00
Daniel Hodges	ff15f257be	scx_layered: Fix formatting Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-11-12 10:09:22 -08:00
Daniel Hodges	673316827b	Merge pull request #918 from hodgesds/layered-slice-helper scx_layered: Add helper for layer slice duration	2024-11-11 18:24:43 +00:00
Daniel Hodges	775d09ae1f	scx_layered: Consume from local LLCs for dispatch When dispatching consume from DSQs in the local LLC first before trying remote DSQs. This should still be fair as the layer iteration order will be maintained. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-11-11 09:22:03 -08:00
Daniel Hodges	b2505e74df	Merge branch 'main' into layered-consume-fix	2024-11-11 11:29:43 -05:00
Daniel Hodges	1ed387d7f3	scx_layered: Fix error in dispatch consumption Fix bug in consume_non_open where it improperly returns 0 when the DSQ is not consumed. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-11-11 08:19:54 -08:00
Daniel Hodges	cad3413886	scx_layered: Add helper for layer slice duration Add a helper for returning the appropriate slice duration for a layer and replace various instances where the slice value was being recalculated. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-11-11 06:11:32 -08:00
Pat Somaru	89f4aa1351	scx_layered: add antistall add timer based antistall to scx_layered and new flags to enable/disable and specify seconds of delay before it turns on. also update ci config to make sure this verifies/runs.	2024-11-08 20:31:02 -05:00
Tejun Heo	bb91ad0084	scx_layered: Work around older kernels choking on function calls from sleepable progs The verifier in older kernels chokes on function calls from sleepable progs, triggering a nonsensical RCU state error: frame1: R1_w=scalar(id=674,smin=smin32=0,smax=umax=smax32=umax32=51,var_off=(0x0; 0x3f)) R10=; return llc_ptr; 1072: (61) r0 = (u32 *)(r2 +0) ; frame1: R0_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R2_w=map_value(map=bpf_bpf.rodata,ks=4,vs=9570,off=4400,smin=smin32=0,smax=umax=smax32=umax32=204,var_off=(0x0; 0xfc)) refs=13,647 ; } 1073: (95) exit bpf_rcu_read_unlock is missing processed 10663 insns (limit 1000000) max_states_per_insn 8 total_states 615 peak_states 281 mark_read 20 -- END PROG LOAD LOG -- Work around by adding an __always_inline variant of cpu_to_llc_id() and using it from layered_init(). Note that we can't switch everyone to __always_inline as that can lead to verification failure due to the insn limit.	2024-11-08 08:47:57 -10:00
Lohith C V	a2e119ae23	scx_lavd: docs: fix typos	2024-11-08 16:25:55 +05:30
Daniel Hodges	3b47782bf4	scx_layered: Add fallback costs to dump Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-11-07 19:49:09 -05:00
Daniel Hodges	73926d6481	Merge pull request #912 from hodgesds/layered-mask-cleanup scx_layered: Cleanup cpumask	2024-11-07 22:52:28 +00:00
Jake Hillion	5ae1b84533	Merge pull request #908 from JakeHillion/pr908 layered/topo: lift layer specific checks out of per-LLC loop	2024-11-07 21:48:20 +00:00
Daniel Hodges	ee4fd3dace	scx_layered: Cleanup cpumask Cleanup remaining cpumasks to use `cast_mask`. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-11-07 13:18:10 -08:00
Daniel Hodges	637fc3f6e1	scx_layered: Use layer idle_smt option When selecting the idle CPU use the idle_smt option on the layer. This may improve cache locality in some cases by placing tasks on CPUs that are on closer cache lines. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-11-07 13:05:19 -08:00
Daniel Hodges	7db2ef22d0	scx_layered: Fix verifier issue on older kernels On some older kernels layered fails to validate. Prevent certain helpers from being inlined to pass the verifier. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-11-07 12:20:58 -08:00
Jake Hillion	ba54808150	layered/topo: lift layer specific checks out of per-LLC loop The loops in topology aware mode were recently refactored to place the per-LLC loops inside the per-layer loops. However, the layer specific checks were left in the inner loops, slowing this down unnecessarily. Pull the layer specific checks from the inner loop into the outer loop. Also changes these functions to `__weak` to ensure they don't get inlined - they're expected to be verified as global functions. Note to reviewers: this looks good to me, but I'd appreciate if you reviewed the De Morgan applications in detail. Test plan: - `cargo build --release && sudo target/release/scx_layered --run-example` on a machine with multiple LLCs. It's possible to stall it quite easily with stress-ng but I believe this is the case on main.	2024-11-07 18:34:44 +00:00
Changwoo Min	416de68b72	Merge pull request #904 from multics69/lavd-drop-padding scx_lavd: drop padding in cpdom_cpumask, which was a workaround	2024-11-07 16:13:07 +00:00
Changwoo Min	56357a79db	Merge pull request #903 from multics69/lavd-issue-897 scx_lavd: update cur_logical_clk atomically	2024-11-07 16:12:56 +00:00
Daniel Hodges	3cc849f234	scx_layered: Fix verifier issue when tracing Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-11-07 06:43:40 -08:00
Daniel Hodges	487baa4a03	scx_layered: Add fallback DSQ cost accounting Add fallback DSQ cost accounting so that fallback DSQ costs are accounted for and so that dispatch of fallback DSQs can be done in a standardized way. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-11-07 05:25:57 -08:00
Changwoo Min	22cb9e9ce1	scx_lavd: drop padding in cpdom_cpumask, which was a workaround The verifier error seems to stem from the wrong vmlinux.h. Also, PR #889 seems to completely fix the problem. So, drop the workaround. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-11-07 16:13:06 +09:00
Changwoo Min	e9ba2d53fa	scx_lavd: update cur_logical_clk atomically Previously, cur_logical_clk was updated with WRITE_ONCE(), which does not guarantee atomicity when concurrent writes happen -- which is possible. So change it to use CAS (compare-and-swap). Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-11-07 16:01:50 +09:00
Emil Tsalapatis	5e35a12ce3	remove stray print	2024-11-06 18:02:08 -08:00
Emil Tsalapatis	42880404e1	Merge branch 'main' of https://github.com/sched-ext/scx into core_enums	2024-11-06 12:44:23 -08:00
Emil Tsalapatis	2f174db96f	use the enum singleton in the userspace scheduler components	2024-11-06 12:17:16 -08:00
Emil Tsalapatis	1cabed9d09	Autogenerate enums and BPF enum setters for Rust schedulers	2024-11-06 12:17:16 -08:00
Emil Tsalapatis	d500c50098	add autogenerated enum definitions for Rust schedulers	2024-11-06 12:17:16 -08:00
Dan Schatzberg	fb635cb8f0	Merge pull request #438 from dschatzberg/mitosis Refactor select_cpu + enqueue for proper synchronization and handle of !wakeup	2024-11-06 18:36:05 +00:00
Andrea Righi	f402f118db	Merge pull request #899 from sched-ext/bpfland-rework scx_bpfland: rework	2024-11-06 17:55:36 +00:00
Tejun Heo	ad45727139	version: v1.0.6	2024-11-06 06:54:26 -10:00
Dan Schatzberg	af2cb1abbe	scx_mitosis: add RCU-like synchronization scx_mitosis relied on the implicit assumption that after a sched tick, all outstanding scheduling events had completed but this might not actually be correct. This feels like a natural use-case for RCU, but there is no way to directly make use of RCU in BPF. Instead, this commit implements an RCU-like synchronization mechanism. Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>	2024-11-06 08:33:29 -08:00
Emil Tsalapatis	479d515a45	Merge branch 'main' into core_enums	2024-11-06 11:07:42 -05:00
Emil Tsalapatis	23f302cf13	add SCX_SLICE_* macros to scx_utils and use them for the Rust schedulers	2024-11-06 07:52:04 -08:00
Andrea Righi	78101e4688	scx_bpfland: drop lowlatency mode and the priority DSQ Schedule all tasks using a single global DSQ. This gives better control to prevent potential starvation conditions. With this change, scx_bpfland adopts a logic similar to scx_rusty and scx_lavd, prioritizing tasks based on the frequency of their wait and wake-up events, rather than relying exclusively on the average amount of voluntary context switches. Tasks are still classified as interactive / non-interactive based on the amount of voluntary context switches, but this only affects the cpufreq logic. Signed-off-by: Andrea Righi <arighi@nvidia.com>	2024-11-06 15:06:39 +01:00
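
Commit e9ba2d53fa above replaces a plain store to cur_logical_clk with a compare-and-swap update. The actual change is in BPF C and is not reproduced here. What follows is only a minimal userspace Rust sketch of the general CAS-loop pattern. The function name `advance_clock` and the "never move the clock backwards" policy are assumptions made for illustration, not something the commit message states.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical helper: try to advance `clk` to `new_clk`.
/// If another writer has already stored an equal or larger value,
/// the clock is left alone. Returns the value held after the call.
fn advance_clock(clk: &AtomicU64, new_clk: u64) -> u64 {
    let mut cur = clk.load(Ordering::Relaxed);
    loop {
        // Assumed policy: the clock only moves forward.
        if new_clk <= cur {
            return cur;
        }
        // The CAS succeeds only if nobody changed `clk` since we read `cur`.
        match clk.compare_exchange_weak(cur, new_clk, Ordering::AcqRel, Ordering::Relaxed) {
            Ok(_) => return new_clk,
            // Lost the race (or spurious failure): retry with the observed value.
            Err(observed) => cur = observed,
        }
    }
}

fn main() {
    let clk = AtomicU64::new(100);
    assert_eq!(advance_clock(&clk, 150), 150); // moves forward
    assert_eq!(advance_clock(&clk, 120), 150); // stale update is ignored
    println!("clock = {}", clk.load(Ordering::Relaxed));
}
```

Unlike a plain store, a concurrent writer holding a stale value cannot overwrite a newer one, because its compare step fails and it re-reads the current value before deciding again.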

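Commit 51d4945d69 above describes remembering the last value passed to scx_bpf_cpuperf_set() and calling it only when the value changes. The real code is BPF C inside scx_layered. The sketch below shows the same "call only on change" idea in plain Rust. The type `CpuPerfCache`, the method `set_if_changed`, and the closure standing in for the setter are all hypothetical names introduced for illustration.

```rust
/// Hypothetical per-CPU cache of the last performance level applied.
struct CpuPerfCache {
    last: Option<u32>,
}

impl CpuPerfCache {
    fn new() -> Self {
        CpuPerfCache { last: None }
    }

    /// Invoke `setter` only when `perf` differs from the cached value.
    /// Returns true if the setter was actually called.
    fn set_if_changed<F: FnMut(u32)>(&mut self, perf: u32, mut setter: F) -> bool {
        if self.last == Some(perf) {
            return false; // same value as last time: skip the call
        }
        setter(perf);
        self.last = Some(perf);
        true
    }
}

fn main() {
    let mut cache = CpuPerfCache::new();
    let mut calls = 0;
    // Simulated sequence of task switches requesting a perf level.
    for perf in [512, 512, 512, 1024, 1024, 512] {
        cache.set_if_changed(perf, |_| calls += 1);
    }
    // Six requests, but only three distinct transitions reach the setter.
    println!("setter calls: {}", calls);
}
```

The saving depends on how often consecutive tasks on a CPU share the same setting; the ~1.2% to ~0.7% figure in the commit message is the author's measurement for their workload, not something this sketch reproduces.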

1144 Commits