JakeHillion/scx

mirror of https://github.com/JakeHillion/scx.git synced 2024-11-25 19:10:23 +00:00

Author	SHA1	Message	Date
Andrea Righi	d336892c71	Merge pull request #816 from sched-ext/rustland-core-update-doc scx_rustland_core: update documentation about the new API	2024-10-17 19:18:16 +00:00
Andrea Righi	a155ff2ada	scx_rustland_core: update documentation about the new API Update the documentation adding the new task statistics provided by scx_rustland_core. Fixes: `be681c7` ("scx_rustland_core: pass nvcsw, slice and dsq_vtime to user-space") Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-10-17 19:07:51 +02:00
Jake Hillion	f1b1830512	Merge pull request #814 from JakeHillion/pr814 layered: add RandomTopo layer growth algorithm	2024-10-17 17:05:53 +00:00
Jake Hillion	1415b4a454	layered: make disable_topology arg require equals The recent changes to `disable_topology` making the arg an `Option<bool>` instead of a `bool` caused an issue with it incorrectly attaching arguments. Make the argument `require_equals` to fix this case. This is a behaviour change for anybody previously relying on `-t true`, `-t false`, `--disable-topology true`, or `--disable-topology false`. The equals syntax worked before and continues to work after, as demonstrated in the CI. Test plan: Before: ```sh $ sudo target/release/scx_layered -t f:/tmp/test.json error: invalid value 'f:/tmp/test.json' for '--disable-topology [<DISABLE_TOPOLOGY>]' [possible values: true, false] For more information, try '--help'. ``` After: ```sh $ sudo target/release/scx_layered -t f:/tmp/test.json 14:44:00 [INFO] CPUs: online/possible=176/176 nr_cores=88 14:44:00 [INFO] Disabling topology awareness ... ^CEXIT: Scheduler unregistered from user space ```	2024-10-17 15:46:30 +01:00
Jake Hillion	a0fe303b61	layered: add RandomTopo layer growth algorithm Add an additional layer growth algorithm, named 'RandomTopo'. It follows these rules: - Randomise NUMA nodes. List each core in each NUMA node before a core from another NUMA node. - Randomise LLCs within each NUMA node. List each core in each LLC before a core in a different LLC. - Randomise the core order within each LLC. This attempts to provide a relatively evenly distributed set of cores while considering topology. Unlike `Topo`, it does not require you to specify the ordering and instead generates it from the hardware, making desyncs between the config and the hardware less likely. Currently `RandomTopo` considers topology even with `--disable-topology=true`. I can see the arguments for this going both ways. On one hand requesting disable topology suggests you want no consideration of machine topology, and `RandomTopo` should decay to `Random` (which it does on single node/LLC machines anyway). On the other hand, the config explicitly specifies `RandomTopo` and should consider the topology. If anyone feels strongly I can change this to respect `disable_topology`. Test plan: ```sh $ sudo target/release/scx_layered -v f:/tmp/test.json ... 
14:31:19 [DEBUG] layer: batch algo: RandomTopo core order: [47, 44, 43, 42, 40, 45, 46, 41, 38, 37, 36, 39, 34, 32, 35, 33, 54, 49, 50, 52, 51, 48, 55, 53, 68, 64, 66, 67, 70, 69, 71, 65, 9, 10, 12, 15, 14, 11, 8, 13, 59, 60, 57, 63, 62, 56, 58, 61, 2, 3, 5, 4, 0, 6, 7, 1, 86, 83, 85, 87, 84, 81, 80, 82, 20, 22, 19, 23, 21, 18, 17, 16, 30, 25, 26, 31, 28, 27, 29, 24, 78, 73, 74, 79, 75, 77, 76, 72] 14:31:19 [DEBUG] layer: immediate algo: RandomTopo core order: [45, 40, 46, 42, 47, 43, 41, 44, 80, 82, 83, 84, 85, 86, 81, 87, 13, 10, 9, 15, 14, 12, 11, 8, 36, 38, 39, 32, 34, 35, 33, 37, 7, 3, 1, 0, 2, 5, 4, 6, 53, 52, 54, 48, 50, 49, 55, 51, 76, 77, 79, 78, 73, 74, 72, 75, 71, 66, 64, 67, 70, 69, 65, 68, 24, 26, 31, 25, 28, 30, 27, 29, 58, 56, 59, 61, 57, 62, 60, 63, 16, 19, 17, 23, 22, 20, 18, 21] ... ``` This is a machine with 1 NUMA/11 LLCs with 8 cores per LLC and you can see the results are grouped by LLC but random within.	2024-10-17 15:36:00 +01:00
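The three `RandomTopo` rules in the commit above can be sketched as follows. This is a minimal illustration, not the code from the commit: the `Core` type, the `random_topo_order` function, and the small xorshift PRNG (used only to avoid external crates) are all hypothetical names introduced here.

```rust
/// A core tagged with its LLC and NUMA node (hypothetical type for this sketch).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Core {
    id: usize,
    llc: usize,
    node: usize,
}

/// Tiny xorshift64 PRNG so the sketch needs no external crates.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Fisher-Yates shuffle driven by the PRNG above.
fn shuffle<T>(items: &mut [T], rng: &mut XorShift) {
    for i in (1..items.len()).rev() {
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        items.swap(i, j);
    }
}

/// Distinct values of `key` over `cores`, returned in random order.
fn shuffled_keys(cores: &[Core], key: impl Fn(&Core) -> usize, rng: &mut XorShift) -> Vec<usize> {
    let mut keys: Vec<usize> = cores.iter().map(key).collect();
    keys.sort_unstable();
    keys.dedup();
    shuffle(&mut keys, rng);
    keys
}

/// Randomise NUMA nodes, then LLCs within each node, then cores within each LLC.
fn random_topo_order(cores: &[Core], seed: u64) -> Vec<usize> {
    let mut rng = XorShift(seed.max(1)); // xorshift state must be non-zero
    let mut order = Vec::with_capacity(cores.len());
    for node in shuffled_keys(cores, |c| c.node, &mut rng) {
        let in_node: Vec<Core> = cores.iter().copied().filter(|c| c.node == node).collect();
        for llc in shuffled_keys(&in_node, |c| c.llc, &mut rng) {
            let mut ids: Vec<usize> = in_node
                .iter()
                .filter(|c| c.llc == llc)
                .map(|c| c.id)
                .collect();
            shuffle(&mut ids, &mut rng);
            order.extend(ids);
        }
    }
    order
}
```

On a single-node, single-LLC machine the two outer loops each run once, so the result is a plain shuffle. This matches the commit's remark that `RandomTopo` decays to `Random` on such hardware.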
Daniel Hodges	b01ff79080	Merge pull request #805 from hodgesds/layered-refresh-cleanup scx_layered: Refactor refresh cpumasks	2024-10-16 19:06:15 +00:00
Andrea Righi	2ea47af4bc	Merge pull request #804 from sched-ext/rustland-fixes scx_rustland fixes and improvements	2024-10-16 18:26:03 +00:00
Tejun Heo	84d8abf913	Revert "Use per-arch vmlinux.h" This reverts commit `a23f3566e3`.	2024-10-16 06:42:28 -10:00
Tejun Heo	bd79059f1a	Revert "Add vmlinux.h for multiple arch" This reverts commit `7067092555`.	2024-10-16 06:42:18 -10:00
Dan Schatzberg	730052a0c4	Merge pull request #803 from dschatzberg/mitosis_fallback_dsq scx_mitosis: Handle pinned tasks	2024-10-16 13:26:23 +00:00
Andrea Righi	763da6ab55	scx_rlfifo: operate in a more work-conserving way Make scx_rlfifo even simpler and keep dispatching tasks even if the CPUs are all busy. This allows to better stress test the scx_rustland_core backend, by using both the per-CPU DSQs and the global shared DSQ. Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-10-16 14:06:00 +02:00
Andrea Righi	b07de1d7d5	scx_rustland: clarify EDF scheduling scx_rustland is now effectively a deadline-based scheduler and not a pure vruntime-based scheduler. Clarify this in the source code. No functional change. Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-10-16 14:06:00 +02:00
Andrea Righi	c4b6408e92	scx_rustland: smooth vruntime update Update vruntime by adding the used virtual time slice of each task as soon as it is scheduled. Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-10-16 14:06:00 +02:00
Andrea Righi	0b2de2c10c	scx_rustland: use built-in nvcsw metrics Use the nvcsw metric from the scx_rustland_core backend, instead of retrieving this metric in user-space via procfs. Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-10-16 14:06:00 +02:00
Andrea Righi	97629178e2	scx_rustland_core: bump up version to 2.2.2 Bump up the minor version to reflect the new backward-compatible functionality added. Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-10-16 14:06:00 +02:00
Daniel Hodges	907746745e	scx_layered: Refactor refresh cpumasks Refactor the logic for refresh cpumasks to be easy to read and verify. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-15 17:58:10 -07:00
Tejun Heo	4841df8138	Merge pull request #793 from minosfuture/vmlinux_per_arch Use per-arch vmlinux.h	2024-10-15 19:52:42 +00:00
Dan Schatzberg	96ebe6b84a	scx_mitosis: Handle pinned tasks Pinned tasks should just be routed to a fallback DSQ. kthreads are given a higher priority than non-kthreads so use two fallback DSQs. Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>	2024-10-15 09:09:01 -07:00
Dan Schatzberg	902f41adf0	Merge pull request #799 from dschatzberg/mitosis_dispatch_no_wakeup scx_mitosis: handle enqueue() on !wakeup	2024-10-15 13:46:07 +00:00
Daniel Hodges	71d63010af	scx_layered: Refactor layer iteration Remove DSQ iter algos. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-14 13:13:53 -07:00
Dan Schatzberg	a17f16e4b9	scx_mitosis: handle enqueue() on !wakeup If we're not on the wakeup path, we may see enqueue() invoked without select_cpu() which will require an idle cpu lookup. In order to fix this, we refactor the idle_cpu lookup in select_cpu so it can be invoked from enqueue(). Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>	2024-10-14 10:13:07 -07:00
Daniel Hodges	912d6e01c1	scx_layered: Add LLC integration test Add an integration test for testing that the `llcs` field on the layer config works properly. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-14 07:27:29 -07:00
Daniel Hodges	ed18e43612	Merge pull request #795 from hodgesds/bpftrace-tests scx_layered: Add topology integration test	2024-10-14 12:54:54 +00:00
Daniel Hodges	e456c83536	scx_layered: Add topology integration test Add a bpftrace script that does a topology aware test. The test script runs a bpftrace script that asserts that stress-ng processes are scheduled on NUMA node 0 only. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-13 20:23:11 -07:00
Ming Yang	f7cdf08754	scx_mitosis: Fix static assertion of scx_bpf_task_cgroup failing __weak check it failed the static assertion in macro bpf_ksym_exists. Signed-off-by: Ming Yang <minos.future@gmail.com>	2024-10-13 07:57:12 -07:00
Ming Yang	7067092555	Add vmlinux.h for multiple arch Following the change of using per-arch vmlinux.h. Add it for the remaining archs. Signed-off-by: Ming Yang <minos.future@gmail.com>	2024-10-13 07:57:12 -07:00
Ming Yang	a23f3566e3	Use per-arch vmlinux.h vmlinux.h is not compatible across archs. Handle this compatibility issue by * Add arch info into vmlinux.h real file name * Link vmlinux.h to the target-arch real file at build time * Use target-arch real file for scx_utils bindgen. Also refactored clang related logic into a new clang_info mod, which is shared by bpf_builder.rs and builder.rs. Signed-off-by: Ming Yang <minos.future@gmail.com>	2024-10-13 07:57:12 -07:00
Changwoo Min	c1f4051a14	scx_lavd: fix int overflow in calculating avg_lat_cri u32 is not big enough to hold the sum of lat_cri in a period, so sum_lat_cri (u32) overflowed, resulting in an incorrect avg_lat_cri. Change the type from u32 to u64, avoiding the integer overflow. Note that {sum/avg}_lat_cri is only for debugging, so it is irrelevant to scheduling decisions. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-13 00:58:36 +09:00
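The overflow fixed by the commit above can be illustrated with a short sketch. The function names and sample magnitudes are assumptions for illustration, not taken from scx_lavd. In BPF C the u32 addition wraps silently; the sketch uses `checked_add` only to make the overflow visible.

```rust
/// Sums samples into a 32-bit accumulator; returns None if the sum overflows u32.
fn sum_lat_cri_u32(samples: &[u32]) -> Option<u32> {
    samples.iter().try_fold(0u32, |acc, &s| acc.checked_add(s))
}

/// Averages samples using a 64-bit accumulator, which cannot overflow for
/// any realistic number of u32 samples.
fn avg_lat_cri_u64(samples: &[u32]) -> u64 {
    if samples.is_empty() {
        return 0;
    }
    let sum: u64 = samples.iter().map(|&s| u64::from(s)).sum();
    sum / samples.len() as u64
}
```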
Changwoo Min	6c9bbe66dc	scx_lavd: remove unnecessary downscaling in deadline calculation Downscaling is not necessary when calculating a task's virtual deadline because the virtual deadline represents only the relative order in task scheduling. Hence downscaling only introduces inaccuracy caused by truncation. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-13 00:41:23 +09:00
Changwoo Min	6ddc3f0a2b	scx_lavd: do not inspect scx_lavd process itself Printing the task status of scx_lavd itself is not useful, so filter it out. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-12 17:21:08 +09:00
Andrea Righi	197dee93f4	scx_bpfland: get rid of per-CPU DSQs Using per-CPU DSQs seems to introduce more issues than benefits (potential stalls, etc.). Therefore, let's get rid of the per-CPU DSQs and use SCX_DSQ_LOCAL for tasks directly dispatched to specific CPUs. This change seems to also improve performance on 6.12 and it makes the scheduler a lot more stable and consistent. The issues will be investigated separately, providing a separate stress test scheduler, designed to stress test per-CPU DSQs. Tested-by: Piotr Gorski <piotrgorski@cachyos.org> Tested-by: Eric Naim <dnaim@cachyos.org> Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-10-12 08:15:51 +02:00
Andrea Righi	198f22656c	scx_bpfland: clarify error code returned by pick_idle_cpu() Return more meaningful error codes from pick_idle_cpu(). No functional change, just improved code readability. Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-10-12 08:08:48 +02:00
Andrea Righi	ceb4f1755f	scx_bpfland: always refill task timeslice in ops.dispatch() When a task exhausts its timeslice and no other tasks are ready to run, we automatically refill its timeslice, but only if the current CPU is a fully idle SMT core. If we don’t handle the refill, the sched_ext core will default to refilling using SCX_SLICE_DFL, which may not be optimal. To ensure better control over the task’s timeslice, always refill it when no other tasks are available to run. Fixes: `6e24fcc` ("scx_bpfland: keep tasks running on full-idle SMT cores") Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-10-12 08:08:48 +02:00
Andrea Righi	54d704ceda	scx_bpfland: pick a random idle CPU when prev_cpu is not valid Pick any random idle CPU when the previous CPU isn't valid anymore according to the task's cpumask. Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-10-12 08:08:48 +02:00
Changwoo Min	836cf9faa4	Merge pull request #779 from multics69/lavd-futex-v2 scx_lavd: mitigate the lock holder preemption problem	2024-10-12 02:42:33 +00:00
Daniel Hodges	a08a76ccd6	scx_layered: Cleanup non topology path More cleanup in the non topology path to remove copy/pasta declarations. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-11 10:18:34 -07:00
Jake Hillion	eb59085e61	Merge pull request #781 from JakeHillion/pr781 layered: move configuration into library component	2024-10-11 16:39:23 +00:00
Jake Hillion	52c279a469	layered: make default value for disable_topology dynamic Disable topology currently defaults to `false` (topology enabled...). Change this so that topology is enabled by default on hardware that may benefit from it (multiple NUMA nodes or LLCs) and disabled on hardware that does not benefit from it. This is a slightly noisy change as we have to move ownership of the newly mutable layer specs into the `Scheduler` object (previously they were a borrow). We don't have a `Topology` object to make the default decision from until `Scheduler::init`, and I think this is because of the possibility of hot plugs. We therefore have to clone the `Vec<LayerSpec>` each time as it is potentially mutable. Test plan: - CI. Updated to be explicit about topology in both cases. Single NUMA multi-LLC machine: ``` $ scx_layered --run-example ... 13:34:01 [INFO] Topology awareness not specified, selecting enabled based on hardware ... $ scx_layered --run-example --disable-topology=true ... 13:33:41 [INFO] Disabling topology awareness ... $ scx_layered --run-example -t ... 13:33:15 [INFO] Disabling topology awareness ... $ scx_layered --run-example --disable-topology=false # none of the above messages present ``` Single NUMA single LLC machine: ``` $ scx_layered --run-example 15:33:10 [INFO] Topology awareness not specified, selecting disabled based on hardware ```	2024-10-11 17:09:07 +01:00
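The dynamic default described in the commit above amounts to resolving an `Option<bool>` against the detected hardware. A minimal sketch, assuming a hypothetical helper `resolve_disable_topology` that is handed the NUMA node and LLC counts (the commit itself derives them from a `Topology` object in `Scheduler::init`):

```rust
/// Decides whether topology awareness is disabled.
/// An explicit user choice always wins. Otherwise topology is disabled only
/// on hardware with a single NUMA node and a single LLC.
fn resolve_disable_topology(requested: Option<bool>, nr_numa_nodes: usize, nr_llcs: usize) -> bool {
    match requested {
        Some(disable) => disable,
        None => nr_numa_nodes <= 1 && nr_llcs <= 1,
    }
}
```

With these assumptions, a single-NUMA, multi-LLC machine resolves to topology enabled and a single-NUMA, single-LLC machine to topology disabled, matching the log lines in the test plan.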
Jake Hillion	143a55cda1	layered: move configuration into library component Move the LayerConfig and its children from `main.rs` into `lib.rs`. This allows other tooling, such as config managers or test executors, to modify layered configs programmatically. The end goal is to move everything in `layered` except for the argument parsing into a `run_layered` function, but I haven't done it in this diff because it's a larger change. This is a common pattern in Rust projects to do as little as possible in `main.rs` for extensibility. The only change here, other than publicity and where things are located, is the signature of `CpuPool::alloc_cpus`. It previously relied on `&Layer`, and this changes it to the two elements of `Layer` it uses. This allows `Layer` to stay confined to `main.rs` (for now) to prevent scope creep in this PR. This may be inconvenient in the short term for WIPs and anyone doing non-Cargo builds (cough me), but having things split into more files should make rebases/merges easier in the long run. Test plan: - `cargo build --release` - CI.	2024-10-11 15:55:29 +01:00
Changwoo Min	648c95be9e	scx_lavd: fix incorrect task comparison for preemption Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-11 21:53:24 +09:00
likewhatevs	b88f567e25	Merge pull request #782 from likewhatevs/lsp-nice-util layered -- make lsp work nice on util include file	2024-10-11 12:30:19 +00:00
Pat Somaru	2b309dbbb4	make lsp work nice on util include	2024-10-11 08:06:29 -04:00
Pat Somaru	7627e1cc42	scx_layered: fix lsp etc on util.bpf.c	2024-10-11 08:02:23 -04:00
Changwoo Min	5b4b255cbb	scx_lavd: do not preempt while holding a lock When a task holds a lock, it should neither yield its time slice nor be preempted. In this way, we can mitigate harmful preemption of lock holders and reduce the total preemption count. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-11 18:49:09 +09:00
Changwoo Min	bd17589a6e	scx_lavd: boost latency criticality when a task holds a lock When a lock holder exhausts its time slice, it will be re-enqueued to a DSQ, waiting for scheduling while holding a lock. In this case, prioritize its latency criticality proportionally, so a lock holder does not get stuck in a DSQ for a long time, improving system-wide progress. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-11 18:48:56 +09:00
Changwoo Min	77b8e65571	scx_lavd: tracing all blocking locks and futexes Trace the acquisition and release of blocking locks for the kernel and futexes for user space. This is necessary to boost a lock holder task in terms of latency and time slice. We do not boost shared lock holders (e.g., read lock in rw_semaphore) since the kernel already prioritizes the readers over writers. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-11 17:03:48 +09:00
Ryan Wilson	8c8250b1e2	[layered] Implement reverse weight DSQ algorithm	2024-10-10 12:53:25 -07:00
Daniel Hodges	9f60053312	Merge pull request #775 from hodgesds/layered-idle-cleanup scx_layered: Cleanup topology preempt path	2024-10-10 18:34:08 +00:00
Daniel Hodges	fb4dcf91eb	scx_layered: Change default DSQ iter algo Change the default DSQ iter algo from round robin to linear. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-10 11:10:27 -07:00
Daniel Hodges	b22e83d4d5	scx_layered: Cleanup topology preempt path Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-10 09:56:42 -07:00
Andrea Righi	d62989e462	scx_bpfland: fix cpumask initialization error In the WAKE_SYNC path, if L3 cache awareness is disabled (--disable-l3), we may hit the following error: Error: EXIT: scx_bpf_error (CPU L3 cpumask not initialized) Fix this by setting the L3 cpumask to the whole primary domain if L3 cache awareness is disabled. Tested-by: Eric Naim <dnaim@cachyos.org> Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-10-10 09:30:54 +02:00
Daniel Hodges	fe00e2c7be	scx_layered: Refactor topo preemption Refactor topology preemption logic so the non topology aware code is contained in a separate function. This should make maintaining the non topology aware code path far easier. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-09 21:24:07 -04:00
Daniel Hodges	451c68b44e	scx_layered: Cleanup debug messages Cleanup debug messages to use a common prefix when the scheduler is initialized. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-09 19:06:28 -04:00
Daniel Hodges	81a5250d49	scx_layered: Fix verifier errors Fix verifier errors when using different DSQ iteration algorithms and cleanup some code. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-09 14:36:12 -07:00
Dan Schatzberg	12cf482487	Merge pull request #767 from dschatzberg/mitosis-build mitosis: Fix build	2024-10-09 19:32:35 +00:00
Dan Schatzberg	c794c389da	mitosis: apply autoformatting Apply clang-format autoformatting on the c code and cargo fmt on the rust code. Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>	2024-10-09 10:56:27 -07:00
Jake Hillion	483a565d7f	Merge pull request #759 from JakeHillion/pr759 layered: attempt to work steal from own llc before others	2024-10-09 17:42:23 +00:00
Daniel Hodges	678c205572	Merge pull request #766 from hodgesds/layered-load-fixes scx_layered: Rename load_adj statistic	2024-10-09 17:12:24 +00:00
Jake Hillion	d9dc46b5d2	layered: attempt to work steal from own llc before others	2024-10-09 17:39:06 +01:00
Dan Schatzberg	347147b10d	mitosis: fix build Minimal changes to make sure scx_mitosis can build with the latest scx changes. Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>	2024-10-09 08:30:15 -07:00
Daniel Hodges	30258cff1b	scx_layered: Update docs for layer_preempt_weight_disable Update docs for layer_preempt_weight_disable and layer_growth_weight_disable. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-09 06:37:54 -07:00
Daniel Hodges	edc673460d	scx_layered: Rename load_adj statistic Rename the `load_adj` statistic to `load_frac_adj`, which is a more accurate representation of what the statistic is calculating. The statistic is a fractional representation of the load of a layer adjusted for infeasible weights. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-09 06:23:37 -07:00
Jake Hillion	c23efb1ed3	Merge pull request #749 from JakeHillion/pr749 layered: split dispatch into no_topo version	2024-10-09 13:15:12 +00:00
Jake Hillion	19d09c3cc1	layered: split dispatch into no_topo version Refactor layered_dispatch into two functions: layered_dispatch_no_topo and layered_dispatch. layered_dispatch will delegate to layered_dispatch_no_topo in the disable_topology case. Although this code doesn't run when loaded by BPF due to the global constant bool blocking it, it makes the functions really hard to parse as a human. As they diverge more and more it makes sense to split them into separate manageable functions. This is basically a mechanical change. I duplicated the existing function, replaced all `disable_topology` with true in `no_topo` and false in the existing function, then removed all branches which can't be hit. Test plan: - Runs on my dev box (6.9.0 fbkernel) with `scx_layered --run-example -n`. - As above with `-t`. - CI.	2024-10-09 13:33:06 +01:00
Daniel Hodges	2b5829e275	Merge pull request #763 from ryantimwilson/rusty-default-weights-fix [rusty] Fix load stats when host is under-utilized	2024-10-09 12:14:51 +00:00
likewhatevs	29bb3110ec	Merge pull request #765 from likewhatevs/update-dispatch scx_layered: enable configuring layer iteration when no topo	2024-10-09 06:22:40 +00:00
Pat Somaru	8e2f195af1	enable configuring layer iteration when no topo enable configuring layer iteration order in dispatch when topology is disabled. replace some member_vptr's in that iteration with regular accesses	2024-10-09 01:53:19 -04:00
Andrea Righi	e3e381dc8e	Merge pull request #755 from sched-ext/bpfland-prevent-kthread-stall scx_bpfland: prevent per-CPU DSQ stall with per-CPU kthreads	2024-10-09 05:28:59 +00:00
Ryan Wilson	fbdb6664ec	[rusty] Fix load stats when host is under-utilized	2024-10-08 21:08:07 -07:00
Pat Somaru	c90144d761	Revert "Merge pull request #746 from likewhatevs/layered-delay" This reverts commit `2077b9a799`, reversing changes made to `eb73005d07`.	2024-10-08 22:01:05 -04:00
Daniel Hodges	e6773d43b1	scx_layered: Make stress-ng non exclusive in example Test CI hosts are VMs currently and making stress-ng exclusive may starve the host. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-08 10:49:51 -07:00
Daniel Hodges	66f967c06d	Merge pull request #756 from hodgesds/layered-example-stress scx_layered: Add stress-ng example layer	2024-10-08 15:31:44 +00:00
likewhatevs	e1f6c792fe	Merge pull request #757 from JakeHillion/pr757 layered: cleanup warnings in bpf compilation	2024-10-08 15:29:12 +00:00
Jake Hillion	85daa2be32	layered: cleanup warnings in bpf compilation clang is correctly warning that we use various uninitialised variables. clean these up so real errors are easier to read. The largest change here is to non-topological layered_dispatch. The matching_dsq logic seems to be incorrect. It checks whether an uninitialised variable is 0, if it is sets it, then only uses the variable if the value is 0. I have changed this to default to -1, then use the value if it is no longer -1.	2024-10-08 16:25:43 +01:00
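The sentinel change described in the commit above (default to -1, then use the value only if it is no longer -1) can be sketched as below. This is an illustration of the pattern only: `NO_DSQ`, `first_matching_dsq`, and the `matches` predicate are hypothetical, and the real code is BPF C. In Rust, `Option<u64>` would be the more idiomatic equivalent of the sentinel.

```rust
/// Sentinel meaning "no DSQ selected yet". Unlike 0, it cannot collide with a valid id.
const NO_DSQ: i64 = -1;

/// Returns the first DSQ id accepted by `matches`, or NO_DSQ if none is.
fn first_matching_dsq(dsq_ids: &[i64], matches: impl Fn(i64) -> bool) -> i64 {
    let mut matching_dsq = NO_DSQ; // explicitly initialised, never read uninitialised
    for &id in dsq_ids {
        // Record only the first match. An id of 0 is a legitimate result.
        if matching_dsq == NO_DSQ && matches(id) {
            matching_dsq = id;
        }
    }
    matching_dsq
}
```

A caller would then act on the result only when it differs from `NO_DSQ`, which keeps a DSQ id of 0 distinguishable from "nothing matched".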
Daniel Hodges	f3191afca7	scx_layered: Add stress-ng example layer Add a stress-ng example layer, which will be used for CI testing with stress-ng. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-08 07:56:54 -07:00
Andrea Righi	c8a9207371	scx_bpfland: prevent per-CPU DSQ stall with per-CPU kthreads Since per-CPU kthreads may show an inconsistent prev_cpu and/or cpumask, dispatch them directly to the local DSQ and allow them to preempt the currently running task. This prevents per-CPU kthread stalls and also helps to prioritize them, as they are usually important for system performance and responsiveness. Moreover, change the behavior of --local-kthreads to prioritize all kthreads when this option is used. This addresses issue #728. NOTE: ideally we may want to fix this in the kernel by making sure to always expose a consistent prev_cpu and cpumask for kthreads as well, but at the moment this change prevents some annoying stalls and, performance-wise, it doesn't seem to introduce any regression. In fact, the usual gaming/fps benchmarks show even a slight improvement in responsiveness with this change applied. Thanks to YUBY from the CachyOS community for all the extremely valuable help with the intensive stress tests. Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-10-08 15:02:31 +02:00
Daniel Hodges	d7576d4b44	Merge pull request #754 from minosfuture/cpu_pool_doc scx_layered: Add doc comment to CpuPool	2024-10-08 12:22:55 +00:00
likewhatevs	2077b9a799	Merge pull request #746 from likewhatevs/layered-delay scx_layered: lighten/reduce nested loops in layered dispatch	2024-10-08 11:32:55 +00:00
Ming Yang	0dbb8c2374	scx_layered: Add doc comment to CpuPool Add doc comment to `CpuPool` as a quick reference for each member. Most importantly, differentiate "cpu" and "core", as logical core and physical core, respectively. Signed-off-by: Ming Yang <minos.future@gmail.com>	2024-10-07 21:48:46 -07:00
Pat Somaru	51d9e90d39	formatting	2024-10-07 18:54:30 -04:00
Pat Somaru	d2ac627942	formatting	2024-10-07 18:47:27 -04:00
Pat Somaru	3369836970	formatting	2024-10-07 18:44:44 -04:00
Pat Somaru	e0ce4711d4	flatten and simplify dispatch	2024-10-07 18:36:07 -04:00
Daniel Hodges	eb73005d07	Merge pull request #747 from hodgesds/layered-idle-order scx_layered: Update idle topology selection order	2024-10-07 20:01:38 +00:00
Ryan Wilson	a76778a4ab	scx_rusty: Fix BPF crash during CPU hotplug When hotplugging CPUs in rapid succession, scx_rusty would crash with: ``` scx_bpf_error (Failed to lookup dom[4294967295] ``` The root cause is if the scheduler is restarted fast enough, a task on a previously hotplugged CPU may not have moved off that CPU yet. Thus, the CPU -> domain map would contain an invalid domain (u32::max) and we would fail to lookup the domain correctly in rusty_select_cpu for prev_cpu. To fix this, if the CPU is offline, we do not try to allocate to the same NUMA node (assuming hotplug is a rare operation) beyond domestic domain. Instead we use greedy allocation - first idle, then busy - then any CPU.	2024-10-07 11:59:36 -07:00
Daniel Hodges	0b497d6df0	scx_layered: Update idle topology selection order Update the idle topology selection order, the current logic is: core architecture (big/little) -> LLC -> NUMA -> Machine It's probably better to try to keep cache lines clean and do: LLC -> core architecture (big/little) -> NUMA -> Machine Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-07 10:34:11 -07:00
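The new selection order from the commit above (LLC -> core architecture -> NUMA -> Machine) can be sketched as a tiered search. The `Cpu` type and both functions are hypothetical. Reading the "core architecture" tier as "prefer an idle big core" is an assumption, and the real scx_layered code works on BPF cpumasks rather than slices.

```rust
/// Hypothetical per-CPU description for this sketch.
#[derive(Clone, Copy, Debug)]
struct Cpu {
    id: usize,
    llc: usize,
    node: usize,
    big: bool,
    idle: bool,
}

/// Lower rank means the candidate is tried earlier.
fn tier(candidate: &Cpu, prev: &Cpu) -> u8 {
    if candidate.llc == prev.llc {
        0 // same LLC: keeps cache lines warm
    } else if candidate.big {
        1 // core architecture (assumed here: prefer big cores)
    } else if candidate.node == prev.node {
        2 // same NUMA node
    } else {
        3 // anywhere on the machine
    }
}

/// Picks the idle CPU in the best tier relative to `prev`, if any CPU is idle.
fn pick_idle_cpu(cpus: &[Cpu], prev: &Cpu) -> Option<usize> {
    cpus.iter()
        .filter(|c| c.idle)
        .min_by_key(|c| tier(c, prev))
        .map(|c| c.id)
}
```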
Daniel Hodges	024a2aa658	scx_layered: Improve perf on non topo aware paths Improve the performance on non topology aware paths by skipping some map lookups and unnecessary initializations. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-07 07:56:18 -07:00
Daniel Hodges	24fba4ab8d	scx_layered: Add idle smt layer configuration Add support for layer configuration for idle CPU selection. This allows layers to choose whether or not to restrict idle CPU selection to SMT idle CPUs. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-07 06:58:54 -07:00
Daniel Hodges	2f280ac025	scx_layered: Use idle smt mask for idle selection In the non topology aware code the idle smt mask is used for finding idle cpus. Update topology aware idle selection to also use the idle smt mask. In certain benchmarks this can improve performance. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-07 05:40:59 -07:00
Daniel Hodges	30feecc5ae	Merge pull request #743 from hodgesds/layered-big-little-mask scx_layered: Add big cpumask	2024-10-07 11:05:01 +00:00
Daniel Hodges	d86638ef0b	scx_layered: Add big cpumask Add big cpumask to scx_layered and prefer selecting big idle cores when using the BigLittle growth algo. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-06 14:05:12 -04:00
Andrea Righi	9a29547e5b	scx_bpfland: rework lowlatency mode In lowlatency mode (option --lowlatency) tasks are ordered using a deadline that is evaluated as the vruntime minus a certain "bonus", determined as a function of the max time slice and the average amount of voluntary context switches, to amplify the priority boost of the tasks that are voluntarily releasing the CPU (which are typically interactive). However, this method can be extremely unfair in some cases: tasks with short bursts of voluntary context switches may receive a huge priority boost, making the rest of the system almost unresponsive (see massive hackbench stress tests for example). To prevent this, rework the task's deadline logic to use the vruntime and a "deadline component" that is a function of the average used time slice, scaled using a dynamic task priority (evaluated from the static task priority and its average amount of voluntary context switches). This logic seems to prevent excessive prioritization of tasks performing short intensive bursts of voluntary context switches. It also makes lowlatency mode in scx_bpfland (somewhat) more similar to the deadline logic used by scx_rusty. Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-10-05 17:44:09 +02:00
Changwoo Min	a673dcf809	Merge pull request #736 from multics69/scx-futex-v1 scx_lavd: split main.bpf.c into multiple files	2024-10-05 13:11:15 +09:00
Pat Somaru	efabcfcdc3	Replace PID with Task Pointer in Rusty Replace PID with Task Pointer in Rusty Fixes: #610	2024-10-04 18:06:37 -04:00
Daniel Hodges	c56e60b86a	scx_layered: Add better debug output of iter algo Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-04 11:36:36 -07:00
Daniel Hodges	e1241d6e52	scx_layered: Cleanup layer growth weight limits Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-04 11:16:58 -07:00
Daniel Hodges	17f9b3f4f3	scx_layered: Cleanup layer infeasible weight calc Cleanup the calculation of the infeasible weight to not use an unnecessary collect. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-04 10:12:22 -07:00
Daniel Hodges	0476a10f83	scx_layered: Cleanup from code review Cleanup from code review. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-04 10:09:38 -07:00
Daniel Hodges	817e310a31	scx_layered: Add default dsq iter algo Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-04 09:58:26 -07:00
Daniel Hodges	7ee12091c3	scx_layered: Add DSQ iteration algo Add DSQ iteration algorithms. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-04 09:58:23 -07:00
Daniel Hodges	6929501aea	scx_layered: Refactor stats variable names Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-04 09:56:37 -07:00
Daniel Hodges	f066580612	scx_layered: Use dcycle for infeasible weights Fix a bug to use duty cycle for infeasible weights calculations. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-04 09:56:37 -07:00
Daniel Hodges	c55d34c319	scx_layered: Cleanup unused metrics Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-04 09:56:37 -07:00
Daniel Hodges	c0c4e183f0	scx_layered: Cargo fmt Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-04 09:56:37 -07:00
Daniel Hodges	f3b3d4f19c	scx_layered: Add weighted layer DSQ iteration Add a flag to control DSQ iteration across layers by layer weight. This helps prevent starvation by iterating over layers with the lowest weight first. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-04 09:56:37 -07:00
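Iterating layers with the lowest weight first, as the commit above describes, can be sketched as sorting layer indices by weight. The function name and the use of a plain slice of weights are assumptions for illustration.

```rust
/// Returns layer indices ordered from lowest to highest weight.
/// The sort is stable, so layers with equal weight keep their original relative order.
fn layer_iteration_order(weights: &[u32]) -> Vec<usize> {
    let mut order: Vec<usize> = (0..weights.len()).collect();
    order.sort_by_key(|&idx| weights[idx]);
    order
}
```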
Daniel Hodges	bd75ac8dbf	scx_layered: Add flags for growth and preemption Add two new flags `layer_preempt_weight_disable` and `layer_growth_weight_disable` to disable preemption and layer growth when weighted layer load exceeds the configured threshold. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-04 09:56:37 -07:00
Daniel Hodges	e48e675cff	scx_layered: Remove LoadLedger from stats Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-04 09:56:37 -07:00
Daniel Hodges	2518c99bf2	scx_layered: Refactor load calculation Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-04 09:56:37 -07:00
Daniel Hodges	54dbf35680	scx_layered: Add weights to userspace layer config Add weights to userspace layer config. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-04 09:56:37 -07:00
Daniel Hodges	07be9dcf59	scx_layered: Add stats for adjusted layer weights Add stats for infeasible weights adjusted layer stats. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-04 09:56:37 -07:00
Daniel Hodges	da38d69009	scx_layered: Add layer weights Add weights to layers and use the infeasible weights crate to properly apply weights during contention to prevent starvation. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-04 09:56:37 -07:00
Ming Yang	d76036b7cb	scx_layered: Add Reverse layer growth algo Add `LayerGrowthAlgo::Reverse` to be the reverse order of Linear. Signed-off-by: Ming Yang <minos.future@gmail.com>	2024-10-04 09:29:36 -07:00
Ming Yang	35d5c082d5	scx_layered: Break up layer_core_order function `layer_core_order` provided multiple core growth implementations. Break it up into smaller functions. Also, attach the method to LayerGrowthAlgo. And `LayerCoreOrderGenerator` is added to make future growth algo extension easy. Signed-off-by: Ming Yang <minos.future@gmail.com>	2024-10-04 09:18:01 -07:00
Ming Yang	29308d4705	scx_layered: Move layer core growth logic to separate module Move layer core growth logic to separate module for further refactoring. Signed-off-by: Ming Yang <minos.future@gmail.com>	2024-10-04 08:51:11 -07:00
Changwoo Min	7c5c83a3a2	scx_lavd: split main.bpf.c into multiple files As the main.bpf.c file grows, it gets hard to maintain. So, split it into multiple logical files. There is no functional change. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-05 00:25:40 +09:00
Ming Yang	658a75df73	scx_layered: Add per layer time slices to stats Quoting issue #720 description from @hodgesds: > In `scx_layered` the time slice can be [configured per layer](https://github.com/sched-ext/scx/blob/main/scheds/rust/scx_layered/src/main.rs#L493). > This should be added to the > [`LayerStats`](https://github.com/sched-ext/scx/blob/main/scheds/rust/scx_layered/src/stats.rs#L51) > for each layer. During stats > [refresh](https://github.com/sched-ext/scx/blob/main/scheds/rust/scx_layered/src/main.rs#L852) > read the time slice duration (from the bpf skel) to the layer and add it > to the stats. Finally, update the > [format](https://github.com/sched-ext/scx/blob/main/scheds/rust/scx_layered/src/stats.rs#L218) > method for `LayerStats` to print the per layer time slices. Signed-off-by: Ming Yang <minos.future@gmail.com>	2024-10-03 20:56:03 -07:00
Fredrik Lönnegren	4b290a1757	scx_rusty: fix single dom short-circuit Remove a short-circuit in cpu_to_dom_id that will return domain id 0 for any input. This fixes a crash of scx_rusty when running with a single domain and any CPU is offline. Signed-off-by: Fredrik Lönnegren <fredrik@frelon.se>	2024-10-03 20:34:18 +02:00
Changwoo Min	b1070449b2	Merge pull request #714 from multics69/lavd-hotplug scx_lavd: support CPU hotplug correctly	2024-10-03 07:35:25 +09:00
Tejun Heo	7402895f4a	version: v1.0.5	2024-10-02 08:34:57 -10:00
Daniel Hodges	054352f172	Merge pull request #716 from vax-r/lavd_typo scx_lavd: Fix typo	2024-10-02 13:21:15 +00:00
I Hsin Cheng	3055716382	scx_rusty: Delete unused function variable "struct task_struct *p" isn't used within the function "task_load_adj()". Delete the function parameter for cleaner code. Signed-off-by: I Hsin Cheng <richard120310@gmail.com>	2024-10-02 17:49:13 +08:00
I Hsin Cheng	7fbef2aa0b	scx_lavd: Fix typo Fix "alreay" to "already". Signed-off-by: I Hsin Cheng <richard120310@gmail.com>	2024-10-02 17:39:10 +08:00
Changwoo Min	770a59f69d	scx_lavd: support CPU hotplug correctly Use scx_utils::NR_CPU_IDS to iterate over all CPUs and separately count the number of online CPUs to support CPU hotplug correctly. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-02 14:19:18 +09:00
Changwoo Min	fb7bc0a850	scx_lavd: fix incorrect preemptability test Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-02 13:24:43 +09:00
Ming Yang	445743487a	Add #stat_doc attribute macro to Stats struct `#stat_doc` extends the document from stat desc property. Add this attribute macro to the remaining Stats structs. Signed-off-by: Ming Yang <minos.future@gmail.com>	2024-09-30 22:12:11 -07:00
Daniel Hodges	c897511c62	scx_layered: Fix compiler warnings Cleanup various compiler warnings. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-09-30 20:59:24 -04:00
Tejun Heo	04648bc511	Merge pull request #703 from minosfuture/main scx_stats: Implement macro #stat_doc to autogen doc from stat desc	2024-09-30 17:58:56 +00:00
Andrea Righi	e966455af2	scx_bpfland: fix task_avg_nvcsw() return type task_avg_nvcsw() was incorrectly returning a bool instead of u64, limiting the impact of the lowlatency boost. Fix it by returning the proper type (u64). Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-09-30 14:36:32 +02:00
Andrea Righi	6e24fcc7f0	scx_bpfland: keep tasks running on full-idle SMT cores When a task is the last one running on a CPU and still wants to continue, allow it to run and replenish its time only if the used CPU is part of a fully idle SMT core. Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-09-30 14:36:32 +02:00
Andrea Righi	c20a19c946	scx_bpfland: always give tasks a chance to run on an idle CPU During ttwu, the kernel may decide to skip ->select_task_rq() (e.g., when only one CPU is allowed or migration is disabled). This causes ops.enqueue() to be called directly without having a chance to call ops.select_cpu(). Therefore, introduce a new flag (select_cpu_done) in the local task context to determine if ops.select_cpu() was bypassed and, in that case, attempt to find an idle CPU directly from ops.enqueue(). In the future this information will be supplied by the kernel through a special enqueue flag (SCX_ENQ_CPU_SELECTED) [1]. However, the custom flag in the local task context makes it possible to reliably determine the same information, even on older kernels where this flag is not available. [1] https://lore.kernel.org/lkml/20240928003840.GA2717@maniforge/T Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-09-30 14:36:19 +02:00
Daniel Hodges	bb560088de	scx_layered: Fix cache initialization cpumask Fix a bug in cache initialization where the first node would repeatedly get all CPUs added to the mask. Refactor some consts to be more clear. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-09-29 22:10:08 -04:00
Changwoo Min	ade6931bfc	scx_lavd: fix incorrect neighbor_bit initialization Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-09-29 02:27:40 +09:00
Changwoo Min	6d116208c8	scx_lavd: do not perform the victim selection for an invalid cpu When finding a victim candidate for preemption, a randomly chosen candidate could be out of valid CPU range due to CPU offline, etc. In this case, try another CPU randomly. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-09-29 02:22:18 +09:00
Ming Yang	28bfd2986a	scx_stats: Implement #stat_doc to autogen doc from stat desc The doc of scx_layered `Opt` is out of sync. Implement attribute macro #stat_doc to generate doc from the `desc` property. Apply #stat_doc to `LayerStats` and `SysStats` in scx_layered. Signed-off-by: Ming Yang <minos.future@gmail.com>	2024-09-28 09:32:48 -07:00
Changwoo Min	e8ebc09ced	Merge pull request #702 from multics69/lavd-dyn-pc-thr scx_lavd: more accurately determine the performance criticality threshold	2024-09-28 04:35:45 +00:00
Changwoo Min	cd7846f4d2	scx_lavd: more accurately determine the performance criticality threshold We used the average performance criticality of tasks as a threshold to determine the proper core type (big or little). However, if the big core's compute capacity is not half of the total compute capacity, such an average-based determination becomes suboptimal. If fewer tasks are classified as performance-critical tasks and requested to run on big cores, the big cores would be wasted by stealing arbitrary non-performance-critical tasks. That could result in performance instability. Hence, determine the threshold more accurately by considering (active) big cores' compute capacity and the (approximated) distribution of performance criticality of tasks. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-09-27 16:56:30 +09:00
Changwoo Min	f07023e42b	scx_lavd: rename avg_perf_cri to thr_perf_cri As a preparation to improve the performance criticality logic, we first rename "avg_perf_cri" to "thr_perf_cri" since average is no longer the threshold. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-09-27 13:12:20 +09:00
Daniel Hodges	326ccb3385	Merge pull request #696 from hodgesds/layered-growth-fixes scx_layered: Fix idle core selection	2024-09-26 11:59:19 -04:00
Daniel Hodges	d1b425d1fa	scx_layered: Fix idle core selection Fix idle core selection to correctly use pick_idle_cpu_from. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-09-26 07:46:26 -07:00
Tejun Heo	540576ac30	scheds/c: Re-enable scx_flatcg and scx_pair cgroup support is available again, re-enable scx_flatcg and scx_pair.	2024-09-25 12:38:45 -10:00
Tejun Heo	466b7984c2	Sync from kernel - a748db0c8c6a ("tools/sched_ext: Receive misc updates from SCX repo") Sync from sched_ext/for-6.12-fixes a748db0c8c6a ("tools/sched_ext: Receive misc updates from SCX repo") git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git for-6.12-fixes - cgroup support + scx_flatcg updates. - scx_bpf_dispatch[_vtime]_from_dsq() + scx_qmap updates. - COMPAT macros.	2024-09-25 12:34:03 -10:00
Daniel Hodges	f9b39244cc	Merge pull request #687 from hodgesds/layered-growth-enum-refactor scx_layered: Add layer growth algo to layer bpf config	2024-09-25 15:10:00 -04:00
Daniel Hodges	bce840d9e5	scx_layered: Add layer growth algo to layer bpf config Add an enum for the layer growth algo to the bpf layer config. This will be useful for implementing topology aware layer growth algorithms. When selecting an idle CPU the current logic tries to keep tasks local to LLC/NUMA node. However, for certain growth algorithms (ex: RoundRobin) this is suboptimal. Adding the layer growth algorithm will allow for different paths for CPU selection in the idle/preemption paths. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-09-25 12:00:24 -07:00
likewhatevs	99d1179866	enable ide's etc. to work on bpf.c files (#668) * enable ide's etc. to work on the bpf.c files this makes it so that clangd and ide tools which use clangd can work on the bpf.c code. nothing should actually be changed outside of that ide/editor environment, all the changes are ifdef'ed on LSP which is set in the added .clangd file. * move intf include out of both sides of ifdef toggle	2024-09-24 16:55:02 -04:00
Daniel Hodges	2805bb77a5	Merge pull request #683 from hodgesds/layered-idle-topo scx_layered: Make layered idle CPU selection topology aware	2024-09-24 16:37:23 -04:00
Daniel Hodges	679dd5920c	scx_layered: Fix comment Fix comment to be more accurate. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-09-24 12:31:03 -07:00
Daniel Hodges	87c6e276d9	scx_layered: Restrict idle selection to layer cpus Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-09-24 09:40:38 -07:00
Daniel Hodges	e68fccd26c	scx_layered: Update comments on layer preemption Update comments on layer preemption to be more descriptive. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-09-24 08:24:04 -07:00
Daniel Hodges	6b966cda0c	scx_layered: Restrict preemption to layer cpumask When preempting restrict preemption to the current layer cpumask. This may reduce the amount of preemption, but cause better cache locality of preempted tasks. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-09-24 06:23:01 -07:00
I Hsin Cheng	61cb3f7fc5	scx_common_bpf: Append cast_mask() Remove cast_mask() function distributed throughout different schedulers and add it in common.bpf.h so every scheduler can reference it once they need to. Signed-off-by: I Hsin Cheng <richard120310@gmail.com>	2024-09-24 16:01:19 +08:00
Changwoo Min	b1bc4033b4	Merge pull request #673 from multics69/lavd-prop-lat-cri scx_lavd: propagate waker's latency criticality to its wakee	2024-09-24 07:34:07 +09:00
Daniel Hodges	29fb647c93	scx_layered: Refactor idle core selection Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-09-23 12:01:42 -07:00
Daniel Hodges	380fd1f3b3	scx_layered: Make idle select topology aware Make idle CPU selection topology aware. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-09-23 10:10:43 -07:00
Daniel Hodges	35477970bd	scx_layered: Cleanup dump format Cleanup the dump format for topology aware dumps in scx_layered. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-09-23 10:02:49 -07:00
Daniel Hodges	8b14e48994	Merge pull request #671 from hodgesds/layered-last-waker scx_layered: Add waker stats per layer	2024-09-23 10:58:54 -04:00
Changwoo Min	71fa92cf1c	scx_lavd: propagate waker's latency criticality to its wakee If a waker is more latency critical than a wakee, inherit a waker's latency criticality for the wakee. This allows the wakee to consider the context of who wakes it up. For now, we limit such inheritance to one hop and one schedule. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-09-23 12:56:16 +09:00
Changwoo Min	ad8536b4a4	Merge pull request #670 from multics69/lavd-opt-preemption scx_lavd: find a victim cpu for preemption within task's compute domain	2024-09-23 10:22:08 +09:00
Daniel Hodges	91d32663bd	scx_layered: Refactor waker tracking to only use last waker Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-09-22 18:05:54 -04:00
Daniel Hodges	1a2f82b91c	Merge pull request #666 from hodgesds/layered-local-llc scx_layered: Add topology aware preemption	2024-09-22 17:36:32 -04:00
Daniel Hodges	326f3b7988	Merge pull request #667 from hodgesds/layered-pcore-grow scx_layered: Add Big/Little core growth algos	2024-09-22 16:59:42 -04:00
Daniel Hodges	1ac9712d2e	scx_layered: Refactor preemption into a separate function Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-09-22 16:54:11 -04:00
Daniel Hodges	bc34bd867b	scx_layered: Add option to enable XNUMA preemption Disable XNUMA preemption by default and add an option to enable it. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-09-22 16:52:57 -04:00
Daniel Hodges	e105d9f8b1	scx_layered: Use cast_mask helper Use the cast_mask helper to clean up some of the bpf cpumask conversion code for preemption. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-09-22 16:52:57 -04:00
Daniel Hodges	5d9d32b65c	scx_layered: Add stats for XLLC/XNUMA preemptions Add stats for XLLC/XNUMA preemptions. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-09-22 16:52:57 -04:00
Daniel Hodges	c15ecbb3a4	scx_layered: Add topology aware preemption Add topology aware preemption that begins in the local LLC and attempts to preempt from cpus nearest in the topology. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-09-22 16:52:56 -04:00
Daniel Hodges	6fb2f0b2b4	scx_layered: Clean up waker code Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-09-22 06:43:10 -04:00
Daniel Hodges	c55b2c6e69	scx_layered: Add waker stats per layer Update the task context to keep a mask of wakers and add stats for wakes across layers. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-09-22 06:43:03 -04:00
Daniel Hodges	140a101874	Merge pull request #449 from hodgesds/layered-dsq-fixes scx_layered: Add a hi fallback dsq per llc	2024-09-22 06:39:46 -04:00
Changwoo Min	13a68465bf	common: add bpf_cpumask_weight() to common.bpf.h Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-09-22 12:47:38 +09:00
Changwoo Min	7321a89724	scx_lavd: find a victim cpu for preemption within task's compute domain Previously, we searched for a victim among all CPUs, which include remote or non-compatible CPUs. Now we limit the victim search to a task's compute domain. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-09-22 12:47:18 +09:00
Changwoo Min	a13082c2b8	Merge pull request #669 from multics69/lavd-opt-select-cpu scx_lavd: consider waker's CPU when ops.select_cpu()	2024-09-22 09:16:06 +09:00
Andrea Righi	897977bbc1	Merge pull request #663 from vax-r/bpfland_fix scx_bpfland: Remove the usage of cast_mask in bpfland_enqueue	2024-09-21 22:15:11 +02:00
Changwoo Min	8d8d8f9f61	scx_lavd: consider waker's CPU when ops.select_cpu() In case of sync wake-up, consider waker's CPU also to improve cache locality. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-09-22 01:57:49 +09:00
Daniel Hodges	4aa841de0a	scx_layered: Rename HI_FALLBACK_DSQ to HI_FALLBACK_DSQ_BASE Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-09-20 17:28:38 -04:00
Daniel Hodges	a3d1344293	scx_layered: Add core growth algo for core type Add core growth algos for Big/Little core support. The algos allow layers to grow layers by preferring either big or little cores first. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-09-20 11:50:15 -04:00
I Hsin Cheng	7799b94f07	scx_layered: Add helper function to access cpumask within bpf_cpumask Before passing "nodec->cpumask" and "cachec->cpumask" into "bpf_cpumask_test_cpu()", type conversion should be done first. Implement "cast_mask()" to convert "struct bpf_cpumask *" into "const struct cpumask *". Reference from https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/progs/cpumask_common.h#n63 Signed-off-by: I Hsin Cheng <richard120310@gmail.com>	2024-09-20 20:52:03 +08:00
I Hsin Cheng	5596d5e3fe	scx_bpfland: Remove the usage of cast_mask in bpfland_enqueue The usage of cast_mask() within bpfland_enqueue aims to cast the type of "p->cpus_ptr" from "struct bpf_cpumask *" to "const struct cpumask *". However, the type of "p->cpus_ptr" is already "const cpumask_t *" aka "const struct cpumask *", so no conversion is needed. Passing a value of type "struct cpumask *" into "struct bpf_cpumask *" also leads to a compile error. Signed-off-by: I Hsin Cheng <richard120310@gmail.com>	2024-09-20 20:45:09 +08:00
Daniel Hodges	8532ba3f1e	scx_layered: Fix hi fallback dsq consumption Fix hi fallback dsq consumption to only consume from the cache local hi fallback dsq. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-09-20 04:18:05 -04:00
I Hsin Cheng	e4bb99efc5	scx_layered: Refactor match_layer() Refactor match_layer() to prevent the compile error caused by using the variable "nr_match_ors" before it is initialized. Move the check of "nr_match_ors" after it reads the value from "layer->nr_match_ors" to make sure it is initialized successfully. Signed-off-by: I Hsin Cheng <richard120310@gmail.com>	2024-09-19 22:20:03 +08:00
Andrea Righi	3f8db5783b	Merge pull request #658 from sched-ext/rustland-core-improve-cpu-selection scx_rustland_core: improve idle CPU selection API and logic	2024-09-17 22:38:15 +02:00
Andrea Righi	e6b624a97c	scx_rustland_core: improve idle CPU selection API and logic Pass enqueue flags to user-space: flags will be passed via QueuedTask.flags and can be forwarded back to BPF via DispatchedTask.flags. These flags can also be passed to BpfScheduler.select_cpu() to apply a more refined CPU selection policy. Moreover, avoid prioritizing the user-space scheduler too much and dispatch it only if there are no other tasks that need to be dispatched in ops.dispatch(). This improves CPU utilization and enhances the fairness, robustness, and resilience of schedulers based on scx_rustland_core, particularly under stress test conditions. Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-09-16 22:12:38 +02:00
Daniel Hodges	4f98de333d	Merge pull request #652 from JakeHillion/layer-growth-rr scx_layered: add round robin growth strategy	2024-09-16 17:34:48 +02:00
Andrea Righi	00eebaf905	scx_bpfland: refine task wakeup logic On WAKE_SYNC attempt to migrate the wakee to the same CPU as the waker if the waker is not exiting, the wakee can use the waker's CPU, the waker's L3 domain is not saturated and there are no other tasks queued to the local DSQ of the waker's CPU. This is the same logic used in scx_rusty. Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-09-15 14:50:14 +02:00
Andrea Righi	079a53c689	scx_bpfland: get rid of preferred domain Using the turbo boosted CPUs as preferred scheduling seems to be beneficial only in a very few corner cases, for example on battery-powered devices with an aggressive cpufreq governor that constantly tries to scale down the frequency (and even in this case it's probably better to not force the tasks to run on the fast CPUs, to save power). In practice the preferred domain seems to introduce more overhead than benefits overall, so let's get rid of it. This can be improved in the future by adding multiple user-configurable scheduling domains. Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-09-15 14:50:14 +02:00
Changwoo Min	95e2f4dabe	scx_lavd: boost the latency criticality of kernel threads Many kernel threads perform latency critical tasks (e.g., net, gpu). In particular, the AMD GPU driver runs for the most part in the kernel space using kworker. Hence, treat kernel threads as if they were woken-up tasks. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-09-14 00:41:02 +09:00
Changwoo Min	4b4f42fce1	scx_lavd: add a short circuit for the case of no turbo core Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-09-13 16:02:07 +09:00
Jake Hillion	3848d87895	scx_layered: add round robin growth strategy	2024-09-12 23:27:21 +01:00
Daniel Hodges	632fcfe4ae	Merge pull request #648 from hodgesds/layered-llc-stats scx_layered: Add stats for XNUMA/XLLC migrations	2024-09-12 13:23:23 -04:00
Daniel Hodges	dde6e0c7f9	scx_utils: Add node/llc id to core topology Add ids for node/llc in the Core topology struct.	2024-09-12 10:05:02 -07:00
Daniel Hodges	aee19dd9a1	scx_layered: Add topology aware core growth selection Add topology aware core growth selection. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-09-12 06:48:51 -07:00
Daniel Hodges	14a19dc3ca	scx_layered: Add random layer growth algo Add a random layer growth algo. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-09-12 05:35:54 -07:00
Daniel Hodges	ae57f8d1f9	scx_rusty: Initialize node cpumask Initialize the node cpumask, which was previously uninitialized causing metric calculations to be wrong when attempting to lookup CPUs in the node cpumask. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-09-11 13:14:44 -07:00
Jake Hillion	8ca45cfa37	lint: enable cargo fmt (#643) Use `cargo fmt` with a specific nightly branch in the CI to enforce formatting. Globally format these files while the diff is still small so we can stay on top of it. Test plan: - CI lint check passes.	2024-09-11 10:03:20 +01:00
Daniel Hodges	43ec8bfe82	scx_layered: Add stats for XNUMA/XLLC migrations Add stats for XNUMA/XLLC migrations. An example of the output is shown: ``` hodgesd : util/frac= 5.4/ 0.1 load/frac= 301.0/ 0.3 tasks= 476 tot= 3168 local=97.82 wake/exp/reenq= 2.18/ 0.00/ 0.00 keep/max/busy= 0.03/ 0.00/ 0.03 kick= 0.00 yield/ign= 0.09/ 0 open_idle= 0.00 mig= 6.82 xnuma_mig= 6.82 xllc_mig= 4.86 affn_viol= 0.00 preempt/first/idle/fail= 0.00/ 0.00/ 0.00/ 0.00 min_exec= 0.00/ 0.00ms cpus= 2 [ 2, 4] 00000000 00000010 00001000 normal : util/frac= 28.7/ 0.7 load/frac= 101704.7/ 95.8 tasks= 2450 tot= 4660 local=99.06 wake/exp/reenq= 0.88/ 0.06/ 0.00 keep/max/busy= 1.03/ 0.00/ 0.00 kick= 0.06 yield/ign= 0.04/ 400 open_idle=15.73 mig=23.45 xnuma_mig=23.45 xllc_mig= 3.07 affn_viol= 0.00 preempt/first/idle/fail= 0.00/ 0.00/ 0.00/ 0.88 min_exec= 0.00/ 0.00ms cpus= 2 [ 2, 2] 00000001 00000100 00000000 excl_coll=12.55 excl_preempt= 0.00 random : util/frac= 0.0/ 0.0 load/frac= 0.0/ 0.0 tasks= 0 tot= 0 local= 0.00 wake/exp/reenq= 0.00/ 0.00/ 0.00 keep/max/busy= 0.00/ 0.00/ 0.00 kick= 0.00 yield/ign= 0.00/ 0 open_idle= 0.00 mig= 0.00 xnuma_mig= 0.00 xllc_mig= 0.00 affn_viol= 0.00 preempt/first/idle/fail= 0.00/ 0.00/ 0.00/ 0.00 min_exec= 0.00/ 0.00ms cpus= 0 [ 0, 0] 00000000 00000000 00000000 excl_coll= 0.00 excl_preempt= 0.00 stress-ng: util/frac= 4189.1/ 99.2 load/frac= 4200.0/ 4.0 tasks= 43 tot= 62 local= 0.00 wake/exp/reenq= 0.00/100.0/ 0.00 keep/max/busy=2433.9/177.4/ 0.00 kick=100.0 yield/ign= 3.23/ 0 open_idle= 0.00 mig=54.84 xnuma_mig=54.84 xllc_mig=35.48 affn_viol= 0.00 preempt/first/idle/fail= 0.00/ 0.00/ 0.00/ 0.00 min_exec= 0.00/ 0.00ms cpus= 4 [ 4, 4] 00000300 00030000 00000000 excl_coll= 0.00 excl_preempt= 0.00 ``` Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-09-10 19:53:28 -07:00
Tejun Heo	8f0cc89ee8	Merge pull request #645 from frelon/rusty-init-dom scx_rusty: init domains when calculating averages	2024-09-10 12:25:51 -10:00
Andrea Righi	e6e3579a92	Merge pull request #634 from anh0516/main scx_bpfland: Documentation consistency fix	2024-09-10 23:25:55 +02:00
Fredrik Lönnegren	f155966b77	scx_rusty: init domains when calculating averages The domains are added to the aggregator when load is added (and duty_cycle is not 0.0f64). This commit makes sure that all domains are added to the aggregator even when the calculated duty_cycle is 0. Signed-off-by: Fredrik Lönnegren <fredrik@frelon.se>	2024-09-10 21:51:41 +02:00
likewhatevs	85863d0e1c	Merge pull request #644 from hodgesds/layered-topo-order scx_layered: Pass layer spec for core growth algo	2024-09-10 14:49:37 -04:00
Daniel Hodges	5fdd257862	scx_layered: Pass layer spec for core growth algo Pass in the layer spec when determining the layer core growth algo. This should make it easier to implement layer growth algos that are spec specific. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-09-10 10:27:08 -07:00
Samuel Nair	c6af1aa1c8	scx_layered: Fix typo in stats	2024-09-10 08:44:57 -07:00


1302 Commits