Using per-CPU DSQs seems to introduce more issues than benefits
(potential stalls, etc.). Therefore, let's get rid of the per-CPU DSQs
and use SCX_DSQ_LOCAL for tasks directly dispatched to specific CPUs.
This change also seems to improve performance on 6.12 and it makes the
scheduler a lot more stable and consistent.
The issues will be investigated separately, using a dedicated stress-test
scheduler designed to exercise per-CPU DSQs.
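As a rough illustration of the pattern (not the actual scx_bpfland code; the
callback and helper names below are just the sched_ext defaults), a task that
already has an idle CPU picked in ops.select_cpu() can be queued straight to
that CPU's local DSQ:
```
s32 BPF_STRUCT_OPS(sched_select_cpu, struct task_struct *p,
		   s32 prev_cpu, u64 wake_flags)
{
	bool is_idle = false;
	s32 cpu;

	/* Default idle-CPU selection provided by the sched_ext core */
	cpu = scx_bpf_select_cpu_dfl(p, prev_cpu, wake_flags, &is_idle);
	if (is_idle)
		/*
		 * The task will run on @cpu: queue it directly to that
		 * CPU's local DSQ instead of a dedicated per-CPU DSQ.
		 */
		scx_bpf_dispatch(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, 0);

	return cpu;
}
```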
Tested-by: Piotr Gorski <piotrgorski@cachyos.org>
Tested-by: Eric Naim <dnaim@cachyos.org>
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Return more meaningful error codes from pick_idle_cpu(). No functional
change, just improved code readability.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
When a task exhausts its timeslice and no other tasks are ready to run,
we currently refill its timeslice automatically, but only if the current
CPU is a fully idle SMT core.
If we don’t handle the refill, the sched_ext core will default to
refilling using SCX_SLICE_DFL, which may not be optimal.
To ensure better control over the task’s timeslice, always refill it
when no other tasks are available to run.
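A minimal sketch of the dispatch-side behavior (illustrative names: SHARED_DSQ
and slice_ns stand in for the scheduler's shared queue and configured time
slice; this is not the exact scx_bpfland code):
```
void BPF_STRUCT_OPS(sched_dispatch, s32 cpu, struct task_struct *prev)
{
	/* Consume the next queued task, if any */
	if (scx_bpf_consume(SHARED_DSQ))
		return;

	/*
	 * Nothing else to run: keep @prev going by refilling its slice
	 * ourselves, instead of letting the core fall back to SCX_SLICE_DFL.
	 */
	if (prev)
		prev->scx.slice = slice_ns;
}
```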
Fixes: 6e24fcc ("scx_bpfland: keep tasks running on full-idle SMT cores")
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Pick any random idle CPU when the previous CPU isn't valid anymore
according to the task's cpumask.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
`disable_topology` currently defaults to `false` (i.e., topology awareness
enabled). Change this so that topology awareness is enabled by default on
hardware that may benefit from it (multiple NUMA nodes or LLCs) and disabled
on hardware that does not benefit from it.
This is a slightly noisy change as we have to move ownership of the newly
mutable layer specs into the `Scheduler` object (previously they were a
borrow). We don't have a `Topology` object to make the default decision from
until `Scheduler::init`, and I think this is because of the possibility of hot
plugs. We therefore have to clone the `Vec<LayerSpec>` each time as it is
potentially mutable.
Test plan:
- CI. Updated to be explicit about topology in both cases.
Single NUMA multi-LLC machine:
```
$ scx_layered --run-example
...
13:34:01 [INFO] Topology awareness not specified, selecting enabled based on
hardware
...
$ scx_layered --run-example --disable-topology=true
...
13:33:41 [INFO] Disabling topology awareness
...
$ scx_layered --run-example -t
...
13:33:15 [INFO] Disabling topology awareness
...
$ scx_layered --run-example --disable-topology=false
# none of the above messages present
```
Single NUMA single LLC machine:
```
$ scx_layered --run-example
15:33:10 [INFO] Topology awareness not specified, selecting disabled based on
hardware
```
Move the LayerConfig and its children from `main.rs` into `lib.rs`. This allows
other tooling, such as config managers or test executors, to modify layered
configs programmatically.
The end goal is to move everything in `layered` except for the argument parsing
into a `run_layered` function, but I haven't done it in this diff because it's
a larger change. This is a common pattern in Rust projects to do as little as
possible in `main.rs` for extensibility.
The only change here, other than item visibility (`pub`) and where things are
located, is the signature of `CpuPool::alloc_cpus`. It previously relied on
`&Layer`, and this
changes it to the two elements of `Layer` it uses. This allows `Layer` to stay
confined to `main.rs` (for now) to prevent scope creep in this PR.
This may be inconvenient in the short term for WIPs and anyone doing non-Cargo
builds (cough me), but having things split into more files should make
rebases/merges easier in the long run.
Test plan:
- `cargo build --release`
- CI.
When a task holds a lock, it should neither yield its time slice nor be
preempted. In this way, we can mitigate harmful preemption of lock
holders and reduce the total preemption count.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
When a lock holder exhausts its time slice, it will be re-enqueued
to a DSQ waiting for scheduling while still holding the lock. In this case,
boost its latency criticality proportionally, so a lock holder
does not get stuck in a DSQ for a long time, improving system-wide
progress.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Trace the acquisition and release of blocking locks in the kernel and of
futexes in user space. This is necessary to boost a lock holder
task in terms of latency and time slice. We do not boost shared
lock holders (e.g., read lock in rw_semaphore) since the kernel
already prioritizes readers over writers.
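A minimal sketch of how such tracking can look with fentry/fexit probes (the
attach points, get_task_ctx() helper and lock_holder flag are illustrative,
not the exact scx_lavd code):
```
SEC("fexit/mutex_lock")
int BPF_PROG(trace_mutex_lock, struct mutex *lock)
{
	/* Mark the current task as a lock holder */
	struct task_ctx *taskc = get_task_ctx(bpf_get_current_task_btf());

	if (taskc)
		taskc->lock_holder = true;
	return 0;
}

SEC("fentry/mutex_unlock")
int BPF_PROG(trace_mutex_unlock, struct mutex *lock)
{
	/* The lock is being released: clear the boost */
	struct task_ctx *taskc = get_task_ctx(bpf_get_current_task_btf());

	if (taskc)
		taskc->lock_holder = false;
	return 0;
}
```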
Signed-off-by: Changwoo Min <changwoo@igalia.com>
In the WAKE_SYNC path, if L3 cache awareness is disabled (--disable-l3)
we may hit the following error:
Error: EXIT: scx_bpf_error (CPU L3 cpumask not initialized)
Fix this by setting the L3 cpumask to the whole primary domain if L3
cache awareness is disabled.
Tested-by: Eric Naim <dnaim@cachyos.org>
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Refactor the topology preemption logic so the non-topology-aware code is
contained in a separate function. This should make maintaining that code
path far easier.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Rename the `load_adj` statistic to `load_frac_adj`, which is a more
accurate representation of what the statistic is calculating. The
statistic is a fractional representation of the load of a layer adjusted
for infeasible weights.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Refactor layered_dispatch into two functions: layered_dispatch_no_topo and
layered_dispatch. layered_dispatch will delegate to layered_dispatch_no_topo in
the disable_topology case.
Although this code doesn't run when loaded by BPF due to the global constant
bool blocking it, it makes the functions really hard to parse as a human. As
they diverge more and more it makes sense to split them into separate
manageable functions.
This is basically a mechanical change. I duplicated the existing function,
replaced all `disable_topology` with true in `no_topo` and false in the
existing function, then removed all branches which can't be hit.
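The resulting structure is roughly the following (simplified sketch):
```
void BPF_STRUCT_OPS(layered_dispatch, s32 cpu, struct task_struct *prev)
{
	/* The non-topology-aware path now lives in its own function */
	if (disable_topology) {
		layered_dispatch_no_topo(cpu, prev);
		return;
	}

	/* ... topology-aware dispatch path ... */
}
```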
Test plan:
- Runs on my dev box (6.9.0 fbkernel) with `scx_layered --run-example -n`.
- As above with `-t`.
- CI.
Clang is correctly warning that we use various uninitialised variables. Clean
these up so real errors are easier to read.
The largest change here is to the non-topological layered_dispatch. The
matching_dsq logic seems to be incorrect: it checks whether an uninitialised
variable is 0, sets it if so, then only uses the variable if the value is 0.
I have changed this to default to -1, then use the value if it is no longer -1.
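A sketch of the sentinel pattern used for the fix (identifiers such as
layer_dsq_id() are illustrative, not the exact scx_layered code):
```
s64 matching_dsq = -1;	/* previously left uninitialised */
int idx;

bpf_for(idx, 0, nr_layers) {
	/* ... */

	/* Remember only the first matching layer's DSQ */
	if (matching_dsq < 0)
		matching_dsq = layer_dsq_id(idx);
}

/* Only consume if a matching DSQ was actually found */
if (matching_dsq >= 0)
	scx_bpf_consume(matching_dsq);
```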
Since per-CPU kthreads may show an inconsistent prev_cpu and/or cpumask,
dispatch them directly to the local DSQ and allow them to preempt the
currently running task.
This prevents per-CPU kthread stalls and also helps to prioritize them,
as they are usually important for system performance and
responsiveness.
Moreover, change the behavior of --local-kthreads to prioritize all
kthreads when this option is used.
This addresses issue #728.
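A minimal sketch of the enqueue-side change (names are illustrative rather
than the exact scx_bpfland code):
```
void BPF_STRUCT_OPS(sched_enqueue, struct task_struct *p, u64 enq_flags)
{
	/*
	 * Per-CPU kthreads: bypass the regular queues, dispatch to the
	 * local DSQ and allow them to preempt the running task.
	 */
	if ((p->flags & PF_KTHREAD) && p->nr_cpus_allowed == 1) {
		scx_bpf_dispatch(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL,
				 enq_flags | SCX_ENQ_PREEMPT);
		return;
	}

	/* ... regular enqueue path ... */
}
```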
NOTE: ideally we may want to fix this in the kernel by making sure to
always expose a consistent prev_cpu and cpumask also for kthreads, but
at the moment this change prevents some annoying stalls and,
performance-wise, it doesn't seem to introduce any regression. In fact,
the usual gaming/fps benchmarks show even a slight improvement in
responsiveness with this change applied.
Thanks to YUBY from the CachyOS community for all the extremely valuable
help with the intensive stress tests.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Add doc comment to `CpuPool` as a quick reference for each member.
Most importantly, differentiate "cpu" and "core", as logical core and
physical core, respectively.
Signed-off-by: Ming Yang <minos.future@gmail.com>
When hotplugging CPUs in rapid succession, scx_rusty would crash with:
```
scx_bpf_error (Failed to lookup dom[4294967295]
```
The root cause is that, if the scheduler is restarted fast enough, a task
on a previously hotplugged CPU may not have moved off that CPU yet.
Thus, the CPU -> domain map would contain an invalid domain (u32::max)
and we would fail to lookup the domain correctly in rusty_select_cpu
for prev_cpu.
To fix this, if the CPU is offline we do not try to allocate within the
same NUMA node beyond the domestic domain (assuming hotplug is a rare
operation). Instead we fall back to greedy allocation: first idle, then
busy, then any CPU.
Update the idle topology selection order. The current logic is:
core architecture (big/little) -> LLC -> NUMA -> Machine
It's probably better to preserve cache locality and do:
LLC -> core architecture (big/little) -> NUMA -> Machine
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Improve the performance of non-topology-aware paths by skipping some map
lookups and unnecessary initializations.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Add support for layer configuration for idle CPU selection. This allows
layers to choose whether or not to restrict idle CPU selection to SMT
idle CPUs.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
In the non topology aware code the idle smt mask is used for finding
idle cpus. Update topology aware idle selection to also use the idle
smt mask. In certain benchmarks this can improve performance.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Add big cpumask to scx_layered and prefer selecting big idle cores when
using the BigLittle growth algo.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
In lowlatency mode (option --lowlatency) tasks are ordered using a
deadline that is evaluated as the vruntime minus a certain "bonus",
determined as a function of the max time slice and the average amount of
voluntary context switches, to amplify the priority boost of the tasks
that are voluntarily releasing the CPU (which are typically
interactive).
However, this method can be extremely unfair in some cases: tasks with
short bursts of voluntary context switches may receive a huge priority
boost, making the rest of the system almost unresponsive (see massive
hackbench stress tests for example).
To prevent this, rework the task's deadline logic to use the vruntime and
a "deadline component" that is a function of the average used time
slice, scaled using a dynamic task priority (evaluated from the static
task priority and its average amount of voluntary context switches).
This logic seems to prevent excessive prioritization of tasks performing
short intensive bursts of voluntary context switches.
It also makes lowlatency mode in scx_bpfland (somehow) more similar to
the deadline logic used by scx_rusty.
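In pseudo-C the new deadline evaluation looks roughly like this (a sketch of
the approach described above; helper names, fields and the scaling factor are
illustrative, not the exact scx_bpfland implementation):
```
static u64 task_deadline(struct task_struct *p, struct task_ctx *tctx)
{
	/*
	 * Dynamic priority: the static weight boosted by the average
	 * rate of voluntary context switches (capped to avoid runaway
	 * boosts from short bursts).
	 */
	u64 prio = p->scx.weight + MIN(tctx->avg_nvcsw, nvcsw_max);

	/*
	 * Deadline component: the average used time slice scaled down
	 * by the dynamic priority, so CPU-hungry tasks are pushed
	 * further out and frequently-yielding tasks less so.
	 */
	return tctx->vruntime + tctx->avg_used_slice * 100 / prio;
}
```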
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Add a flag to control DSQ iteration across layers by layer weight. This
helps prevent starvation by iterating over layers with the lowest weight
first.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Add two new flags `layer_preempt_weight_disable` and
`layer_growth_weight_disable` to disable preemption and layer growth
when the weighted layer load exceeds the configured threshold.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Add weights to layers and use the infeasible weights crate to properly
apply weights during contention to prevent starvation.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
`layer_core_order` provided multiple core growth implementations.
Break it up into smaller functions and attach the methods to
`LayerGrowthAlgo`. Also add `LayerCoreOrderGenerator` to make future
growth algo extensions easy.
Signed-off-by: Ming Yang <minos.future@gmail.com>
As the main.bpf.c file grows, it gets hard to maintain.
So, split it into multiple logical files. There is no
functional change.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Remove a short-circuit in cpu_to_dom_id that will return domain id 0 for
any input.
This fixes a crash of scx_rusty when running with a single domain and
any CPU is offline.
Signed-off-by: Fredrik Lönnegren <fredrik@frelon.se>
"struct task_struct *p" isn't used within the function
"task_load_adj()". Delete the function parameter for cleaner code.
Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
Use scx_utils::NR_CPU_IDS to iterate over all CPUs and separately count the
number of online CPUs to support CPU hotplug correctly.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
`#stat_doc` generates the doc comment from the stat `desc` property.
Add this attribute macro to the remaining Stats structs.
Signed-off-by: Ming Yang <minos.future@gmail.com>
task_avg_nvcsw() was incorrectly returning a bool instead of u64,
limiting the impact of the lowlatency boost.
Fix it by returning the proper type (u64).
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
When a task is the last one running on a CPU and still wants to
continue, allow it to run and replenish its time slice only if the CPU it
is using is part of a fully idle SMT core.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
During ttwu, the kernel may decide to skip ->select_task_rq() (e.g.,
when only one CPU is allowed or migration is disabled). This causes
ops.enqueue() to be called directly, without having a chance to call
ops.select_cpu().
Therefore, introduce a new flag (select_cpu_done) in the local task
context to determine if ops.select_cpu() was bypassed and, in that case,
attempt to find an idle CPU directly from ops.enqueue().
In the future this information will be supplied by the kernel through a
special enqueue flag (SCX_ENQ_CPU_SELECTED) [1]. However, the custom
flag in the local task context ensures that the same information can be
reliably determined, even on older kernels where this flag is not available.
[1] https://lore.kernel.org/lkml/20240928003840.GA2717@maniforge/T
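A sketch of the fallback (select_cpu_done is the flag described above;
lookup_task_ctx() and pick_idle_cpu() are shown in simplified, illustrative
form):
```
s32 BPF_STRUCT_OPS(sched_select_cpu, struct task_struct *p,
		   s32 prev_cpu, u64 wake_flags)
{
	struct task_ctx *tctx = lookup_task_ctx(p);
	s32 cpu = pick_idle_cpu(p, prev_cpu, wake_flags);

	/* Remember that ops.select_cpu() actually ran for this wakeup */
	if (tctx)
		tctx->select_cpu_done = true;

	return cpu >= 0 ? cpu : prev_cpu;
}

void BPF_STRUCT_OPS(sched_enqueue, struct task_struct *p, u64 enq_flags)
{
	struct task_ctx *tctx = lookup_task_ctx(p);

	/* ops.select_cpu() was bypassed: look for an idle CPU here */
	if (tctx && !tctx->select_cpu_done) {
		s32 cpu = pick_idle_cpu(p, scx_bpf_task_cpu(p), 0);

		if (cpu >= 0) {
			scx_bpf_dispatch(p, SCX_DSQ_LOCAL_ON | cpu,
					 SCX_SLICE_DFL, enq_flags);
			return;
		}
	}

	/* ... regular enqueue path (clearing select_cpu_done for later) ... */
}
```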
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Fix a bug in cache initialization where the first node would repeatedly
get all CPUs added to the mask. Refactor some consts to be more clear.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
When finding a victim candidate for preemption, a randomly chosen
candidate could be out of the valid CPU range due to CPU offlining, etc.
In this case, try another CPU randomly.
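A sketch of the retry loop (symbols such as NR_VICTIM_RETRIES and the domain
cpumask are illustrative, not the exact scx_lavd code):
```
static s32 pick_random_victim(const struct cpumask *cpdom_mask)
{
	int i;

	bpf_for(i, 0, NR_VICTIM_RETRIES) {
		s32 cpu = bpf_get_prandom_u32() % nr_cpu_ids;

		/* Retry if the candidate is offline or outside this domain */
		if (bpf_cpumask_test_cpu(cpu, cpdom_mask))
			return cpu;
	}

	return -ENOENT;
}
```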
Signed-off-by: Changwoo Min <changwoo@igalia.com>
The doc of scx_layered `Opt` is out of sync.
Implement attribute macro #stat_doc to generate doc from the `desc`
property.
Apply #stat_doc to `LayerStats` and `SysStats` in scx_layered.
Signed-off-by: Ming Yang <minos.future@gmail.com>
We used the average performance criticality of tasks as a threshold to
determine the proper core type (big or little). However, if the big
cores' compute capacity is not half of the total compute capacity, such
an average-based determination becomes suboptimal. If too few tasks are
classified as performance-critical and requested to run on big
cores, the big cores would be underutilized, ending up stealing arbitrary
non-performance-critical tasks. That could result in performance
instability.
Hence, determine the threshold more accurately by considering (active)
big cores' compute capacity and the (approximated) distribution of
performance criticality of tasks.
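A sketch of the idea (data structures and names are illustrative; the real
code differs in details): pick the threshold so that the fraction of tasks
classified as performance-critical roughly matches the share of compute
capacity provided by the active big cores, using a small histogram as an
approximation of the distribution:
```
static u64 calc_thr_perf_cri(void)
{
	/* Share of total compute capacity provided by active big cores */
	u64 big_ratio = big_core_capacity * 1000 / total_capacity;
	u64 target = nr_tasks * big_ratio / 1000;
	u64 count = 0;
	int bucket;

	/*
	 * Walk the (approximated) performance-criticality histogram from
	 * the most critical bucket down, until ~target tasks are covered.
	 */
	for (bucket = PERF_CRI_BUCKETS - 1; bucket >= 0; bucket--) {
		count += perf_cri_histogram[bucket];
		if (count >= target)
			break;
	}
	if (bucket < 0)
		bucket = 0;

	return bucket_to_perf_cri(bucket);
}
```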
Signed-off-by: Changwoo Min <changwoo@igalia.com>
As a preparation to improve the performance criticality logic, we first
rename "avg_perf_cri" to "thr_perf_cri" since average is no longer the
threshold.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Add an enum for the layer growth algo to the bpf layer config. This will
be useful for implementing topology aware layer growth algorithms.
When selecting an idle CPU the current logic tries to keep tasks
local to LLC/NUMA node. However, for certain growth algorithms (ex:
RoundRobin) this is suboptimal. Adding the layer growth algorithm
will allow for different paths for CPU selection in the idle/preemption
paths.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
* Enable IDEs etc. to work on the bpf.c files.
This makes it so that clangd and IDE tools which use clangd
can work on the bpf.c code.
Nothing should actually be changed outside of that IDE/editor
environment; all the changes are ifdef'ed on LSP, which is set
in the added .clangd file.
* Move the intf include out of both sides of the ifdef toggle.
When preempting, restrict preemption to the current layer's cpumask. This
may reduce the amount of preemption, but should improve cache locality
of preempted tasks.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Remove cast_mask() function distributed throughout different schedulers
and add it in common.bpf.h so every scheduler can reference it once they
need to.
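The shared helper boils down to a simple type cast along these lines (sketch
of what common.bpf.h provides):
```
static __always_inline const struct cpumask *cast_mask(struct bpf_cpumask *mask)
{
	return (const struct cpumask *)mask;
}
```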
Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
If a waker is more latency-critical than a wakee, the wakee inherits the
waker's latency criticality. This allows the wakee to consider the
context of who woke it up. For now, we limit such inheritance to one
hop and one schedule.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Use the cast_mask helper to clean up some of the bpf cpumask conversion
code for preemption.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Add topology aware preemption that begins in the local LLC and attempts
to preempt from cpus nearest in the topology.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Previously, we searched for a victim among all CPUs, including remote
or non-compatible ones. Now we limit the victim search to the task's
compute domain.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Add core growth algos for Big/Little core support. The algos allow
layers to grow by preferring either big or little cores first.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
The usage of cast_mask() within bpfland_enqueue aims to cast
"p->cpus_ptr" from "struct bpf_cpumask *" to "const struct cpumask *".
However, the type of "p->cpus_ptr" is already "const cpumask_t *", aka
"const struct cpumask *", so no conversion is needed.
Passing a value of type "struct cpumask *" as a "struct bpf_cpumask *"
also leads to a compile error.
Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
Refactor match_layer() to prevent the compile error caused by the
variable "nr_match_ors" being used before initialization.
Move the check of "nr_match_ors" after the point where it gets the value
from "layer->nr_match_ors" to make sure it's initialized successfully.
Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
Pass enqueue flags to user-space: flags will be passed via
QueuedTask.flags and can be forwarded back to BPF via
DispatchedTask.flags.
These flags can also be passed to BpfScheduler.select_cpu() to apply a
more refined CPU selection policy.
Moreover, avoid prioritizing the user-space scheduler too much and
dispatch it only if there are no other tasks that need to be dispatched
in ops.dispatch().
This improves CPU utilization and enhances the fairness, robustness, and
resilience of schedulers based on scx_rustland_core, particularly under
stress test conditions.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
On WAKE_SYNC attempt to migrate the wakee to the same CPU as the waker
if the waker is not exiting, the wakee can use the waker's CPU, the
waker's L3 domain is not saturated and there are no other tasks queued
to the local DSQ of the waker's CPU.
This is the same logic used in scx_rusty.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Using the turbo-boosted CPUs as the preferred scheduling domain seems to
be beneficial only in a very few corner cases, for example on
battery-powered devices with an aggressive cpufreq governor that
constantly tries to scale down the frequency (and even in this case it's
probably better not to force the tasks to run on the fast CPUs, to save
power).
In practice the preferred domain seems to introduce more overhead than
benefits overall, so let's get rid of it.
This can be improved in the future by adding multiple user-configurable
scheduling domains.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Many kernel threads perform latency-critical work (e.g., net, GPU). In
particular, the AMD GPU driver runs mostly in kernel space using
kworkers. Hence, treat kernel threads as if they were woken-up tasks.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Initialize the node cpumask, which was previously left uninitialized,
causing metric calculations to be wrong when attempting to look up CPUs
in the node cpumask.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Use `cargo fmt` with a specific nightly branch in the CI to enforce formatting. Globally format these files while the diff is still small so we can stay on top of it.
Test plan:
- CI lint check passes.
The domains are added to the aggregator when load is added (and
duty_cycle is not 0.0f64).
This commit makes sure that all domains are added to the aggregator even
when the calculated duty_cycle is 0.
Signed-off-by: Fredrik Lönnegren <fredrik@frelon.se>
Pass in the layer spec when determining the layer core growth algo. This
should make it easier to implement layer growth algos that are spec
specific.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Using p->scx.slice to evaluate the consumed time slice can be a bit
imprecise, because the sched_ext core implements yielding by setting
p->scx.slice to 0.
When the task's vruntime is evaluated, this is treated as if the task had
exhausted its entire allocated time slice, even though it voluntarily
released the CPU before the slice fully expired.
To avoid this inaccuracy and prevent penalizing tasks that voluntarily
release the CPU, always evaluate the used time slice based on the
difference in the task's total execution time (p->se.sum_exec_runtime).
This method provides a more precise calculation of vruntime and results
in a fairer evaluation of the task's deadline.
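A sketch of the accounting (illustrative callback bodies, assuming a per-task
context with last_run_at and vruntime fields; the weight scaling is
simplified):
```
void BPF_STRUCT_OPS(sched_running, struct task_struct *p)
{
	struct task_ctx *tctx = lookup_task_ctx(p);

	/* Snapshot total execution time when the task starts running */
	if (tctx)
		tctx->last_run_at = p->se.sum_exec_runtime;
}

void BPF_STRUCT_OPS(sched_stopping, struct task_struct *p, bool runnable)
{
	struct task_ctx *tctx = lookup_task_ctx(p);

	if (tctx) {
		/* Charge only the time actually executed, not the slice left */
		u64 used = p->se.sum_exec_runtime - tctx->last_run_at;

		tctx->vruntime += used * 100 / p->scx.weight;
	}
}
```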
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Rust build was using two separate workspaces - rust/ and scheds/rust.
There's no reason to separate them and it makes doc generation tricky. Use
single top level workspace so that we can drive all rust building from
cargo.
Split build and test jobs to reduce CI turnaround time
and make it clear what is failing when something fails.
Also add virtiofsd to deps to make test compilation faster
(most test time is compilation) and remove all forced 9p usage.
Simplify the scx_rlfifo code, add detailed documentation of the
scx_rustland_core API and get rid of the additional task queue, since it
just makes the code bigger and slower without really providing any
benefit (considering that we are dispatching the tasks in FIFO order
anyway).
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Pass the enqueue flags to the user-space scheduler through the
QueuedTask struct.
These flags allow the user-space scheduler to make more informed
scheduling decisions.
Also bump up scx_rustland_core minor version to reflect the new API (we
are not breaking the old API, so we don't need to bump the major version
in this case).
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Unexpectedly, little cores, which have relatively short time slices, have
a higher chance of scheduling performance-critical tasks. Hence it is
better to keep the time slice the same regardless of core type.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
When selecting an idle CPU for a task that has been woken up, prioritize
reusing the same CPU if the waker and wakee share the same L3 cache.
Otherwise, attempt to migrate the wakee to the waker's CPU, provided it
is allowed by the wakee's scheduling domain.
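In ops.select_cpu() terms the logic is roughly the following (a sketch;
cpus_share_l3() is an illustrative helper, not the exact scx_bpfland code):
```
if (wake_flags & SCX_WAKE_SYNC) {
	s32 waker_cpu = bpf_get_smp_processor_id();

	/* Waker and wakee share the L3: keep the wakee where it was */
	if (cpus_share_l3(prev_cpu, waker_cpu) &&
	    scx_bpf_test_and_clear_cpu_idle(prev_cpu))
		return prev_cpu;

	/* Otherwise migrate to the waker's CPU, if the cpumask allows it */
	if (bpf_cpumask_test_cpu(waker_cpu, p->cpus_ptr))
		return waker_cpu;
}
```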
This seems to consistently improve FPS performance when the system is
not operating over its full capacity.
Example:
$ __GL_SYNC_TO_VBLANK=0 vblank_mode=0 glxgears -geometry 800x600
- before: ~18305.77 FPS
- after: ~19060.62 FPS
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Rename "turbo domain" to "preferred domain", that conceptually is more
generic and introduce the new option `--preferred-domain CPUMASK`, which
allows users to define the preferred domain, specifying a cpumask as a
hex number. By default ("auto") the scheduler will always try to detect
and use the fastest CPUs in the system.
Moreover, adjust the cpufreq logic to use "auto" both with the
"balance_power" and "balance_performance" EPP profiles.
Then, enable "auto" mode by default: the scheduler will try to
automatically determine the optimal primary domain, preferred domain and
cpufreq level, based on the selected scheduler and energy profiles.
Tested-by: Piotr Gorski <piotr.gorski@cachyos.org>
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Reduce the formatting precision of stats for readability. The existing
formatting is hard to read:
tot= 1538 local=31.27 open_idle= 2.73 affn_viol=23.80 proc=4ms
busy= 1.1 util= 16.6 load= 32.7 fallback_cpu= 6
excl_coll=0.06501950585175553 excl_preempt=0.26007802340702213 excl_idle=0.16384915474642392 excl_wakeup=0.25097529258777634
With this fix the stats formatting is far more readable:
tot= 441 local=33.56 open_idle= 0.00 affn_viol=20.63 proc=3ms
busy= 0.4 util= 6.3 load= 33.6 fallback_cpu= 6
excl_coll=0.454 excl_preempt=0.000 excl_idle=0.132 excl_wakeup=0.200
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
When a pinned task cannot run on either the active or overflow sets, try
to keep it on the previous CPU if that CPU is still okay to run on.
Signed-off-by: Changwoo Min <changwoo@igalia.com>