Some of the new timer code doesn't verify on older kernels such as 6.9. Modify
the code slightly to get it verifying again.
This also applies some small fixes to the logic: error handling was slightly
off before, and we were using the wrong key in lookups.
Test plan:
- CI
The previous code accesses uninitialized memory in comp_preemption_info()
when called from can_task1_kick_task2() <- try_yield_current_cpu()
to test whether task2 is a lock holder. However, task2 is guaranteed
not to be a lock holder in all its callers. So move the lock holder test to
can_cpu1_kick_cpu2().
Signed-off-by: Changwoo Min <changwoo@igalia.com>
When a task is enqueued, kick an idle CPU in the chosen scheduling
domain. This will reduce temporary stall time of the task by waking
up the CPU as early as possible.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
We used to apply a latency penalty linearly proportional to the greedy
ratio. However, this lets the greedy ratio affect the virtual deadline
too much, especially among under-utilized tasks (< 100.0%).
Now, we treat all under-utilized tasks as having the same greedy ratio
(= 100.0%). For over-utilized tasks, we apply a somewhat milder penalty
to avoid sudden latency spikes.
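Roughly, the new mapping behaves like the following sketch (Rust for
illustration only; the permille representation and the over-utilization
divisor are assumptions, not the actual scx_lavd BPF code):
```rust
// Hypothetical sketch of the clamped greedy-ratio penalty described above.
const GREEDY_BASE: u64 = 1000; // 100.0% expressed in permille (assumed)

/// Treat every under-utilized task as exactly 100.0% greedy, and dampen the
/// penalty for over-utilized tasks to avoid sudden latency spikes.
fn penalty_greedy_ratio(greedy_ratio: u64) -> u64 {
    if greedy_ratio <= GREEDY_BASE {
        GREEDY_BASE
    } else {
        GREEDY_BASE + (greedy_ratio - GREEDY_BASE) / 2 // assumed divisor
    }
}

fn main() {
    for g in [500u64, 1000, 1500, 3000] {
        println!("greedy ratio {g} -> penalty ratio {}", penalty_greedy_ratio(g));
    }
}
```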
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Previously, contextual information, such as sync wakeup and kernel
task, was incorporated into the final latency criticality value ad hoc
by adding a constant. Instead, let's make everything proportional to
run time and to the waker and wakee frequencies by scaling the run
time and the frequencies up or down.
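As a rough illustration (Rust for readability; the boost factor, its
direction of application, and the names are assumptions rather than the real
scx_lavd code), the contextual signals now scale the inputs instead of adding
a constant to the output:
```rust
/// Illustrative only: apply contextual boosts multiplicatively so the final
/// latency criticality stays proportional to run time and frequencies.
fn scaled_inputs(
    runtime: u64,
    wake_freq: u64,
    wait_freq: u64,
    sync_wakeup: bool,
    kernel_task: bool,
) -> (u64, u64, u64) {
    let boost = if sync_wakeup || kernel_task { 2 } else { 1 }; // assumed factor
    // A boosted task looks shorter-running and more frequently waking/waiting,
    // both of which raise its latency criticality proportionally.
    (runtime / boost, wake_freq * boost, wait_freq * boost)
}

fn main() {
    println!("{:?}", scaled_inputs(1_000_000, 50, 40, true, false));
}
```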
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Previously, preemption was allowed only when a task was early in its
time slice, enforced via LAVD_PREEMPT_KICK_MARGIN and
LAVD_PREEMPT_TICK_MARGIN. This is no longer necessary because
lock holder preemption avoidance prevents the harmful preemptions. So we
remove LAVD_PREEMPT_KICK_MARGIN and LAVD_PREEMPT_TICK_MARGIN and
unleash preemption.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
When calculating a task's latency criticality, incorporate the task's
weight into runtime, wake_freq, and wait_freq more systematically.
It looks nicer and works better under heavy load.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
When a CPU is released to serve a higher-priority scheduler class,
requeue the tasks in its local DSQ through the global enqueue path.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Currently we have an approximation of LayerKind in the BPF code with `open` on
the layer, but it is difficult/impossible to tell the difference between an
Open and a Grouped layer. Add a `kind` field to the BPF `layer` and plumb
through an enum from the Rust side.
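A sketch of the plumbing (the variant names follow the layer kinds the
config already distinguishes; the exact discriminant values are assumptions):
```rust
/// Mirrored into BPF as a plain integer `kind` field on the BPF `layer`.
#[repr(u32)]
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
enum LayerKind {
    Confined = 0, // assumed discriminants
    Grouped = 1,
    Open = 2,
}

fn main() {
    // The Rust side plumbs the enum through as a u32 when initializing
    // the BPF-side layer struct.
    let kind = LayerKind::Grouped;
    println!("bpf layer.kind = {}", kind as u32);
}
```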
When a task holds a lock, refill its time slice once at the
ops.dispatch() path to avoid the lock holder preemption problem.
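In miniature, the idea is something like the following (field names and the
once-only flag are assumptions; the real logic lives in scx_lavd's BPF code):
```rust
struct TaskCtx {
    slice_ns: u64,
    lock_holder: bool,
    slice_refilled: bool, // allow a single refill per slice (assumed)
}

/// Called from the dispatch path: give a lock holder one fresh slice so it
/// can exit the critical section before being switched out.
fn maybe_refill_slice(taskc: &mut TaskCtx, slice_max_ns: u64) {
    if taskc.lock_holder && !taskc.slice_refilled {
        taskc.slice_ns = slice_max_ns;
        taskc.slice_refilled = true;
    }
}

fn main() {
    let mut t = TaskCtx { slice_ns: 0, lock_holder: true, slice_refilled: false };
    maybe_refill_slice(&mut t, 5_000_000);
    assert_eq!(t.slice_ns, 5_000_000);
}
```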
Signed-off-by: Changwoo Min <changwoo@igalia.com>
When there is an idle CPU, direct dispatch is performed to reduce
scheduling latency. This didn't work well before, but it seems
to work well now with other tunings.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Giving a larger penalty to long-running tasks helps to segregate
latency-critical tasks, which are usually short-running, from
long-running tasks, which are compute-intensive.
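For intuition only (the log2 curve is an assumption; the actual penalty is
computed in scx_lavd's BPF code), a penalty that grows with average runtime
pushes compute-intensive tasks away from the latency-critical end:
```rust
/// Illustrative penalty: log2-based, so doubling the average runtime adds a
/// constant penalty step, separating short-running latency-critical tasks
/// from long-running compute-intensive ones.
fn runtime_penalty(avg_runtime_ns: u64) -> u32 {
    64 - avg_runtime_ns.max(1).leading_zeros()
}

fn main() {
    for ns in [10_000u64, 1_000_000, 100_000_000] {
        println!("runtime {ns:>11} ns -> penalty {}", runtime_penalty(ns));
    }
}
```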
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Rework per-arch vmlinux solution
* have a per-arch directory under sched/include/arch/, in which we
  maintain the vmlinux.h symlink and the real file
  vmlinux-{kernel_ver}-g{sha1}.h. The original sched/include/vmlinux/
  folder is removed.
* update the meson build `-I` option to find the new vmlinux.h location
* update the cargo build scripts to use the per-arch vmlinux.h for
  generating bindings
* keep the original ClangInfo refactoring changes
Signed-off-by: Ming Yang <minos.future@gmail.com>
Adjust the amount of vruntime budget an idle task can accumulate as a
function of its latency weight, which is derived from the average number
of voluntary context switches.
This ensures that latency-sensitive tasks naturally receive an
additional priority boost, and we can avoid scaling down the vruntime
to determine the task's deadline, making the scheduler more fair.
It also makes the scheduler more robust: now rustland can survive
intensive stress tests, such as `stress-ng --cpu-sched 64` or hackbench.
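Conceptually it looks like this sketch (names and the exact scaling are
assumptions; see scx_rustland_core for the real implementation):
```rust
/// Clamp how far behind the global vtime a waking task may be: tasks with a
/// higher latency weight (more voluntary context switches) keep a larger
/// budget and thus get scheduled sooner.
fn task_vtime(vtime_now: u64, task_vruntime: u64, slice_ns: u64, lat_weight: u64) -> u64 {
    let max_budget = slice_ns * lat_weight; // assumed scaling
    task_vruntime.max(vtime_now.saturating_sub(max_budget))
}

fn main() {
    // An idle, latency-sensitive task (weight 10) keeps more of its lag
    // than a batch task (weight 1).
    println!("{}", task_vtime(1_000_000, 0, 10_000, 10)); // 900000
    println!("{}", task_vtime(1_000_000, 0, 10_000, 1));  // 990000
}
```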
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
The algorithm has evolved to decide the time slice without tracking
the system-wide load, so remove the obsolete load tracking code.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
reset_lock_futex_boost() should be called on every context switch of a
task. Otherwise, in the worst case, a task and its CPU could block
preemption. To avoid such a situation, add the missing
reset_lock_futex_boost() calls.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Building CpuPool from the cache-CPU topology does not work on arm, because
the `/sys/devices/system/cpu/cpu{}/cache/index{}/id` file is unavailable.
Read the CPU topology instead.
Signed-off-by: Ming Yang <minos.future@gmail.com>
Adjust some default settings after the rework done with commit 112a5d4
("scx_bpfland: rework lowlatency mode to adjust tasks priority").
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Rework lowlatency mode as follows:
- introduce task dynamic priority: task weight multiplied by the
average amount of voluntary context switches
- use dynamic priority to determine task's vruntime (instead of the
static task's weight)
- task's minimum vruntime is evaluated as a function of the dynamic
priority (tasks with a higher dynamic priority can have a smaller
vruntime compared to tasks with a lower dynamic priority)
The dynamic priority makes it possible to maintain good system
responsiveness without classifying tasks as "interactive" and
"regular"; therefore, in lowlatency mode only the shared DSQ is
used (the priority DSQ is disabled).
Using a separate priority queue to dispatch "interactive" tasks makes
the scheduler less fair, allowing latency-sensitive tasks to be
prioritized even when there is a high number of tasks in the system
(e.g., `stress-ng -c 1024` or similar scenarios), where relying solely
on dynamic priority may not be sufficient.
On the other hand, disabling the classification of "interactive" tasks
results in a fairer scheduler and more predictable performance, making
it better suited for soft real-time applications (e.g., audio and
multimedia).
Therefore, the --lowlatency option is retained to allow users to choose
between more predictable performance (by disabling the interactive task
classification) or a more responsive system (default).
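A compact sketch of the dynamic-priority idea described above (the
combination rule and the base constant are assumptions, not the exact BPF
code):
```rust
/// Dynamic priority: static weight amplified by the average rate of
/// voluntary context switches (nvcsw).
fn dyn_prio(weight: u64, avg_nvcsw: u64) -> u64 {
    weight * (1 + avg_nvcsw) // assumed combination
}

/// Charge vruntime inversely to the dynamic priority, so tasks that
/// voluntarily release the CPU accumulate vruntime more slowly.
fn charge(vruntime: u64, used_ns: u64, weight: u64, avg_nvcsw: u64) -> u64 {
    const BASE_WEIGHT: u64 = 100; // assumed base
    vruntime + used_ns * BASE_WEIGHT / dyn_prio(weight, avg_nvcsw)
}

fn main() {
    // Same CPU usage, but the frequently-yielding task advances less.
    println!("{}", charge(0, 1_000_000, 100, 10)); // 90909
    println!("{}", charge(0, 1_000_000, 100, 0));  // 1000000
}
```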
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Update the documentation adding the new task statistics provided by
scx_rustland_core.
Fixes: be681c7 ("scx_rustland_core: pass nvcsw, slice and dsq_vtime to user-space")
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
The recent changes to `disable_topology`, making the arg an `Option<bool>`
instead of a `bool`, caused it to incorrectly attach arguments.
Make the argument `require_equals` to fix this case.
This is a behaviour change for anybody previously relying on `-t true`,
`-t false`, `--disable-topology true`, or `--disable-topology false`. The
equals syntax worked before and continues to work after, as demonstrated in the
CI.
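A minimal reproduction of the fix, assuming clap v4's derive API (the
surrounding struct is simplified and the positional argument is
hypothetical):
```rust
use clap::Parser;

#[derive(Parser, Debug)]
struct Opts {
    /// With `require_equals`, the value must be attached as
    /// `--disable-topology=true|false`; a bare `-t` uses the default
    /// missing value instead of swallowing the next positional argument.
    #[arg(short = 't', long, require_equals = true,
          num_args = 0..=1, default_missing_value = "true")]
    disable_topology: Option<bool>,

    /// e.g. the layer spec argument that was previously mis-parsed.
    spec: Option<String>,
}

fn main() {
    let opts = Opts::parse();
    println!("{opts:?}");
}
```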
Test plan:
Before:
```sh
$ sudo target/release/scx_layered -t f:/tmp/test.json
error: invalid value 'f:/tmp/test.json' for '--disable-topology
[<DISABLE_TOPOLOGY>]'
[possible values: true, false]
For more information, try '--help'.
```
After:
```sh
$ sudo target/release/scx_layered -t f:/tmp/test.json
14:44:00 [INFO] CPUs: online/possible=176/176 nr_cores=88
14:44:00 [INFO] Disabling topology awareness
...
^CEXIT: Scheduler unregistered from user space
```
Add an additional layer growth algorithm, named 'RandomTopo'. It follows these
rules:
- Randomise NUMA nodes. List each core in each NUMA node before a core from
another NUMA node.
- Randomise LLCs within each NUMA node. List each core in each LLC before a
core in a different LLC.
- Randomise the core order within each LLC.
This attempts to provide a relatively evenly distributed set of cores while
considering topology. Unlike `Topo`, it does not require you to specify the
ordering and instead generates it from the hardware, making desyncs between the
config and the hardware less likely.
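A sketch of the ordering rules above, with simplified types and assuming the
`rand` crate:
```rust
use rand::seq::SliceRandom;

/// nodes[numa][llc] = core IDs; shuffle at each level, then flatten.
/// Cores stay grouped by LLC, and LLCs stay grouped by NUMA node.
fn random_topo_order(mut nodes: Vec<Vec<Vec<usize>>>) -> Vec<usize> {
    let mut rng = rand::thread_rng();
    nodes.shuffle(&mut rng);
    let mut order = Vec::new();
    for llcs in nodes.iter_mut() {
        llcs.shuffle(&mut rng);
        for cores in llcs.iter_mut() {
            cores.shuffle(&mut rng);
            order.extend_from_slice(cores);
        }
    }
    order
}

fn main() {
    // One node, two LLCs with four cores each.
    let topo = vec![vec![vec![0, 1, 2, 3], vec![4, 5, 6, 7]]];
    println!("{:?}", random_topo_order(topo));
}
```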
Currently `RandomTopo` considers topology even with `--disable-topology=true`.
I can see the arguments for this going both ways. On one hand requesting
disable topology suggests you want no consideration of machine topology, and
`RandomTopo` should decay to `Random` (which it does on single node/LLC machines
anyway). On the other hand, the config explicitly specifies `RandomTopo` and
should consider the topology. If anyone feels strongly I can change this to
respect `disable_topology`.
Test plan:
```sh
$ sudo target/release/scx_layered -v f:/tmp/test.json
...
14:31:19 [DEBUG] layer: batch algo: RandomTopo core order: [47, 44, 43, 42, 40, 45, 46, 41, 38, 37, 36, 39, 34, 32, 35, 33, 54, 49, 50, 52, 51, 48, 55, 53, 68, 64, 66, 67, 70, 69, 71, 65, 9, 10, 12, 15, 14, 11, 8, 13, 59, 60, 57, 63, 62, 56, 58, 61, 2, 3, 5, 4, 0, 6, 7, 1, 86, 83, 85, 87, 84, 81, 80, 82, 20, 22, 19, 23, 21, 18, 17, 16, 30, 25, 26, 31, 28, 27, 29, 24, 78, 73, 74, 79, 75, 77, 76, 72]
14:31:19 [DEBUG] layer: immediate algo: RandomTopo core order: [45, 40, 46, 42, 47, 43, 41, 44, 80, 82, 83, 84, 85, 86, 81, 87, 13, 10, 9, 15, 14, 12, 11, 8, 36, 38, 39, 32, 34, 35, 33, 37, 7, 3, 1, 0, 2, 5, 4, 6, 53, 52, 54, 48, 50, 49, 55, 51, 76, 77, 79, 78, 73, 74, 72, 75, 71, 66, 64, 67, 70, 69, 65, 68, 24, 26, 31, 25, 28, 30, 27, 29, 58, 56, 59, 61, 57, 62, 60, 63, 16, 19, 17, 23, 22, 20, 18, 21]
...
```
This is a machine with 1 NUMA/11 LLCs with 8 cores per LLC and you can see the
results are grouped by LLC but random within.
Make scx_rlfifo even simpler and keep dispatching tasks even when the CPUs
are all busy.
This allows us to better stress-test the scx_rustland_core backend, by
using both the per-CPU DSQs and the global shared DSQ.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
scx_rustland is now effectively a deadline-based scheduler and not a
pure vruntime-based scheduler.
Clarify this in the source code. No functional change.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Use the nvcsw metric from the scx_rustland_core backend, instead of
retrieving this metric in user-space via procfs.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Pinned tasks should just be routed to a fallback DSQ. kthreads are given
a higher priority than non-kthreads so use two fallback DSQs.
Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
If we're not on the wakeup path, we may see enqueue() invoked without
select_cpu(), which then requires an idle CPU lookup. To fix this,
refactor the idle CPU lookup out of select_cpu() so it can also be
invoked from enqueue().
Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
Add an integration test for testing that the `llcs` field on the layer
config works properly.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Add a bpftrace script that does a topology aware test. The test script
runs a bpftrace script that asserts that stress-ng processes are
scheduled on NUMA node 0 only.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
vmlinux.h is not compatible across archs.
Handle this compatibility issue by:
* adding the arch info into the vmlinux.h real file name
* linking vmlinux.h to the target-arch real file at build time
* using the target-arch real file for scx_utils bindgen
Also refactor the clang-related logic into a new clang_info mod, which is
shared by bpf_builder.rs and builder.rs.
Signed-off-by: Ming Yang <minos.future@gmail.com>
u32 is not big enough to hold the sum of lat_cri in a period,
so sum_lat_cri (u32) overflowed, resulting in an incorrect
avg_lat_cri. Change the type from u32 to u64, avoiding the
integer overflow. Note that {sum/avg}_lat_cri is only for
debugging, so it is irrelevant to making scheduling decisions.
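The arithmetic behind the fix, with illustrative numbers: a busy period can
easily push the sum past u32::MAX.
```rust
fn main() {
    // Illustrative: 2,000 tasks averaging a lat_cri of ~3,000,000 each.
    let nr_tasks: u64 = 2_000;
    let lat_cri: u64 = 3_000_000;
    let sum = nr_tasks * lat_cri; // 6e9, beyond u32::MAX (~4.29e9)
    assert!(sum > u32::MAX as u64);
    // Summing in u64 keeps avg_lat_cri correct.
    println!("avg_lat_cri = {}", sum / nr_tasks);
}
```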
Signed-off-by: Changwoo Min <changwoo@igalia.com>
The downscaling is not necessary when calculating a task's virtual
deadline because the virtual deadline represents only the relative
order in task scheduling. Hence, downscaling only introduces
inaccuracy caused by truncation.
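A two-line illustration of the truncation problem (the shift width is
arbitrary): after downscaling, two deadlines that should be strictly ordered
collapse into a tie.
```rust
fn main() {
    let (d1, d2) = (2047u64, 1024u64); // d2 should be served first
    assert!(d2 < d1);
    // A 10-bit downscale truncates both to 1, losing the relative order
    // that is the only thing a virtual deadline needs to preserve.
    assert_eq!(d1 >> 10, d2 >> 10);
    println!("order lost: {} vs {}", d1 >> 10, d2 >> 10);
}
```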
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Using per-CPU DSQs seems to introduce more issues than benefits
(potential stalls, etc.). Therefore, let's get rid of the per-CPU DSQs
and use SCX_DSQ_LOCAL for tasks directly dispatched to specific CPUs.
This change seems to also improve performance on 6.12 and it makes the
scheduler a lot more stable and consistent.
The issues will be investigated separately, providing a separate stress
test scheduler, designed to stress test per-CPU DSQs.
Tested-by: Piotr Gorski <piotrgorski@cachyos.org>
Tested-by: Eric Naim <dnaim@cachyos.org>
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Return more meaningful error codes from pick_idle_cpu(). No functional
change, just improved code readability.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
When a task exhausts its timeslice and no other tasks are ready to run,
we automatically refill its timeslice, but only if the current CPU is a
fully idle SMT core.
If we don’t handle the refill, the sched_ext core will default to
refilling using SCX_SLICE_DFL, which may not be optimal.
To ensure better control over the task’s timeslice, always refill it
when no other tasks are available to run.
Fixes: 6e24fcc ("scx_bpfland: keep tasks running on full-idle SMT cores")
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Pick any random idle CPU when the previous CPU isn't valid anymore
according to the task's cpumask.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Disable topology currently defaults to `false` (topology enabled...). Change
this so that topology is enabled by default on hardware that may benefit from
it (multiple NUMA nodes or LLCs) and disabled on hardware that does not benefit
from it.
This is a slightly noisy change as we have to move ownership of the newly
mutable layer specs into the `Scheduler` object (previously they were a
borrow). We don't have a `Topology` object to make the default decision from
until `Scheduler::init`, and I think this is because of the possibility of hot
plugs. We therefore have to clone the `Vec<LayerSpec>` each time as it is
potentially mutable.
Test plan:
- CI. Updated to be explicit about topology in both cases.
Single NUMA multi-LLC machine:
```
$ scx_layered --run-example
...
13:34:01 [INFO] Topology awareness not specified, selecting enabled based on
hardware
...
$ scx_layered --run-example --disable-topology=true
...
13:33:41 [INFO] Disabling topology awareness
...
$ scx_layered --run-example -t
...
13:33:15 [INFO] Disabling topology awareness
...
$ scx_layered --run-example --disable-topology=false
# none of the above messages present
```
Single NUMA single LLC machine:
```
$ scx_layered --run-example
15:33:10 [INFO] Topology awareness not specified, selecting disabled based on
hardware
```
Move the LayerConfig and its children from `main.rs` into `lib.rs`. This allows
other tooling, such as config managers or test executors, to modify layered
configs programmatically.
The end goal is to move everything in `layered` except for the argument parsing
into a `run_layered` function, but I haven't done it in this diff because it's
a larger change. This is a common pattern in Rust projects to do as little as
possible in `main.rs` for extensibility.
The only change here, other than visibility and where things are located, is
the signature of `CpuPool::alloc_cpus`. It previously relied on `&Layer`, and
this changes it to take the two elements of `Layer` it uses. This allows
`Layer` to stay confined to `main.rs` (for now) to prevent scope creep in
this PR.
This may be inconvenient in the short term for WIPs and anyone doing non-Cargo
builds (cough me), but having things split into more files should make
rebases/merges easier in the long run.
Test plan:
- `cargo build --release`
- CI.
When a task holds a lock, it should neither yield its time slice nor
be preempted. In this way, we can mitigate harmful preemption of lock
holders and reduce the total preemption count.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
When a lock holder exhausts its time slice, it will be re-enqueued
to a DSQ to wait for scheduling while still holding the lock. In this
case, boost its latency criticality proportionally, so a lock holder
does not get stuck in a DSQ for a long time, improving system-wide
progress.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Trace the acquisition and release of blocking locks in the kernel and
futexes in user-space. This is necessary to boost a lock holder
task in terms of latency and time slice. We do not boost shared
lock holders (e.g., the read lock in an rw_semaphore) since the kernel
already prioritizes the readers over the writers.
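Bookkeeping-wise the idea reduces to a per-task counter, sketched below with
assumed names (the real tracing hooks into the kernel lock and futex paths
from BPF):
```rust
#[derive(Default)]
struct TaskCtx {
    /// Outstanding exclusive lock/futex acquisitions.
    lock_depth: i32,
}

impl TaskCtx {
    fn lock_acquired(&mut self) {
        self.lock_depth += 1;
    }
    fn lock_released(&mut self) {
        // Saturate at zero: a release can be observed without the matching
        // acquisition if tracing started mid-critical-section.
        self.lock_depth = (self.lock_depth - 1).max(0);
    }
    /// Only exclusive holders are boosted; shared (reader) acquisitions
    /// are never counted in the first place.
    fn is_lock_holder(&self) -> bool {
        self.lock_depth > 0
    }
}

fn main() {
    let mut t = TaskCtx::default();
    t.lock_acquired();
    assert!(t.is_lock_holder());
    t.lock_released();
    assert!(!t.is_lock_holder());
}
```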
Signed-off-by: Changwoo Min <changwoo@igalia.com>
In the WAKE_SYNC path, if L3 cache awareness is disabled (--disable-l3)
we may hit the following error:
Error: EXIT: scx_bpf_error (CPU L3 cpumask not initialized)
Fix this by setting the L3 cpumask to the whole primary domain if L3
cache awareness is disabled.
Tested-by: Eric Naim <dnaim@cachyos.org>
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Refactor the topology preemption logic so the non-topology-aware code is
contained in a separate function. This should make maintaining the non-
topology-aware code path far easier.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Rename the `load_adj` statistic to `load_frac_adj`, which is a more
accurate representation of what the statistic is calculating. The
statistic is a fractional representation of the load of a layer adjusted
for infeasible weights.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Refactor layered_dispatch into two functions: layered_dispatch_no_topo and
layered_dispatch. layered_dispatch will delegate to layered_dispatch_no_topo in
the disable_topology case.
Although this code doesn't run when loaded by BPF due to the global constant
bool blocking it, it makes the functions really hard to parse as a human. As
they diverge more and more it makes sense to split them into separate
manageable functions.
This is basically a mechanical change. I duplicated the existing function,
replaced all `disable_topology` with true in `no_topo` and false in the
existing function, then removed all branches which can't be hit.
Test plan:
- Runs on my dev box (6.9.0 fbkernel) with `scx_layered --run-example -n`.
- As above with `-t`.
- CI.
clang is correctly warning that we use various uninitialised variables. Clean
these up so real errors are easier to read.
The largest change here is to the non-topological layered_dispatch. The
matching_dsq logic seems to be incorrect: it checks whether an uninitialised
variable is 0, sets it if so, then only uses the variable if the value is 0.
I have changed this to default to -1, then use the value only once it is no
longer -1.
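The sentinel pattern in miniature (illustrative Rust; the real code is BPF C
in layered_dispatch, where 0 can be a valid DSQ id and is therefore a poor
"unset" marker):
```rust
/// Find the first DSQ that matches, using -1 as an explicit "unset"
/// sentinel instead of an uninitialised value compared against 0.
fn first_matching_dsq(dsqs: &[u32], matches: impl Fn(u32) -> bool) -> i64 {
    let mut matching: i64 = -1;
    for &dsq in dsqs {
        if matching == -1 && matches(dsq) {
            matching = dsq as i64;
        }
    }
    matching // still -1 => no match
}

fn main() {
    assert_eq!(first_matching_dsq(&[3, 5, 8], |d| d % 2 == 0), 8);
    assert_eq!(first_matching_dsq(&[3, 5], |d| d % 2 == 0), -1);
}
```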
Since per-CPU kthreads may show an inconsistent prev_cpu and/or cpumask,
dispatch them directly to the local DSQ and allow them to preempt the
currently running task.
This prevents per-CPU kthread stalls and also helps to prioritize them,
as they are usually important for system performance and responsiveness.
Moreover, change the behavior of --local-kthreads to prioritize all
kthreads when this option is used.
This addresses issue #728.
NOTE: ideally we may want to fix this in the kernel by making sure to
always expose a consistent prev_cpu and cpumask also for kthreads, but
at the moment this change prevents some annoying stalls, and
performance-wise it doesn't seem to introduce any regression. In fact,
the usual gaming/fps benchmarks show even a slight improvement in
responsiveness with this change applied.
Thanks to YUBY from the CachyOS community for all the extremely valuable
help with the intensive stress tests.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Add doc comment to `CpuPool` as a quick reference for each member.
Most importantly, differentiate "cpu" and "core", as logical core and
physical core, respectively.
Signed-off-by: Ming Yang <minos.future@gmail.com>
When hotplugging CPUs in rapid succession, scx_rusty would crash with:
```
scx_bpf_error (Failed to lookup dom[4294967295]
```
The root cause is if the scheduler is restarted fast enough, a task
on a previously hotplugged CPU may not have moved off that CPU yet.
Thus, the CPU -> domain map would contain an invalid domain (u32::max)
and we would fail to lookup the domain correctly in rusty_select_cpu
for prev_cpu.
To fix this, if the CPU is offline, we do not try to allocate within the
same NUMA node (assuming hotplug is a rare operation) beyond the domestic
domain. Instead, we use greedy allocation: first idle, then busy, then
any CPU.
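The fallback order, sketched with slices standing in for the idle/busy masks
(names are assumptions):
```rust
/// Greedy pick for a task whose prev_cpu went offline: first any idle CPU,
/// then any busy one, then anything at all, skipping the NUMA-local steps.
fn greedy_pick(idle: &[u32], busy: &[u32], any: &[u32]) -> Option<u32> {
    idle.first()
        .or_else(|| busy.first())
        .or_else(|| any.first())
        .copied()
}

fn main() {
    assert_eq!(greedy_pick(&[], &[2, 3], &[0, 1, 2, 3]), Some(2));
    assert_eq!(greedy_pick(&[], &[], &[0]), Some(0));
}
```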
Update the idle topology selection order, the current logic is:
core architecture (big/little) -> LLC -> NUMA -> Machine
It's probably better to try to keep cache lines clean and do:
LLC -> core architecture (big/little) -> NUMA -> Machine
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Improve the performance on non-topology-aware paths by skipping some map
lookups and unnecessary initializations.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Add support for layer configuration for idle CPU selection. This allows
layers to choose whether or not to restrict idle CPU selection to SMT
idle CPUs.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
In the non-topology-aware code, the idle SMT mask is used for finding
idle CPUs. Update the topology-aware idle selection to also use the idle
SMT mask. In certain benchmarks this can improve performance.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Add big cpumask to scx_layered and prefer selecting big idle cores when
using the BigLittle growth algo.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
In lowlatency mode (option --lowlatency) tasks are ordered using a
deadline that is evaluated as the vruntime minus a certain "bonus",
determined as a function of the max time slice and the average amount of
voluntary context switches, to amplify the priority boost of the tasks
that voluntarily release the CPU (which are typically interactive).
However, this method can be extremely unfair in some cases: tasks with
short bursts of voluntary context switches may receive a huge priority
boost, making the rest of the system almost unresponsive (see massive
hackbench stress tests for example).
To prevent this, rework the task's deadline logic to use the vruntime and
a "deadline component" that is a function of the average used time
slice, scaled using a dynamic task priority (evaluated from the static
task priority and its average amount of voluntary context switches).
This logic seems to prevent excessive prioritization of tasks performing
short intensive bursts of voluntary context switches.
It also makes lowlatency mode in scx_bpfland (somewhat) more similar to
the deadline logic used by scx_rusty.
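Putting the pieces together as a sketch (the constants and the combination
rule are assumptions; the real code is BPF C):
```rust
/// Deadline = vruntime + a component derived from the average used time
/// slice, scaled down by the dynamic priority, so bursts of voluntary
/// context switches can no longer produce an unbounded bonus.
fn deadline(vruntime: u64, avg_used_slice_ns: u64, weight: u64, avg_nvcsw: u64) -> u64 {
    const BASE_WEIGHT: u64 = 100; // assumed base
    let dyn_prio = weight * (1 + avg_nvcsw); // assumed combination
    vruntime + avg_used_slice_ns * BASE_WEIGHT / dyn_prio
}

fn main() {
    // A frequently-yielding task gets a nearer deadline, but the advantage
    // is bounded by its average used slice rather than open-ended.
    println!("{}", deadline(1_000_000, 500_000, 100, 20)); // ~1_023_809
    println!("{}", deadline(1_000_000, 500_000, 100, 0));  // 1_500_000
}
```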
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Add a flag to control DSQ iteration across layers by layer weight. This
helps prevent starvation by iterating over layers with the lowest weight
first.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Add two new flags, `layer_preempt_weight_disable` and
`layer_growth_weight_disable`, to disable preemption and layer growth
when the weighted layer load exceeds the configured threshold.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Add weights to layers and use the infeasible weights crate to properly
apply weights during contention to prevent starvation.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>