Commit Graph

2103 Commits

Author SHA1 Message Date
Daniel Hodges
480064b8e4
Merge pull request #851 from hodgesds/layred-timer-fix
scx_layered: Fix declarations in timer
2024-10-24 16:15:39 +00:00
Daniel Hodges
e38282d61a scx_layered: Fix declarations in timer 2024-10-24 09:09:53 -07:00
Daniel Hodges
4de4cf31e0
Merge pull request #841 from hodgesds/layered-watchdog
scx_layered: Add monitor
2024-10-24 08:55:14 +00:00
Daniel Hodges
41a612f34d scx_layered: Add monitor
Add a monitor timer for scx_layered. For now the monitor is a noop.
2024-10-24 04:49:41 -04:00
Changwoo Min
4f6947736f scx_lavd: fix uninitialized memory access in comp_preemption_info()
The previous code accessed uninitialized memory in comp_preemption_info()
when called from can_task1_kick_task2() <- try_yield_current_cpu()
to test whether task2 is a lock holder. However, task2 is guaranteed
not to be a lock holder in all of its callers, so move the lock holder
test to can_cpu1_kick_cpu2().

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-10-24 16:07:53 +09:00
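A minimal sketch of the resulting shape (the function names are taken from the commit message; the context-lookup and lock-holder helpers are hypothetical):
---------------
static bool can_cpu1_kick_cpu2(struct cpu_ctx *cpuc1, struct cpu_ctx *cpuc2)
{
        struct task_ctx *taskc2 = lookup_task_ctx(cpuc2->running);  /* hypothetical */

        /* The lock holder test lives here now: only this path has a
         * task2 whose context is guaranteed to be initialized.
         */
        if (taskc2 && is_lock_holder(taskc2))   /* hypothetical helper */
                return false;

        return comp_preemption_info(cpuc1, cpuc2);  /* no lock holder test inside */
}
---------------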
likewhatevs
371020a096
Merge pull request #844 from likewhatevs/retry-git-clone
add retries to kernel clone step
2024-10-24 00:31:39 +00:00
Pat Somaru
302b3ac6c0
add retries to kernel clone step

Odds are that when we most care about cloning a fresh upstream (i.e.,
some fix was just pushed), things will be at their flakiest (i.e., no
caches anywhere yet), so retry 10 times, with ~3 minutes total between
tries, when cloning the kernel source.
2024-10-23 19:58:21 -04:00
Tejun Heo
0b2d2f7a82
Merge pull request #829 from minosfuture/vmlinux_h_script
Add script for generating per-arch vmlinux.h files
2024-10-23 23:32:30 +00:00
Tejun Heo
9933ae74b5
Merge pull request #839 from sirlucjan/meson-minimal
Set minimal meson version to silence new warnings
2024-10-23 23:00:32 +00:00
Tejun Heo
386ae20ee7
Merge pull request #843 from CachyOS/feature/scx-loader-config
scx_loader: introduce configuration
2024-10-23 23:00:24 +00:00
Changwoo Min
a13bb8028e
Merge pull request #837 from multics69/lavd-tuning-v4
scx_lavd: various optimizations for more consistent performance
2024-10-23 22:56:31 +00:00
Vladislav Nepogodin
f2980d69af
scx_loader: introduce configuration 2024-10-24 01:36:11 +04:00
Tejun Heo
cc8633996b Revert "fix ci errors due to __str update in kfunc signature"
This reverts commit 29918c03c8.
2024-10-23 08:58:06 -10:00
Piotr Gorski
748887b058
Downgrade version to 1.2.0
Signed-off-by: Piotr Gorski <lucjan.lucjanov@gmail.com>
2024-10-23 19:24:11 +02:00
Piotr Gorski
ad37e2e4d2
Set minimal meson version to silence new warnings
Signed-off-by: Piotr Gorski <lucjan.lucjanov@gmail.com>
2024-10-23 19:15:02 +02:00
Changwoo Min
b90ecd7e8f scx_lavd: proactively kick a CPU at the ops.enqueue() path
When a task is enqueued, kick an idle CPU in the chosen scheduling
domain. This will reduce temporary stall time of the task by waking
up the CPU as early as possible.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-10-23 21:43:11 +09:00
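A minimal sketch of this kind of enqueue-path kick using the standard sched_ext kfuncs (the actual queueing logic is elided; this is not scx_lavd's exact code):
---------------
void BPF_STRUCT_OPS(lavd_enqueue, struct task_struct *p, u64 enq_flags)
{
        s32 cpu;

        /* ... enqueue @p to the chosen scheduling domain's DSQ ... */

        /* Proactively wake an idle CPU so @p does not sit queued
         * while eligible CPUs stay asleep.
         */
        cpu = scx_bpf_pick_idle_cpu(p->cpus_ptr, 0);
        if (cpu >= 0)
                scx_bpf_kick_cpu(cpu, SCX_KICK_IDLE);
}
---------------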
Changwoo Min
731a7871d7 scx_lavd: change the greedy penalty function
We used to apply a latency penalty linearly proportional to the greedy
ratio. However, this lets the greedy ratio influence the virtual
deadline too much, especially among under-utilized tasks (< 100.0%).
Now, we treat all under-utilized tasks as having the same greedy ratio
(= 100.0%). For over-utilized tasks, we apply a somewhat milder penalty
to avoid sudden latency spikes.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-10-23 21:42:55 +09:00
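Expressed as a sketch (the permille scale and the milder slope are illustrative, not the actual values):
---------------
/* Greedy ratio in permille: 1000 == 100.0% */
static u64 effective_greedy_ratio(u64 greedy_ratio)
{
        /* All under-utilized tasks are treated alike ... */
        if (greedy_ratio <= 1000)
                return 1000;
        /* ... and over-utilized tasks get a milder-than-linear
         * penalty to avoid sudden latency spikes.
         */
        return 1000 + (greedy_ratio - 1000) / 2;
}
---------------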
Changwoo Min
9acf950b75 scx_lavd: change how to use the context information for latency criticality
Previously, contextual information, such as sync wakeup and kernel
task, was incorporated into the final latency criticality value ad hoc,
by adding a constant. Instead, let's make everything proportional to
run time and to waker and wakee frequencies, by scaling the run time
and the frequencies up or down.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-10-23 21:32:18 +09:00
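A sketch of the proportional approach (the scaling factors and the base helper are hypothetical):
---------------
static u64 calc_lat_cri(u64 runtime, u64 wake_freq, u64 wait_freq,
                        bool is_sync_wakeup, bool is_kernel_task)
{
        /* Scale the inputs instead of adding ad hoc constants. */
        if (is_sync_wakeup) {
                wake_freq = wake_freq * 3 / 2;   /* illustrative factor */
                wait_freq = wait_freq * 3 / 2;
        }
        if (is_kernel_task)
                runtime = runtime * 2 / 3;       /* shorter effective run time */

        return base_lat_cri(runtime, wake_freq, wait_freq);  /* hypothetical */
}
---------------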
Andrea Righi
9d45a35957
Merge pull request #830 from sched-ext/ci-enable-sched-mc
ci: enable SCHED_MC in the virtme-ng kernel config
2024-10-23 09:32:52 +00:00
Andrea Righi
d33e1b02b4 ci: enable SCHED_MC in the virtme-ng kernel config
Enabling SCHED_MC in the kernel used for testing allows us to
potentially run more complex tests, simulating different CPU topologies
and having access to such topology data through the in-kernel
scheduler's information.

This can be useful as we add more topology awareness logic to the
sched_ext core (e.g., in the built-in idle CPU selection policy).

Therefore add this option to the default .config (and also fix a missing
newline at the end of the file).

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-10-23 11:18:02 +02:00
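The functional part of the change is a one-line addition to the default .config:
---------------
CONFIG_SCHED_MC=y
---------------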
Andrea Righi
9a935c044e
Merge pull request #833 from sched-ext/ci-verbose-mode
ci: enable verbose mode when testing schedulers
2024-10-23 07:38:28 +00:00
Andrea Righi
0cbb8632d0 ci: enable verbose mode when testing schedulers
Always run schedulers in verbose mode to be able to catch the details of
potential BPF failures.

Signed-off-by: Andrea Righi <arighi@nvidia.com>
2024-10-23 09:29:54 +02:00
likewhatevs
49dd5fbfe3
Merge pull request #834 from likewhatevs/fix-ci-verifier-strs
fix ci errors due to __str update in kfunc signature
2024-10-23 06:28:46 +00:00
Ming Yang
1425ab67ed Add script for generating per-arch vmlinux.h files
This script iterates over a list of archs and generates vmlinux.h for
each. Generated files are placed under the corresponding arch directory.

Signed-off-by: Ming Yang <minos.future@gmail.com>
2024-10-22 23:27:15 -07:00
Pat Somaru
29918c03c8
fix ci errors due to __str update in kfunc signature 2024-10-23 02:18:26 -04:00
Changwoo Min
fdca0c04ed
Merge pull request #831 from multics69/lavd-fix-bpf-veri
scx_lavd: fix/work around a verifier error
2024-10-23 01:45:01 +09:00
Daniel Hodges
c6127e3e01
Merge pull request #832 from hodgesds/layered-timer2
scx_layered: Add timer helpers
2024-10-22 11:02:57 -04:00
Daniel Hodges
4898f5082a scx_layered: Add timer helpers
Add a registry of timers and a helper for running timers.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-10-22 07:57:44 -07:00
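A minimal sketch of a timer registry of this shape, built on the standard bpf_timer API (the map layout, sizes, and names are hypothetical):
---------------
struct timer_elem {
        struct bpf_timer timer;
};

struct {
        __uint(type, BPF_MAP_TYPE_ARRAY);
        __uint(max_entries, MAX_TIMERS);   /* hypothetical registry size */
        __type(key, u32);
        __type(value, struct timer_elem);
} timers SEC(".maps");

static int timer_cb(void *map, u32 *key, struct timer_elem *elem)
{
        /* ... per-timer work, keyed by *key ... */
        bpf_timer_start(&elem->timer, TIMER_INTERVAL_NS, 0);   /* re-arm */
        return 0;
}

/* Helper: initialize and arm the timer registered under @key. */
static int run_timer(u32 key, u64 intvl_ns)
{
        struct timer_elem *elem = bpf_map_lookup_elem(&timers, &key);

        if (!elem)
                return -ENOENT;
        bpf_timer_init(&elem->timer, &timers, CLOCK_BOOTTIME);
        bpf_timer_set_callback(&elem->timer, timer_cb);
        return bpf_timer_start(&elem->timer, intvl_ns, 0);
}
---------------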
Changwoo Min
6fb57643fb scx_lavd: remove the time restriction in preemption
Previously, preemption was allowed only when a task was early in its
time slice, by using LAVD_PREEMPT_KICK_MARGIN and
LAVD_PREEMPT_TICK_MARGIN. This is no longer necessary because the lock
holder preemption handling avoids harmful preemptions. So remove
LAVD_PREEMPT_KICK_MARGIN and LAVD_PREEMPT_TICK_MARGIN and unleash the
preemption.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-10-22 17:48:56 +09:00
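Conceptually, the removal looks like this (the gate's exact shape is an assumption):
---------------
static bool can_preempt(u64 now, struct task_ctx *taskc)
{
        /* The time-based gate below is gone; lock holder preemption
         * handling already prevents the harmful cases:
         *
         * if (now - taskc->slice_started_at > LAVD_PREEMPT_KICK_MARGIN)
         *         return false;
         */
        return true;
}
---------------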
Changwoo Min
07ed821511 scx_lavd: incorporate task's weight to latency criticality
When calculating a task's latency criticality, incorporate the task's
weight into runtime, wake_freq, and wait_freq more systematically. It
looks nicer and works better under heavy load.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-10-22 17:48:56 +09:00
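One plausible shape of that scaling (illustrative, not the actual formula):
---------------
static u64 weighted_lat_cri(struct task_struct *p, u64 runtime,
                            u64 wake_freq, u64 wait_freq)
{
        u64 w = p->scx.weight;   /* default weight is 100 */

        /* Fold the weight into every input rather than applying it
         * as a separate correction afterwards.
         */
        return base_lat_cri(runtime * 100 / w,      /* heavier => looks shorter */
                            wake_freq * w / 100,
                            wait_freq * w / 100);   /* hypothetical helper */
}
---------------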
Changwoo Min
47dd1b9582 scx_lavd: respect a chosen cpu even if it is not idle
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-10-22 17:48:56 +09:00
Changwoo Min
257a3db376 scx_lavd: add ops.cpu_release()
When a CPU is released to serve a higher-priority scheduler class,
requeue the tasks in its local DSQ to the global queue.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-10-22 17:48:56 +09:00
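sched_ext provides a kfunc for exactly this; a minimal sketch of such a callback:
---------------
void BPF_STRUCT_OPS(lavd_cpu_release, s32 cpu, struct scx_cpu_release_args *args)
{
        /* A higher-priority sched class took this CPU; push the tasks
         * queued on its local DSQ back so they can run elsewhere.
         */
        scx_bpf_reenqueue_local();
}
---------------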
Changwoo Min
89749ecad7 scx_lavd: fix/work around a verifier error
Without this, the BPF verifier emits the following errors with *some*
versions of vmlinux.h. So add +1 to work around the problem.

---------------
; bpf_for(j, 0, 64) { @ main.bpf.c:1926
509: (bf) r1 = r8                     ; R1_w=fp-32 R8_w=fp-32 refs=66,2035
510: (b4) w2 = 0                      ; R2_w=0 refs=66,2035
511: (b4) w3 = 64                     ; R3_w=64 refs=66,2035
512: (85) call bpf_iter_num_new#104189        ; R0=scalar() fp-32=iter_num(ref_id=2048,state=active,depth=0) refs=66,2035,2048
513: (bf) r1 = r8                     ; R1=fp-32 R8=fp-32 refs=66,2035,2048
514: (85) call bpf_iter_num_next#104191 515: R0_w=rdonly_mem(id=2049,ref_obj_id=2048,sz=4) R6=scalar(id=2047,smin=smin32=0,smax=umax=smax32=umax32=7,var_off=(0x0; 0x7)) R7=scalar() R8=fp-32 R9=map_value(map=bpf_bpf.bss,ks=4,vs=4584,off=384,smin=smin32=0,smax=umax=smax32=umax32=3968,var_off=(0x0; 0xf80)) R10=fp0 fp-16=iter_num(ref_id=66,state=active,depth=1) fp-24=iter_num(ref_id=2035,state=active,depth=1) fp-32=iter_num(ref_id=2048,state=active,depth=1) fp-80=scalar(id=1) fp-88=map_value(map=.data.LAVD,ks=4,vs=1320,off=40,smin=smin32=0,smax=umax=smax32=umax32=1240,var_off=(0x0; 0x7f8)) fp-96=????0 fp-112=rcu_ptr_bpf_cpumask() fp-120=rcu_ptr_bpf_cpumask() fp-128=rcu_ptr_bpf_cpumask() fp-136=rcu_ptr_bpf_cpumask() refs=66,2035,2048
; bpf_for(j, 0, 64) { @ main.bpf.c:1926
515: (15) if r0 == 0x0 goto pc+49     ; R0_w=rdonly_mem(id=2049,ref_obj_id=2048,sz=4) refs=66,2035,2048
516: (64) w6 <<= 6                    ; R6=scalar(smin=smin32=0,smax=umax=smax32=umax32=448,var_off=(0x0; 0x1c0)) refs=66,2035,2048
517: (61) r8 = *(u32 *)(r0 +0)        ; R0=rdonly_mem(id=2049,ref_obj_id=2048,sz=4) R8_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) refs=66,2035,2048
518: (26) if w8 > 0x3f goto pc+46     ; R8_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=63,var_off=(0x0; 0x3f)) refs=66,2035,2048
; if (cpumask & 0x1LLU << j) { @ main.bpf.c:1927
519: (bf) r1 = r7                     ; R1_w=scalar(id=2053) R7=scalar(id=2053) refs=66,2035,2048
520: (7f) r1 >>= r8                   ; R1_w=scalar() R8_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=63,var_off=(0x0; 0x3f)) refs=66,2035,2048
521: (57) r1 &= 1                     ; R1_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=1,var_off=(0x0; 0x1)) refs=66,2035,2048
522: (15) if r1 == 0x0 goto pc+38     ; R1_w=1 refs=66,2035,2048
; cpu = (i * 64) + j; @ main.bpf.c:1928
523: (4c) w8 |= w6                    ; R6=scalar(smin=smin32=0,smax=umax=smax32=umax32=448,var_off=(0x0; 0x1c0)) R8_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=511,var_off=(0x0; 0x1ff)) refs=66,2035,2048
; bpf_cpumask_set_cpu(cpu, cd_cpumask); @ main.bpf.c:1929
524: (bc) w1 = w8                     ; R1_w=scalar(id=2054,smin=smin32=0,smax=umax=smax32=umax32=511,var_off=(0x0; 0x1ff)) R8_w=scalar(id=2054,smin=smin32=0,smax=umax=smax32=umax32=511,var_off=(0x0; 0x1ff)) refs=66,2035,2048
525: (79) r2 = *(u64 *)(r10 -88)      ; R2_w=map_value(map=.data.LAVD,ks=4,vs=1320,off=40,smin=smin32=0,smax=umax=smax32=umax32=1240,var_off=(0x0; 0x7f8)) R10=fp0 fp-88=map_value(map=.data.LAVD,ks=4,vs=1320,off=40,smin=smin32=0,smax=umax=smax32=umax32=1240,var_off=(0x0; 0x7f8)) refs=66,2035,2048
526: (85) call bpf_cpumask_set_cpu#93595
invalid access to map value, value_size=1320 off=1280 size=48
R2 max value is outside of the allowed memory range
processed 24200 insns (limit 1000000) max_states_per_insn 19 total_states 961 peak_states 789 mark_read 44
---------------

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-10-22 17:19:37 +09:00
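The workaround pattern is to widen the bound the verifier sees by one, e.g. (sizing and names are hypothetical, not the actual diff):
---------------
/* One plausible reading of the +1: the verifier's worst-case bound on
 * the computed index reaches one element past the array's size, so
 * grow the backing array in .data.LAVD by one entry to satisfy it.
 */
u64 cpu_dom_id[LAVD_CPU_ID_MAX + 1];   /* hypothetical; was LAVD_CPU_ID_MAX */
---------------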
Changwoo Min
d5b8aafa1a
Merge pull request #822 from multics69/lavd-tuning-v3
scx_lavd: misc performance tuning
2024-10-22 09:57:58 +09:00
Tejun Heo
6ea15f9f9f
Merge pull request #819 from minosfuture/vmlinux_per_arch
Use per-arch vmlinux.h v2
2024-10-21 19:36:52 +00:00
likewhatevs
303c6d09a0
Merge pull request #824 from likewhatevs/layered-exit-task-no-missing-ctx
scx_layered: fix exit_task ctx lookup err
2024-10-21 14:52:07 +00:00
6216a4b3b1
Merge pull request #826 from JakeHillion/pr826
layered: bpf: add layer kind to layer
2024-10-21 10:47:39 +00:00
Jake Hillion
55c9636f78 layered: bpf: add layer kind to layer
Currently we have an approximation of LayerKind in the BPF code with `open` on
the layer, but it is difficult/impossible to tell the difference between an
Open and a Grouped layer. Add a `kind` field to the BPF `layer` and plumb
through an enum from the Rust side.
2024-10-21 11:32:17 +01:00
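A sketch of the BPF side of such a change (the enum values follow the commit description; the exact definitions are assumptions):
---------------
enum layer_kind {
        LAYER_KIND_CONFINED,
        LAYER_KIND_GROUPED,
        LAYER_KIND_OPEN,
};

struct layer {
        /* ... existing fields ... */
        int kind;   /* plumbed through from the Rust side's LayerKind */
};

/* The old `open` bool could not distinguish Open from Grouped;
 * an explicit kind can.
 */
static bool layer_is_open(struct layer *layer)
{
        return layer->kind == LAYER_KIND_OPEN;
}
---------------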
Changwoo Min
5f19fa0bab scx_lavd: refill time slice once for a lock holder
When a task holds a lock, refill its time slice once at the
ops.dispatch() path to avoid the lock holder preemption problem.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-10-21 15:56:51 +09:00
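A minimal sketch of a one-shot refill at the ops.dispatch() path (the flag, helper, and constant are hypothetical):
---------------
static void refill_lock_holder_slice(struct task_struct *p, struct task_ctx *taskc)
{
        /* Give a lock holder one extra slice so it can release the
         * lock instead of being switched out while still holding it.
         */
        if (is_lock_holder(taskc) && !taskc->slice_refilled) {
                p->scx.slice = LAVD_SLICE_MAX_NS;
                taskc->slice_refilled = true;   /* refill only once */
        }
}
---------------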
Changwoo Min
5a852dc3d9 scx_lavd: direct dispatch when there is an idle CPU
When there is an idle CPU, direct dispatch is performed to reduce
scheduling latency. This didn't work well before, but it seems
to work well now with other tunings.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-10-21 15:56:51 +09:00
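A minimal sketch of direct dispatch from ops.select_cpu(), using the dispatch API as it was named at the time (the slice value is illustrative):
---------------
s32 BPF_STRUCT_OPS(lavd_select_cpu, struct task_struct *p, s32 prev_cpu,
                   u64 wake_flags)
{
        bool is_idle = false;
        s32 cpu = scx_bpf_select_cpu_dfl(p, prev_cpu, wake_flags, &is_idle);

        /* If an idle CPU was found, skip the queues entirely and put
         * @p straight onto that CPU's local DSQ.
         */
        if (is_idle)
                scx_bpf_dispatch(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, 0);
        return cpu;
}
---------------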
Changwoo Min
420de70159 scx_lavd: give more penalty to long-running tasks
Giving a larger penalty to long-running tasks helps segregate
latency-critical tasks, which are usually short-running, from
long-running tasks, which are compute-intensive.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-10-21 15:56:41 +09:00
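Illustratively (the reference constant and the exact curve are assumptions):
---------------
static u64 apply_runtime_penalty(u64 lat_cri, u64 avg_runtime)
{
        /* Longer average run time => lower latency criticality, so
         * compute-intensive tasks sort behind short-running,
         * latency-critical ones.
         */
        return lat_cri * LAVD_RUNTIME_REF / (LAVD_RUNTIME_REF + avg_runtime);
}
---------------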
Pat Somaru
d89c571593
scx_layered: do not attempt ctx lookup on tasks that exited before running on scx 2024-10-20 17:47:24 -04:00
Andrea Righi
fb3f1d0b43
Merge pull request #821 from sched-ext/rustland-min-vtime-budget
scx_rustland: Adjust task's vruntime budget based on latency weight
2024-10-20 07:44:35 +00:00
Changwoo Min
bf1b014d63
Merge pull request #818 from multics69/lavd-tuning
scx_lavd: add missing reset_lock_futex_boost()
2024-10-20 01:41:54 +00:00
Daniel Hodges
e72e5ce0f4
Merge pull request #744 from minosfuture/main
scx_layered: Fix crash on aarch64 due to unavailable cache id file
2024-10-19 22:33:53 +00:00
Ming Yang
1b5359ef4a Use per-arch vmlinux.h v2
Rework the per-arch vmlinux solution:
* add a per-arch directory under sched/include/arch/, in which we
  maintain a vmlinux.h symlink and the real file
  vmlinux-{kernel_ver}-g{sha1}.h. The original sched/include/vmlinux/
  folder is removed.
* update the meson build `-I` option to find the new vmlinux.h location
* update the cargo build scripts to use the per-arch vmlinux.h for
  generating bindings
* keep the original ClangInfo refactoring changes

Signed-off-by: Ming Yang <minos.future@gmail.com>
2024-10-19 10:50:59 -07:00
Andrea Righi
30a2a2013c scx_rustland: Adjust task's vruntime budget based on latency weight
Adjust the amount of vruntime budget an idle task can accumulate as a
function of its latency weight, which is derived from the average number
of voluntary context switches.

This ensures that latency-sensitive tasks naturally receive an
additional priority boost, and we can avoid scaling down the vruntime
to determine the task's deadline, making the scheduler more fair.

It also makes the scheduler more robust, now rustland can survive
intensive stress tests, such as `stress-ng --cpu-sched 64` or hackbench.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-10-19 19:32:14 +02:00
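A sketch of the idea (names and scaling are illustrative, not rustland's actual code):
---------------
static u64 clamp_idle_budget(u64 task_vtime, u64 vtime_now, u64 lat_weight)
{
        /* The maximum vruntime budget an idle task may bank grows
         * with its latency weight, which is derived from the task's
         * average voluntary context switch rate.
         */
        u64 max_budget = BASE_BUDGET_NS * lat_weight / 100;   /* illustrative */

        if (task_vtime + max_budget < vtime_now)
                task_vtime = vtime_now - max_budget;   /* clamp banked budget */
        return task_vtime;
}
---------------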
Daniel Hodges
8f3b75acb9
Merge pull request #820 from hodgesds/rusty-cleanup
scx_rusty: Cleanup cpumask casting
2024-10-19 16:12:11 +00:00
Daniel Hodges
b1b76ee72a
scx_rusty: Cleanup cpumask casting
Use the cast_mask helper function to clean up scx_rusty.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-10-19 12:01:36 -04:00
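The helper in question is the usual one-liner for turning a `struct bpf_cpumask *` into the `const struct cpumask *` that most kfuncs expect:
---------------
static __always_inline const struct cpumask *cast_mask(struct bpf_cpumask *mask)
{
        return (const struct cpumask *)mask;
}
---------------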
Changwoo Min
2fd395bbbf scx_lavd: remove unnecessary load tracking
The algorithm has evolved to decide the time slice without tracking
the system-wide load, so remove the obsolete load tracking code.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-10-19 15:39:24 +09:00