The previous code accesses uninitialized memory in comp_preemption_info()
when it is called from can_task1_kick_task2() <- try_yield_current_cpu()
to test whether task2 is a lock holder. However, task2 is guaranteed
not to be a lock holder in all its callers. So move the lock holder test
to can_cpu1_kick_cpu2().
Signed-off-by: Changwoo Min <changwoo@igalia.com>
add retries to kernel clone step
Odds are that when we most care about cloning a fresh upstream (i.e.
some fix was just pushed), things are going to be flakiest (i.e. no
caches anywhere), so retry 10 times, with ~3 minutes total between
tries, when trying to clone the kernel source.
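
A minimal sketch of such a retry loop, assuming a fixed pause between
tries. The 20-second delay is an assumption (9 pauses between 10 tries
come to about 3 minutes), and `retry` and `clone_kernel` are
illustrative names, not the actual CI step.

```python
import subprocess
import time


def retry(action, attempts=10, delay_s=20.0, sleep=time.sleep):
    """Call action() until it succeeds or the attempts run out.

    Returns the number of attempts used and re-raises the last error
    if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            action()
            return attempt
        except (subprocess.CalledProcessError, OSError):
            if attempt == attempts:
                raise
            sleep(delay_s)  # fixed pause before the next try


def clone_kernel(url, dest):
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(["git", "clone", "--depth", "1", url, dest], check=True)
```

Usage would look like `retry(lambda: clone_kernel(url, dest))`, where
`url` and `dest` are whatever the clone step already uses.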
When a task is enqueued, kick an idle CPU in the chosen scheduling
domain. This will reduce temporary stall time of the task by waking
up the CPU as early as possible.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
We used to apply a latency penalty that was linear in the greedy ratio.
However, this gives the greedy ratio too much weight in determining the
virtual deadline, especially among under-utilized tasks (< 100.0%).
Now, we treat all under-utilized tasks as having the same greedy ratio
(= 100.0%). For over-utilized tasks, we give a somewhat milder penalty
to avoid sudden latency spikes.
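
The shape of the new penalty can be sketched as below. This is an
illustration only: `greedy_penalty` and the `slope` constant are
hypothetical, and the scheduler's actual formula may differ.

```python
def greedy_penalty(greedy_ratio, slope=0.5):
    """Map a greedy ratio (1.0 == 100%) to a latency penalty factor.

    slope is a hypothetical constant below 1 that makes the penalty
    grow more slowly than the greedy ratio itself.
    """
    if greedy_ratio <= 1.0:
        # All under-utilized tasks are treated as exactly 100%.
        return 1.0
    # Over-utilized tasks: milder than the old linear penalty.
    return 1.0 + slope * (greedy_ratio - 1.0)
```

With these assumptions, a task at 20% and a task at 100% get the same
factor, while a task at 300% gets 2.0 instead of 3.0.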
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Previously, contextual information—such as sync wakeup and kernel
task—was incorporated into the final latency criticality value ad hoc
by adding a constant. Instead, let's make everything proportional to
run time and waker and wakee frequencies by scaling up/down the run
time and the frequencies.
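
A rough sketch of the idea, not the scheduler's real formula: the
score function, the `factor` value, and the choice of which input each
hint scales are all assumptions made for illustration.

```python
def latency_criticality(runtime_ns, wait_freq, wake_freq,
                        sync_wakeup=False, kernel_task=False, factor=2.0):
    """Hypothetical score; higher means more latency-critical."""
    if sync_wakeup:
        wake_freq *= factor   # assumption: scale up the waker frequency
    if kernel_task:
        runtime_ns /= factor  # assumption: scale down the run time
    # The hints change the inputs, so their effect stays proportional
    # to the task's own run time and frequencies.
    return (wait_freq * wake_freq) / max(runtime_ns, 1.0)
```

Because the hints scale the inputs, a kernel task doubles its score in
this sketch whether its base score is small or large, which an added
constant would not do.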
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Enabling SCHED_MC in the kernel used for testing allows us to
potentially run more complex tests that simulate different CPU
topologies, and gives us access to that topology data through the
in-kernel scheduler's information.
This can be useful as we add more topology awareness logic to the
sched_ext core (e.g., in the built-in idle CPU selection policy).
Therefore add this option to the default .config (and also fix a missing
newline at the end of the file).
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
This script iterates over a list of archs and generates vmlinux.h for
each. Generated files are put under the corresponding arch directory.
Signed-off-by: Ming Yang <minos.future@gmail.com>
Previously, preemption was allowed only when a task was early in
its time slice, using LAVD_PREEMPT_KICK_MARGIN and
LAVD_PREEMPT_TICK_MARGIN. This is no longer necessary because
the lock holder preemption logic can avoid harmful preemptions. So we
remove LAVD_PREEMPT_KICK_MARGIN and LAVD_PREEMPT_TICK_MARGIN and
unleash the preemption.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
When calculating a task's latency criticality, incorporate the task's
weight into runtime, wake_freq, and wait_freq more systematically.
It looks nicer and works better under heavy load.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
When a CPU is released to serve a higher-priority scheduler class,
requeue the tasks in its local DSQ to the global enqueue.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Currently we have an approximation of LayerKind in the BPF code with `open` on
the layer, but it is difficult/impossible to tell the difference between an
Open and a Grouped layer. Add a `kind` field to the BPF `layer` and plumb
through an enum from the Rust side.
When a task holds a lock, refill its time slice once in the
ops.dispatch() path to avoid the lock holder preemption problem.
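
The "refill once" rule can be modeled roughly as follows. This is a
sketch under stated assumptions, not the BPF code: the `Task` fields,
`DEFAULT_SLICE_NS`, and `on_dispatch` are hypothetical names.

```python
from dataclasses import dataclass

DEFAULT_SLICE_NS = 5_000_000  # hypothetical slice length


@dataclass
class Task:
    slice_ns: int
    holds_lock: bool = False
    slice_refilled: bool = False  # hypothetical per-task flag


def on_dispatch(task):
    """Return True if the task may keep running on its CPU."""
    if task.slice_ns > 0:
        return True
    if task.holds_lock and not task.slice_refilled:
        # Lock holder ran out of slice: extend it, but only once.
        task.slice_ns = DEFAULT_SLICE_NS
        task.slice_refilled = True
        return True
    # Clearing the flag when the task next starts running is left out
    # of this sketch.
    return False
```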
Signed-off-by: Changwoo Min <changwoo@igalia.com>
When there is an idle CPU, direct dispatch is performed to reduce
scheduling latency. This didn't work well before, but it seems
to work well now with other tunings.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Giving more penalties to long-running tasks helps to segregate
latency-critical tasks, which are usually short-running, from
long-running tasks, which are compute-intensive.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Rework per-arch vmlinux solution
* have a per-arch directory under sched/include/arch/, in which we
  maintain a vmlinux.h symlink and the real file
  vmlinux-{kernel_ver}-g{sha1}.h. The original sched/include/vmlinux/
  folder is removed.
* update the meson build `-I` option to find the new vmlinux.h position
* update cargo build scripts to use the per-arch vmlinux.h for
  generating bindings
* keep the original ClangInfo refactoring changes
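
The per-arch layout described above can be sketched as follows. Only
the `vmlinux.h` and `vmlinux-{kernel_ver}-g{sha1}.h` names come from
this change; `install_vmlinux`, its arguments, and the example arch
and version strings are hypothetical.

```python
import os


def install_vmlinux(arch_root, arch, kernel_ver, sha1, contents):
    """Write the versioned header and point vmlinux.h at it."""
    arch_dir = os.path.join(arch_root, arch)
    os.makedirs(arch_dir, exist_ok=True)

    real_name = f"vmlinux-{kernel_ver}-g{sha1}.h"
    with open(os.path.join(arch_dir, real_name), "w") as f:
        f.write(contents)

    link = os.path.join(arch_dir, "vmlinux.h")
    if os.path.lexists(link):
        os.remove(link)  # replace a symlink left by an older header
    # A relative target keeps the tree valid if it is moved or copied.
    os.symlink(real_name, link)
    return link
```

A build would then add the arch directory to its include path, so that
`#include "vmlinux.h"` resolves through the symlink.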
Signed-off-by: Ming Yang <minos.future@gmail.com>
Adjust the amount of vruntime budget an idle task can accumulate as a
function of its latency weight, which is derived from the average number
of voluntary context switches.
This ensures that latency-sensitive tasks naturally receive an
additional priority boost and we can avoid scaling down the vruntime
to determine the task's deadline, making the scheduler more fair.
It also makes the scheduler more robust: rustland can now survive
intensive stress tests, such as `stress-ng --cpu-sched 64` or hackbench.
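
One way to picture the vruntime budget is the clamp below. It is a
sketch, not the rustland code: `clamp_vruntime`, the
`slice_ns * lat_weight` budget, and the idea of a global `min_vruntime`
are assumptions made for illustration.

```python
def clamp_vruntime(task_vruntime, min_vruntime, slice_ns, lat_weight):
    """Limit how far an idle task may lag behind the global minimum.

    A larger latency weight allows a larger budget, so the task wakes
    up with a smaller vruntime and is scheduled sooner.
    """
    budget = slice_ns * lat_weight
    return max(task_vruntime, min_vruntime - budget)
```

In this model a task that slept for a long time cannot bank unlimited
credit, and a task with many voluntary context switches (a higher
`lat_weight`) keeps more of it than a CPU-bound one.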
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
The algorithm has evolved to decide the time slice without
tracking the system-wide load. So remove the obsolete load tracking
code.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
reset_lock_futex_boost() should be called on every context switch of a
task. Otherwise, in the worst case, a task and its CPU could block
preemption. To avoid such a situation, add the missing
reset_lock_futex_boost() calls.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Building CpuPool from cache-cpu topology did not work on arm, because
the `/sys/devices/system/cpu/cpu{}/cache/index{}/id` file is unavailable.
Read CPU topology instead.
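
As an illustration of reading the topology files instead of the cache
ids, the sketch below groups logical CPUs by package and core. It
assumes the standard sysfs `topology/physical_package_id` and
`topology/core_id` files; `cores_from_topology` is a hypothetical
helper, not the CpuPool code.

```python
import glob
import os
import re
from collections import defaultdict


def read_int(path):
    with open(path) as f:
        return int(f.read().strip())


def cores_from_topology(sysfs_cpu_root="/sys/devices/system/cpu"):
    """Group logical CPU ids by (package id, core id)."""
    cores = defaultdict(list)
    for cpu_dir in glob.glob(os.path.join(sysfs_cpu_root, "cpu[0-9]*")):
        topo = os.path.join(cpu_dir, "topology")
        if not os.path.isdir(topo):
            continue  # no topology directory for this CPU entry
        cpu_id = int(re.search(r"(\d+)$", cpu_dir).group(1))
        key = (read_int(os.path.join(topo, "physical_package_id")),
               read_int(os.path.join(topo, "core_id")))
        cores[key].append(cpu_id)
    return {key: sorted(cpus) for key, cpus in cores.items()}
```

The root directory is a parameter so the function can be pointed at a
fake tree in tests.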
Signed-off-by: Ming Yang <minos.future@gmail.com>
Adjust some default settings after the rework done with commit 112a5d4
("scx_bpfland: rework lowlatency mode to adjust tasks priority").
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>