Introduce scx_flash (Fair Latency-Aware ScHeduler), a scheduler that
focuses on ensuring fairness among tasks and performance predictability.
This scheduler is introduced as a replacement for the "lowlatency" mode
in scx_bpfland, which was dropped in commit 78101e4 ("scx_bpfland:
drop lowlatency mode and the priority DSQ").
scx_flash operates based on an EDF (Earliest Deadline First) policy,
where each task is assigned a latency weight. This weight is adjusted
dynamically, influenced by the task's static weight and how often it
releases the CPU before its full assigned time slice is used: tasks that
release the CPU early receive a higher latency weight, granting them
a higher priority over tasks that fully use their time slice.
The combination of dynamic latency weights and EDF scheduling ensures
responsive and stable performance, even in overcommitted systems, making
the scheduler particularly well-suited for latency-sensitive workloads,
such as multimedia or real-time audio processing.
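As a rough illustration of the policy described above, the sketch below
derives a deadline from a task's virtual runtime and a dynamically scaled
latency weight. The names (`Task`, `avg_early_release`, `latency_weight`,
`deadline`), the cap and the scaling formula are all assumptions made for
illustration; they are not taken from the scx_flash source.

```rust
/// Hypothetical per-task state; field names are illustrative only.
struct Task {
    /// Static weight (100 is treated as the default here).
    weight: u64,
    /// Assumed running average of how often the task released the CPU
    /// before its time slice was used up.
    avg_early_release: u64,
    /// Accumulated virtual runtime, in nanoseconds.
    vruntime: u64,
}

/// Arbitrary cap so that a single task cannot grow its boost without bound.
const MAX_EARLY_RELEASE: u64 = 1000;

/// Latency weight: the static weight scaled up by early CPU releases.
fn latency_weight(t: &Task) -> u64 {
    let boost = t.avg_early_release.min(MAX_EARLY_RELEASE) + 1;
    // Guard against a zero weight so the division below stays valid.
    t.weight.max(1) * boost
}

/// EDF deadline: a higher latency weight yields an earlier deadline.
fn deadline(t: &Task, slice_ns: u64) -> u64 {
    t.vruntime + slice_ns * 100 / latency_weight(t)
}
```

With equal virtual runtime and static weight, a task that releases the CPU
early more often gets a smaller deadline under this sketch and is therefore
picked first by an EDF queue.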
Tested-by: Peter Jung <ptr1337@cachyos.org>
Tested-by: Piotr Gorski <piotrgorski@cachyos.org>
Signed-off-by: Andrea Righi <arighi@nvidia.com>
We duplicate the definition of most fields in every layer kind. This makes
reading the config harder than it needs to be, and turns every simple read of a
common field into a `match` statement that is largely redundant.
Utilise `#[serde(flatten)]` to embed a common struct into each of the LayerKind
variants. Rather than matching on the variant, the common fields can be
accessed directly with `.kind.common()` and `.kind.common_mut()`.
Alternatively, existing matches can be extended to match out the common parts
where necessary, as demonstrated in this diff.
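A minimal, dependency-free sketch of the accessor pattern follows. The
fields in `LayerCommon` and `cpus_range` are a hypothetical subset chosen for
illustration, and the `#[serde(flatten)]` attribute is only mentioned in a
comment so that the example builds without serde.

```rust
/// Fields shared by every layer kind (hypothetical subset).
#[derive(Debug, Clone, PartialEq)]
struct LayerCommon {
    slice_us: u64,
    preempt: bool,
}

// In the real config types each `common` field would carry
// `#[serde(flatten)]` so that the serialized layout stays flat.
#[derive(Debug, Clone, PartialEq)]
enum LayerKind {
    Confined { cpus_range: (usize, usize), common: LayerCommon },
    Grouped { cpus_range: (usize, usize), common: LayerCommon },
    Open { common: LayerCommon },
}

impl LayerKind {
    /// Shared read access, replacing a per-field `match` at each read site.
    fn common(&self) -> &LayerCommon {
        match self {
            LayerKind::Confined { common, .. }
            | LayerKind::Grouped { common, .. }
            | LayerKind::Open { common } => common,
        }
    }

    /// Shared write access to the common fields.
    fn common_mut(&mut self) -> &mut LayerCommon {
        match self {
            LayerKind::Confined { common, .. }
            | LayerKind::Grouped { common, .. }
            | LayerKind::Open { common } => common,
        }
    }
}
```

A read site can then use `kind.common().slice_us` regardless of the variant,
and only variant-specific fields still need a `match`.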
There is some further code cleanup that can be done in the changed read sites,
but I wanted to make it clear that this change doesn't change behaviour, so
tried to make these changes in the least obtrusive way.
Drive-by: fix the formatting of the lazy_static section in main.rs by using
`lazy_static::lazy_static`.
Test plan:
```
# main
$ cargo build --release && target/release/scx_layered --example /tmp/test_old.json
# this change
$ cargo build --release && target/release/scx_layered --example /tmp/test_new.json
$ diff /tmp/test_{old,new}.json
# no diff
```
Fix cost accounting for fallback DSQs on refresh so that DSQ budgets
get refilled appropriately. Add helper functions for converting between
a DSQ id and an LLC budget id. During preemption, a layer should compare
its budget with that of the layer it is attempting to preempt and only
preempt if the preempting layer has more budget.
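The preemption rule can be read as a simple budget comparison. The sketch
below is one interpretation only: `LayerBudget`, its `budget` field and
`may_preempt` are hypothetical names, and the real accounting is kept in BPF
per-layer cost structures rather than in a Rust struct like this.

```rust
/// Hypothetical view of a layer's remaining budget.
struct LayerBudget {
    budget: u64,
}

/// Allow preemption only when the preempting layer has strictly more
/// remaining budget than the layer currently running on the CPU.
fn may_preempt(preempting: &LayerBudget, running: &LayerBudget) -> bool {
    preempting.budget > running.budget
}
```

Using a strict comparison means two layers with equal budget never preempt
each other, which avoids ping-ponging in this simplified model.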
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
When dispatching, consume from DSQs in the local LLC first before trying
remote DSQs. This should still be fair as the layer iteration order will
be maintained.
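One way to picture the resulting consume order is the sketch below, which
lists the (layer, LLC) DSQs a CPU would try in turn. The function name and
the flat index scheme are assumptions for illustration; the actual dispatch
path lives in the BPF code and may differ in detail.

```rust
/// Order in which (layer, llc) DSQs would be tried by a CPU that sits in
/// `local_llc`. Assumes `local_llc < nr_llcs`.
fn consume_order(nr_layers: usize, nr_llcs: usize, local_llc: usize) -> Vec<(usize, usize)> {
    let mut order = Vec::with_capacity(nr_layers * nr_llcs);

    // Pass 1: the local LLC's DSQ of every layer, in layer order.
    for layer in 0..nr_layers {
        order.push((layer, local_llc));
    }

    // Pass 2: the remote LLCs, again walking the layers in the same order.
    for layer in 0..nr_layers {
        for llc in (0..nr_llcs).filter(|&l| l != local_llc) {
            order.push((layer, llc));
        }
    }

    order
}
```

Both passes walk the layers in the same order, which is why the relative
priority between layers is unchanged in this model.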
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Add a helper for returning the appropriate slice duration for a layer
and replace various instances where the slice value was being
recalculated.
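A helper of this kind might look like the sketch below. The `Layer` struct,
the `DEFAULT_SLICE_NS` value and the convention that a zero `slice_ns` means
"not configured" are assumptions for illustration, not the actual
definitions.

```rust
/// Assumed global default slice, in nanoseconds (value is illustrative).
const DEFAULT_SLICE_NS: u64 = 20_000_000;

/// Hypothetical layer state; only the field needed here is shown.
struct Layer {
    /// Per-layer slice in nanoseconds; 0 is taken to mean "use the default".
    slice_ns: u64,
}

/// Single place that decides which slice duration applies to a layer.
fn layer_slice_ns(layer: &Layer) -> u64 {
    if layer.slice_ns > 0 {
        layer.slice_ns
    } else {
        DEFAULT_SLICE_NS
    }
}
```

Call sites then ask the helper instead of repeating the fallback logic.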
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Add timer-based antistall to scx_layered, along with new flags
to enable/disable it and to specify the delay in seconds before
it turns on.
Also update the CI config to make sure this verifies and runs.
The verifier in older kernels chokes on function calls from sleepable progs,
triggering a nonsensical RCU state error:
frame1: R1_w=scalar(id=674,smin=smin32=0,smax=umax=smax32=umax32=51,var_off=(0x0; 0x3f)) R10=; return *llc_ptr;
1072: (61) r0 = *(u32 *)(r2 +0) ; frame1: R0_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R2_w=map_value(map=bpf_bpf.rodata,ks=4,vs=9570,off=4400,smin=smin32=0,smax=umax=smax32=umax32=204,var_off=(0x0; 0xfc)) refs=13,647
; }
1073: (95) exit
bpf_rcu_read_unlock is missing
processed 10663 insns (limit 1000000) max_states_per_insn 8 total_states 615 peak_states 281 mark_read 20
-- END PROG LOAD LOG --
Work around this by adding an __always_inline variant of cpu_to_llc_id()
and using it from layered_init(). Note that we can't switch everyone to
__always_inline as that can lead to verification failure due to the insn limit.
When selecting an idle CPU, use the idle_smt option of the layer. This
may improve cache locality in some cases by placing tasks on CPUs that
are closer in the cache hierarchy.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
On some older kernels, layered fails to pass the verifier. Prevent certain
helpers from being inlined so that verification succeeds.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
The loops in topology aware mode were recently refactored to place the per-LLC
loops inside the per-layer loops. However, the layer specific checks were left
in the inner loops, slowing this down unnecessarily.
Pull the layer specific checks from the inner loop into the outer loop.
Also change these functions to `__weak` to ensure they don't get inlined:
they're expected to be verified as global functions.
Note to reviewers: this looks good to me, but I'd appreciate if you reviewed
the De Morgan applications in detail.
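The shape of the refactor, including one De Morgan rewrite, can be sketched
as below. The `Layer` fields and the particular condition are hypothetical
stand-ins; the point is only that a check depending solely on the layer can
be moved out of the per-LLC loop without changing which (layer, LLC) pairs
are visited.

```rust
/// Hypothetical layer flags used only for this illustration.
struct Layer {
    open: bool,
    preempt: bool,
}

/// Before: the layer-only check is re-evaluated for every LLC.
fn visit_before(layers: &[Layer], nr_llcs: usize) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    for (idx, layer) in layers.iter().enumerate() {
        for llc in 0..nr_llcs {
            if layer.preempt || !layer.open {
                continue;
            }
            out.push((idx, llc));
        }
    }
    out
}

/// After: the check runs once per layer in the outer loop.
fn visit_after(layers: &[Layer], nr_llcs: usize) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    for (idx, layer) in layers.iter().enumerate() {
        // De Morgan: !(preempt || !open) == (!preempt && open)
        let eligible = !layer.preempt && layer.open;
        if !eligible {
            continue;
        }
        for llc in 0..nr_llcs {
            out.push((idx, llc));
        }
    }
    out
}
```

Comparing the two functions over every combination of flags is a cheap way
to check that the rewritten condition is equivalent.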
Test plan:
- `cargo build --release && sudo target/release/scx_layered --run-example` on a
  machine with multiple LLCs. It's possible to stall it quite easily with
  stress-ng, but I believe this is also the case on main.
With the recent rework of scx_bpfland the default options for the
different profiles in scx_loader are not valid anymore.
Update them with some appropriate options.
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Add fallback DSQ cost accounting so that fallback DSQ costs are
accounted for and so that dispatch of fallback DSQs can be done in a
standardized way.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
The verifier error seems to stem from the wrong vmlinux.h.
Also, PR #889 seems to completely fix the problem.
So, drop the workaround.
Signed-off-by: Changwoo Min <changwoo@igalia.com>