Refactor the topology preemption logic so that the non-topology-aware code is
contained in a separate function. This should make the non-topology-aware code
path far easier to maintain.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Rename the `load_adj` statistic to `load_frac_adj`, which more accurately
describes what the statistic calculates: a fractional representation of the
load of a layer, adjusted for infeasible weights.
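For reference, my reading of that description (not a formula lifted from the
code), with L_i^adj being layer i's load after the infeasible-weight
adjustment:
```
\mathrm{load\_frac\_adj}_i = \frac{L_i^{\mathrm{adj}}}{\sum_j L_j^{\mathrm{adj}}}
```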
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Refactor layered_dispatch into two functions: layered_dispatch_no_topo and
layered_dispatch. layered_dispatch will delegate to layered_dispatch_no_topo in
the disable_topology case.
Although the dead path never runs once the program is loaded by BPF (the
global constant bool gates it out), keeping both paths in one function makes
it really hard to parse as a human. As they diverge more and more, it makes
sense to split them into separate, manageable functions.
This is basically a mechanical change. I duplicated the existing function,
replaced every `disable_topology` with true in the `no_topo` variant and false
in the existing function, then removed all branches that can't be hit.
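The resulting shape looks roughly like this (the BPF_STRUCT_OPS wiring and the
empty bodies are illustrative; only the two function names come from the
commit):
```
/* Sketch of the resulting shape only -- not the actual scx_layered code. */
static void layered_dispatch_no_topo(s32 cpu, struct task_struct *prev)
{
	/* Flat path: every `disable_topology` test resolved to true and the
	 * now-unreachable topology-aware branches removed. */
}

void BPF_STRUCT_OPS(layered_dispatch, s32 cpu, struct task_struct *prev)
{
	if (disable_topology) {
		layered_dispatch_no_topo(cpu, prev);
		return;
	}
	/* Topology-aware path: every `disable_topology` test resolved to false. */
}
```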
Test plan:
- Runs on my dev box (6.9.0 fbkernel) with `scx_layered --run-example -n`.
- As above with `-t`.
- CI.
Clang is correctly warning that we use various uninitialised variables. Clean
these up so that real errors are easier to read.
The largest change here is to the non-topology-aware layered_dispatch. The
matching_dsq logic seems to be incorrect: it checks whether an uninitialised
variable is 0, sets it if so, and then only uses the variable if the value is
0. I have changed it to default to -1 and to only use the value once it is no
longer -1.
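As a sketch of that sentinel pattern (layer_matches_cpu and layer_dsq_id are
hypothetical helpers, not the real code; scx_bpf_consume is the sched_ext
kfunc):
```
/* Illustrative only: default to -1 and only consume once a match was found. */
s64 matching_dsq = -1;
int idx;

bpf_for(idx, 0, nr_layers) {
	if (!layer_matches_cpu(idx, cpu))	/* hypothetical helper */
		continue;
	if (matching_dsq < 0)
		matching_dsq = layer_dsq_id(idx);	/* hypothetical helper */
}

if (matching_dsq >= 0 && scx_bpf_consume(matching_dsq))
	return;
```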
Since per-CPU kthreads may show an inconsistent prev_cpu and/or cpumask,
dispatch them directly to the local DSQ and allow them to preempt the
currently running task.
This prevents per-CPU kthread stalls, and it also helps to prioritize them,
as they are usually important for system performance and responsiveness.
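A minimal sketch of the check, assuming it lives in the enqueue path (the test
itself is illustrative; SCX_DSQ_LOCAL, SCX_ENQ_PREEMPT and scx_bpf_dispatch()
are the upstream sched_ext names):
```
/* Sketch: per-CPU kthreads go straight to the local DSQ and may preempt
 * the currently running task. Not the exact scheduler code. */
if ((p->flags & PF_KTHREAD) && p->nr_cpus_allowed == 1) {
	scx_bpf_dispatch(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, SCX_ENQ_PREEMPT);
	return;
}
```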
Moreover, change the behavior of --local-kthreads to prioritize all
kthreads when this option is used.
This addresses issue #728.
NOTE: ideally we may want to fix this in the kernel by making sure to
always expose a consistent prev_cpu and cpumask also for kthreads, but
for now this change prevents some annoying stalls and, performance-wise,
it doesn't seem to introduce any regression. In fact, the usual gaming/fps
benchmarks show even a slight improvement in responsiveness with this
change applied.
Thanks to YUBY from the CachyOS community for all the extremely valuable
help with the intensive stress tests.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Add a doc comment to `CpuPool` as a quick reference for each member.
Most importantly, differentiate "cpu" and "core" as logical core and
physical core, respectively.
Signed-off-by: Ming Yang <minos.future@gmail.com>
When hotplugging CPUs in rapid succession, scx_rusty would crash with:
```
scx_bpf_error (Failed to lookup dom[4294967295]
```
The root cause is that, if the scheduler is restarted quickly enough, a task
on a previously hotplugged CPU may not have moved off that CPU yet. Thus, the
CPU -> domain map would contain an invalid domain (u32::MAX) and we would fail
to look up the domain correctly in rusty_select_cpu for prev_cpu.
To fix this, if the CPU is offline, we do not try to allocate within the same
NUMA node beyond the domestic domain (assuming hotplug is a rare operation).
Instead we use greedy allocation: first idle, then busy, then any CPU.
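Roughly, the select path becomes something like this (cpu_to_dom_id,
NO_DOM_FOUND and online_cpumask are made-up names for illustration; the
pick_idle/pick_any kfuncs are the real sched_ext API):
```
/* Sketch only -- not the actual scx_rusty code. */
s32 BPF_STRUCT_OPS(rusty_select_cpu, struct task_struct *p, s32 prev_cpu,
		   u64 wake_flags)
{
	u32 dom_id = cpu_to_dom_id(prev_cpu);

	if (dom_id == NO_DOM_FOUND || !bpf_cpumask_test_cpu(prev_cpu, online_cpumask)) {
		/* Greedy fallback: first idle, then any allowed CPU. */
		s32 cpu = scx_bpf_pick_idle_cpu(p->cpus_ptr, 0);

		if (cpu < 0)
			cpu = scx_bpf_pick_any_cpu(p->cpus_ptr, 0);
		return cpu;
	}
	/* ... normal domestic/NUMA-aware selection ... */
	return prev_cpu;
}
```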
Update the idle topology selection order. The current logic is:
core architecture (big/little) -> LLC -> NUMA -> Machine
It's probably better to try to keep cache lines clean and do:
LLC -> core architecture (big/little) -> NUMA -> Machine
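i.e. roughly the following fallback chain (pick_idle_in() and the per-level
cpumasks are purely illustrative):
```
/* Illustrative fallback chain for the new order. */
s32 cpu;

if ((cpu = pick_idle_in(llc_cpumask)) >= 0)		/* LLC first */
	return cpu;
if ((cpu = pick_idle_in(big_little_cpumask)) >= 0)	/* then preferred core type */
	return cpu;
if ((cpu = pick_idle_in(node_cpumask)) >= 0)		/* then NUMA node */
	return cpu;
return pick_idle_in(machine_cpumask);			/* finally the whole machine */
```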
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>