Add big cpumask to scx_layered and prefer selecting big idle cores when
using the BigLittle growth algo.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
It seems that with the latest kernel the per-CPU DSQ stall while
executing sched_setaffinity() doesn't happen anymore.
Therefore, get rid of the temporary workaround introduced by commit
86db45f ("scx_rustland_core: prevent deadlock with per-CPU DSQs and CPU
affinity") and restore the old behavior, which offers more fair
scheduling policy.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
The doc of scx_layered `Opt` is out of sync.
Implement attribute macro #stat_doc to generate doc from the `desc`
property.
Apply #stat_doc to `LayerStats` and `SysStats in scx_layered.
Signed-off-by : Ming Yang <minos.future@gmail.com>
The gpu-topology feature can be enabled to include GPUs when generating
a topology-map. Disabling the feature will remove the nvml-wrapper
dependency as well as GPU-specific code in topology.rs.
Most of the code was moved to a new module in rust/scx_utils/src/gpu.rs
but some of it was kept in topology.rs and hidden behind #[cfg(feature =
"gpu-topology")].
Signed-off-by: Fredrik Lönnegren <fredrik@frelon.se>
* enable ide's etc. to work on the bpf.c files
this makes it so that clangd and ide tools which use clangd
can work on the bpf.c code.
nothing should actually be changed outside of that ide/editor
environment, all the changes are ifdef'ed on LSP which is set
in the added .clangd file.
* move intf include out of both sides of ifdef toggle
Remove cast_mask() function distributed throughout different schedulers
and add it in common.bpf.h so every scheduler can reference it once they
need to.
Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
This reverts commit 809d39aa7f.
Dispatching all kthreads directly doesn't really help much at preventing
stalls with the stress-ng fork stressor, so revert this commit. A better
workaround will be provided in the next commit.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Exposes an option --monitor-no-dbus in scx_loader that will monitor CPU
utilization and start scx_lavd when any CPU exceeds 90% for more than 5
seconds. scx_lavd will be terminated if all CPUs are below 90% for
more than 30 seconds. When this flag is specified, scx_loader's
dbus functionality is not utilized.
Add extra ordering macros for Core/CPU structs for ease of use with
Rust standard library features. This issue was hit when trying to sort
cores based on the CoreType. See this similar issue for details:
https://github.com/rust-lang/rust/issues/113550
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Use an "_" variable to access the returned valued of "saturating_sub()"
to mute the compilation warnings.
Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
Dispatching kthreads via user-space can still lead to deadlocks in
certain cases (for example we can still trigger stalls by running the
fork stressor via stress-ng).
To prevent such stalls simply dispatch kthreads directly from BPF for
now to prevent failures.
In the future we may consider to provide an API to restrict the
selection of tasks directly dispatched (for example passing a mask PF_*
flags to "whitelist" the tasks that are allowed to bypass the user-space
scheduler).
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Updating nr_queued in a non-atomic when a queued task is consumed can
lead to underflows. We don't really care about being 100% accurate here,
since nr_queued should be considered more of a statistic than an
accurate value.
Therefore, just accept the fact that nr_queued can be inaccurate and
handle potential underflows.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
If a task that is executing sched_setaffinity() is dispatched on a
per-CPU DSQ it may stall the DSQ completely, since the task won't be
able to be consumed from the corresponding CPU.
This can be easily triggered running the following stress test:
$ stress-ng --aggressive -c (nproc) -f (nproc)
From the stall trace we can see something like the following:
R stress-ng[2648662] -6880ms
scx_state/flags=3/0x9 dsq_flags=0x1 ops_state/qseq=0/0
sticky/holding_cpu=-1/-1 dsq_id=0x5 dsq_vtime=0
cpus=ff
__set_cpus_allowed_ptr+0x1c8/0x260
__sched_setaffinity+0x105/0x1c0
sched_setaffinity+0x1ed/0x2d0
__x64_sys_sched_setaffinity+0xa5/0x100
do_syscall_64+0x82/0x190
entry_SYSCALL_64_after_hwframe+0x76/0x7e
This should probably be addressed in the core sched_ext, but for now
prevent this deadlock by tracking when a task is executing
sched_setaffinity() and automatically bounce those tasks to the shared
DSQ (that can be consumed from any CPU).
This should solve all the recent CI failures with the scx_rustland_core
schedulers.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Pass enqueue flags to user-space: flags will be passed via
QueuedTask.flags and can be forwarded back to BPF via
DispatchedTask.flags.
These flags can be also passed to BpfScheduler.select_cpu() to apply a
more refined CPU selection policy.
Moreover, avoid to prioritize the user-space scheduler too much and
dispatch it only if there are no other tasks that needs to be dispatched
in ops.dispatch().
This improves CPU utilization and enhances the fairness, robustness, and
resilience of schedulers based on scx_rustland_core, particularly under
stress test conditions.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
On Python versions that perform validation of this line it fails because
of a square bracket mismatch. This is due to the single quotes being
parsed first. Fix by changing the outer string to double quotes.
Use `cargo fmt` with a specific nightly branch in the CI to enforce formatting. Globally format these files while the diff is still small so we can stay on top of it.
Test plan:
- CI lint check passes.
The domains are added to the aggregator when load is added (and
duty_cycle is not 0.0f64).
This commit makes sure that all domains are added to the aggregator even
when the calculated duty_cycle is 0.
Signed-off-by: Fredrik Lönnegren <fredrik@frelon.se>
Rust build was using two separate workspaces - rust/ and scheds/rust.
There's no reason to separate them and it makes doc generation tricky. Use
single top level workspace so that we can drive all rust building from
cargo.
Move included artifacts in the lib section (without this change we can't
publish the crate on crates.io).
Fixes: 082bccb ("fix/enable rust tests, make build faster")
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
This commit fixes rust tests and configures ci to
run them on commit. It also sets up CI to run those
in a timely manner by caching dependencies and splitting jobs.
Introduce some concepts of cache awareness (taken from scx_bpfland).
This change is transparently affecting the BpfScheduler::select_cpu()
API, even if the particular cache topology is not (yet) exposed to the
user-space.
When calling select_cpu() the backend will prioritize idle CPUs that
share the same L2 cache, then the same L3 cache, considering the target
CPU specified in select_cpu() (usually the task's previously used CPU).
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Pass the enqueue flags to the user-space scheduler through the
QueuedTask struct.
These flags allow the user-space scheduler to make more informed
scheduling decisions.
Also bump up scx_rustland_core minor version to reflect the new API (we
are not breaking the old API, so we don't need to bump the major version
in this case).
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>