With the recent rework of scx_bpfland the default options for the
different profiles in scx_loader are not valid anymore.
Update them with some appropriate options.
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Rework per-arch vmlinux solution
* have per-arch directory under sched/include/arch/, in which we
maintain vmlinux.h symlink and real file
vmlinux-{kernel_ver}-g{sha1}.h. The original sched/include/vmlinux/
folder is removed.
* update meson build `-I` option to find the new vmlinux.h position
* update cargo build scripts to use the per-arch vmlinux.h for
generating bindings
* keep the original ClangInfo refactoring changes
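
A minimal sketch of how a build script might resolve the per-arch header
under this layout. The helper names `kernel_arch` and `vmlinux_header`,
and the mapping from Rust target architectures to directory names (for
example `aarch64` to `arm64`), are assumptions for illustration; the
actual directory names in the tree may differ.

```rust
use std::path::{Path, PathBuf};

/// Map a Rust target architecture to a kernel-style arch directory name.
/// ASSUMPTION: these directory names are illustrative, not taken from the tree.
fn kernel_arch(target_arch: &str) -> Option<&'static str> {
    match target_arch {
        "x86_64" | "x86" => Some("x86"),
        "aarch64" => Some("arm64"),
        "arm" => Some("arm"),
        "riscv64" => Some("riscv"),
        _ => None,
    }
}

/// Build the path of the per-arch vmlinux.h symlink:
/// <include_root>/arch/<arch>/vmlinux.h
fn vmlinux_header(include_root: &Path, target_arch: &str) -> Option<PathBuf> {
    let arch = kernel_arch(target_arch)?;
    Some(include_root.join("arch").join(arch).join("vmlinux.h"))
}
```

Returning `None` for an unknown architecture lets the caller fail the
build with a clear message instead of silently using a header generated
for a different arch.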
Signed-off-by: Ming Yang <minos.future@gmail.com>
Building CpuPool from the cache-CPU topology does not work on Arm,
because the `/sys/devices/system/cpu/cpu{}/cache/index{}/id` file is
unavailable there.
Read the CPU topology instead.
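
As a rough illustration of the fallback, the sketch below parses a
sysfs-style CPU list such as the one exposed by
`/sys/devices/system/cpu/cpu{}/topology/core_cpus_list`. The function
name `parse_cpulist` and the choice of that particular topology file are
assumptions for illustration, not the code from this commit.

```rust
/// Parse a sysfs CPU list such as "0-3,8" into individual CPU ids.
/// Returns None if any element is malformed or a range is reversed.
fn parse_cpulist(list: &str) -> Option<Vec<usize>> {
    let mut cpus = Vec::new();
    // Trim the trailing newline sysfs adds, then split on commas.
    for part in list.trim().split(',').filter(|p| !p.is_empty()) {
        match part.split_once('-') {
            // A range "lo-hi" expands to every id in between, inclusive.
            Some((lo, hi)) => {
                let lo: usize = lo.trim().parse().ok()?;
                let hi: usize = hi.trim().parse().ok()?;
                if lo > hi {
                    return None;
                }
                cpus.extend(lo..=hi);
            }
            // A single id.
            None => cpus.push(part.trim().parse().ok()?),
        }
    }
    Some(cpus)
}
```

CPUs that report the same list can then be grouped together without
relying on the cache `id` file.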
Signed-off-by: Ming Yang <minos.future@gmail.com>
Update the documentation adding the new task statistics provided by
scx_rustland_core.
Fixes: be681c7 ("scx_rustland_core: pass nvcsw, slice and dsq_vtime to user-space")
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
With user-space scheduling we don't usually dispatch a task immediately
after selecting an idle CPU, so there's not much benefit in trying to
optimize the WAKE_SYNC scenario (when a task is waking up another task
and releasing the CPU) when picking an idle CPU.
Therefore, get rid of the WAKE_SYNC logic in select_cpu() and rely on
the user-space logic (that has access to the WAKE_SYNC information) to
handle this particular case.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Do not kick a CPU from rs_select_cpu() (called by the user-space
scheduler), since we may not immediately dispatch the task.
Instead, always try to wake up the task's assigned CPU after dispatching
to a global DSQ, ensuring it can be consumed immediately.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Prevent CPUs from going idle when the user-space scheduler has some
pending activities to complete.
Keeping the CPU alive allows tasks from the user-space scheduler to be
consumed more efficiently, preventing bubbles in the scheduling
pipeline.
To achieve this, trigger a CPU kick from ops.update_idle() and set a
flag in the CPU context to prevent it from going idle. Then keep kicking
the CPU from ops.dispatch() until the flag is cleared, which occurs when
no more tasks are pending or when the CPU exits idle as a task starts
running on it.
This fixes the performance regression introduced by the
put_prev_task_scx() behavior change in Linux 6.12 (see #788).
Link: https://lore.kernel.org/lkml/20241015111539.12136-1-andrea.righi@linux.dev/
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
User-space schedulers may still hit some stalls during CPU hotplugging
events.
There is no reason to overcomplicate the code by trying to handle
hotplug events within the scx_rustland_core framework: we can simply
handle a scheduler restart performed by the scx core.
This makes CPU hotplugging more reliable with scx_rustland_core-based
schedulers.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Assign an infinite time slice to the user-space scheduler itself, so
that it can completely drain all the pending tasks and voluntarily
release the CPU when it's done.
This gives more consistent performance, and it also lets us remove the
speculative user-space scheduler wakeup from ops.stopping().
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Provide additional task metrics to user-space schedulers via QueuedTask:
- nvcsw: total number of voluntary context switches
- slice: task time slice "budget" (from p->scx.slice)
- dsq_vtime: current task vtime (from p->scx.dsq_vtime)
This way user-space schedulers can quickly access these metrics to
implement better scheduling policies.
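
A minimal sketch of how a user-space policy could use these metrics. The
`TaskMetrics` struct is a hypothetical stand-in that only mirrors the
three fields named above (the real QueuedTask type and its field types
are not reproduced here), and the ordering rule is an illustrative
policy, not the one used by any existing scheduler.

```rust
/// Hypothetical stand-in for the metrics exposed via QueuedTask.
/// ASSUMPTION: field types are chosen for illustration only.
#[derive(Debug, Clone, PartialEq)]
struct TaskMetrics {
    pid: i32,
    nvcsw: u64,     // total voluntary context switches
    slice: u64,     // remaining time slice budget
    dsq_vtime: u64, // current task vtime
}

/// Order tasks by lowest vtime first; among equal vtimes, prefer the task
/// with more voluntary context switches (a rough hint of interactivity).
fn order_tasks(tasks: &mut [TaskMetrics]) {
    tasks.sort_by(|a, b| {
        a.dsq_vtime
            .cmp(&b.dsq_vtime)
            .then(b.nvcsw.cmp(&a.nvcsw))
    });
}

/// Reuse the leftover budget if there is any, else fall back to a default.
fn next_slice(task: &TaskMetrics, default_slice: u64) -> u64 {
    if task.slice > 0 {
        task.slice
    } else {
        default_slice
    }
}
```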
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
vmlinux.h is not compatible across archs.
Handle this compatibility issue by:
* Adding arch info to the vmlinux.h real file name
* Linking vmlinux.h to the target-arch real file at build time
* Using the target-arch real file for scx_utils bindgen.
Also refactored clang related logic into a new clang_info mod, which is
shared by bpf_builder.rs and builder.rs.
Signed-off-by: Ming Yang <minos.future@gmail.com>
The symbol __handle_mm_fault isn't available anymore in 6.12, so rely on
handle_mm_fault, which is available on both 6.12 and older kernels.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Add big cpumask to scx_layered and prefer selecting big idle cores when
using the BigLittle growth algo.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
It seems that with the latest kernel the per-CPU DSQ stall while
executing sched_setaffinity() doesn't happen anymore.
Therefore, get rid of the temporary workaround introduced by commit
86db45f ("scx_rustland_core: prevent deadlock with per-CPU DSQs and CPU
affinity") and restore the old behavior, which offers a fairer
scheduling policy.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
The doc of scx_layered `Opt` is out of sync.
Implement attribute macro #stat_doc to generate doc from the `desc`
property.
Apply #stat_doc to `LayerStats` and `SysStats` in scx_layered.
Signed-off-by: Ming Yang <minos.future@gmail.com>
The gpu-topology feature can be enabled to include GPUs when generating
a topology-map. Disabling the feature will remove the nvml-wrapper
dependency as well as GPU-specific code in topology.rs.
Most of the code was moved to a new module in rust/scx_utils/src/gpu.rs
but some of it was kept in topology.rs and hidden behind #[cfg(feature =
"gpu-topology")].
Signed-off-by: Fredrik Lönnegren <fredrik@frelon.se>
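
For the gpu-topology change above, the gating pattern can be sketched as
follows. The module and function names (`gpu::describe`,
`topology_summary`) are hypothetical and only show how
`#[cfg(feature = "gpu-topology")]` selects between two implementations
at compile time; no nvml-wrapper calls are shown.

```rust
// Compiled only when the crate is built with the "gpu-topology" feature.
#[cfg(feature = "gpu-topology")]
mod gpu {
    pub fn describe() -> &'static str {
        // A real implementation would query the GPUs here.
        "GPU support enabled"
    }
}

// Fallback compiled when the feature is disabled: no GPU dependency needed.
#[cfg(not(feature = "gpu-topology"))]
mod gpu {
    pub fn describe() -> &'static str {
        "GPU support disabled"
    }
}

/// Callers use the same API regardless of which module was compiled in.
fn topology_summary(nr_cpus: usize) -> String {
    format!("{} CPUs, {}", nr_cpus, gpu::describe())
}
```

Keeping both variants behind the same function signature means the rest
of the code does not need its own `cfg` checks.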
* enable IDEs etc. to work on the bpf.c files
This makes it so that clangd, and IDE tools which use clangd,
can work on the bpf.c code.
Nothing should actually change outside of that IDE/editor
environment: all the changes are ifdef'ed on LSP, which is set
in the added .clangd file.
* move intf include out of both sides of ifdef toggle
Remove the cast_mask() function duplicated across different schedulers
and add it to common.bpf.h so every scheduler can reference it when
needed.
Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
This reverts commit 809d39aa7f.
Dispatching all kthreads directly doesn't really help much in preventing
stalls with the stress-ng fork stressor, so revert this commit. A better
workaround will be provided in the next commit.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Expose an option --monitor-no-dbus in scx_loader that monitors CPU
utilization and starts scx_lavd when any CPU exceeds 90% for more than 5
seconds. scx_lavd will be terminated if all CPUs are below 90% for
more than 30 seconds. When this flag is specified, scx_loader's
dbus functionality is not utilized.
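
The start/stop rule above can be sketched as a small state machine. The
type and method names (`Monitor`, `sample`, `Action`) are hypothetical,
and two details are assumptions: "any CPU exceeds 90%" is read as "at
least one CPU is above 90% at every sample in the window", and a CPU at
exactly 90% is treated as not busy.

```rust
use std::time::Duration;

const BUSY_THRESHOLD: f64 = 90.0;
const START_AFTER: Duration = Duration::from_secs(5);
const STOP_AFTER: Duration = Duration::from_secs(30);

#[derive(Debug, Clone, Copy, PartialEq)]
enum Action {
    None,
    Start, // start the scheduler
    Stop,  // terminate the scheduler
}

#[derive(Default)]
struct Monitor {
    running: bool,
    busy_for: Duration, // how long at least one CPU has been above threshold
    calm_for: Duration, // how long no CPU has been above threshold
}

impl Monitor {
    /// Feed one utilization sample (percent per CPU) covering `elapsed` time.
    fn sample(&mut self, cpu_util: &[f64], elapsed: Duration) -> Action {
        let busy = cpu_util.iter().any(|&u| u > BUSY_THRESHOLD);
        if busy {
            self.busy_for += elapsed;
            self.calm_for = Duration::ZERO;
        } else {
            self.calm_for += elapsed;
            self.busy_for = Duration::ZERO;
        }

        if !self.running && self.busy_for > START_AFTER {
            self.running = true;
            Action::Start
        } else if self.running && self.calm_for > STOP_AFTER {
            self.running = false;
            Action::Stop
        } else {
            Action::None
        }
    }
}
```

The asymmetric windows (5 seconds to start, 30 seconds to stop) act as
hysteresis, so short dips in load do not cause the scheduler to flap.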
Add extra ordering macros for Core/CPU structs for ease of use with
Rust standard library features. This issue was hit when trying to sort
cores based on the CoreType. See this similar issue for details:
https://github.com/rust-lang/rust/issues/113550
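
A minimal sketch of why the ordering traits matter. `CoreKind` and
`CoreSketch` are hypothetical stand-ins, not the actual scx_utils types;
the point is only that deriving `Ord` (and the traits it requires) makes
the structs usable with `sort()`, `cmp()`, and ordered collections.

```rust
/// Hypothetical core type; derived ordering follows variant order,
/// so Little < Big.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
enum CoreKind {
    Little,
    Big,
}

/// Hypothetical core struct; derived ordering compares fields in
/// declaration order: first `kind`, then `id`.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]
struct CoreSketch {
    kind: CoreKind,
    id: usize,
}

/// Sort so that big cores come first, lowest id first within each kind.
fn big_cores_first(mut cores: Vec<CoreSketch>) -> Vec<CoreSketch> {
    cores.sort_by(|a, b| b.kind.cmp(&a.kind).then(a.id.cmp(&b.id)));
    cores
}
```

Without `Ord` on the core type, the comparator above would not compile,
and the structs could not be stored in a `BTreeSet` or used as
`BTreeMap` keys.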
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>