Prevent triggering a critical error when a local context for a task
can't be found.
Instead, handle the error gracefully (reporting a warning in debugfs) to
enhance the robustness of the schedulers based on scx_rustland_core.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Bump up the scx_rustland_core version to include this critical fix that
prevents scheduler stalls:
94a3594 ("scx_rustland_core: always dispatch per-cpu kthreads directly")
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Do not send per-CPU kthreads to the user-space scheduler, but always
dispatch them directly from BPF.
In specific environments, sending critical per-CPU kthreads to the
user-space scheduler can lead to potential stalls. This occurs because
the user-space scheduler might be blocked by an action that these
per-CPU kthreads need to perform, but they cannot complete their action
if the scheduler needs to schedule them, hence the deadlock.
To prevent this deadlock, always dispatch the per-CPU kthreads directly
from the BPF component, ensuring that the user-space scheduler does not
get blocked by these events.
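
As a rough illustration of the routing decision described above, the sketch below models it in plain Rust. The actual check lives in the BPF component; `TaskInfo`, `Route` and `route()` are hypothetical names used only for this example, and treating "kernel thread pinned to a single CPU" as the definition of a per-CPU kthread is an assumption.

```rust
/// Kernel task flag marking a kernel thread (PF_KTHREAD in include/linux/sched.h).
const PF_KTHREAD: u32 = 0x0020_0000;

/// Hypothetical, reduced view of the task fields relevant to the decision.
struct TaskInfo {
    flags: u32,
    nr_cpus_allowed: u32,
}

/// Where an enqueued task is sent (illustrative only).
#[derive(Debug, PartialEq)]
enum Route {
    /// Dispatched directly from the BPF component.
    DirectDispatch,
    /// Queued to the user-space scheduler.
    UserSpace,
}

/// Assumption: a per-CPU kthread is a kernel thread allowed on exactly one CPU.
fn is_per_cpu_kthread(t: &TaskInfo) -> bool {
    t.flags & PF_KTHREAD != 0 && t.nr_cpus_allowed == 1
}

/// Per-CPU kthreads never reach user-space, so they cannot wait on it.
fn route(t: &TaskInfo) -> Route {
    if is_per_cpu_kthread(t) {
        Route::DirectDispatch
    } else {
        Route::UserSpace
    }
}
```

With this model, only tasks that are both kernel threads and pinned to one CPU bypass the user-space scheduler; everything else still goes through it.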
Fixes: c0a2cfb ("scx_rustland_core: always schedule per-CPU kthreads to user-space")
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
The scx_rustland_core API has been redesigned recently, breaking
backward compatibility.
Considering that Rust crates should update their major version when the
previous API becomes incompatible [1], bump up the version to 2.0.0.
[1] https://semver.org/
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Integrate the logic used by scx_bpfland to detect turbo-boosted cores in
Topology.
Also change the logic to detect Big/Little cores based on
base_frequency instead of scaling_max_freq; otherwise turbo-boosted
cores in homogeneous systems may be incorrectly classified as Big.
Moreover, introduce the following new methods to Cpu to check for the
core type:
- is_turbo(): return true if the CPU is Turbo, false otherwise
- is_big(): return true if the CPU is either Turbo or Big
- is_little(): return true if the CPU is Little
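
A minimal sketch of how the three predicates could relate to each other, assuming a core type that is either Big (optionally Turbo) or Little. The `CoreType` layout and the `classify()` helper are assumptions made for illustration, not the actual Topology code: here a core is Little when its base frequency is below the highest base frequency in the system, and Turbo when it is a Big core whose maximum frequency exceeds that of other Big cores.

```rust
/// Hypothetical core classification used only in this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum CoreType {
    Big { turbo: bool },
    Little,
}

struct Cpu {
    core_type: CoreType,
}

impl Cpu {
    /// True if the CPU is Turbo.
    fn is_turbo(&self) -> bool {
        matches!(self.core_type, CoreType::Big { turbo: true })
    }

    /// True if the CPU is either Turbo or Big.
    fn is_big(&self) -> bool {
        matches!(self.core_type, CoreType::Big { .. })
    }

    /// True if the CPU is Little.
    fn is_little(&self) -> bool {
        matches!(self.core_type, CoreType::Little)
    }
}

/// Classify cores from (base_frequency, max_frequency) pairs.
/// Assumed rule: Big/Little is decided by base frequency only, so a
/// turbo-boosted core in a homogeneous system is never taken for a
/// different core class.
fn classify(freqs: &[(u64, u64)]) -> Vec<CoreType> {
    let top_base = freqs.iter().map(|&(base, _)| base).max().unwrap_or(0);
    // Maximum frequencies of the Big cores only.
    let big_max: Vec<u64> = freqs
        .iter()
        .filter(|&&(base, _)| base == top_base)
        .map(|&(_, max)| max)
        .collect();
    let hi = big_max.iter().copied().max().unwrap_or(0);
    let lo = big_max.iter().copied().min().unwrap_or(0);
    freqs
        .iter()
        .map(|&(base, max)| {
            if base < top_base {
                CoreType::Little
            } else {
                CoreType::Big { turbo: hi > lo && max == hi }
            }
        })
        .collect()
}
```

In a homogeneous system where one core boosts higher than the rest, every core stays Big and only the boosted one reports `is_turbo()`.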
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
This borrows some of the logic in scx_lavd for figuring out if a core is
a Big/Little core. If this makes sense we can add helper methods
directly on the topology to return Big/Little cores so that each
scheduler doesn't have to reinvent the same logic.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Include the FIFO example directly in the README.md, instead of linking
scx_rlfifo.
Including the example directly in the README can be more useful and
practical in those cases where internet access is not available or when
we need to distribute more "standalone" documentation.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Introduce a vtime attribute to struct DispatchedTask that can be set by
the user-space scheduler and that will be used by the BPF component to
dispatch the task via scx_bpf_dispatch_vtime().
In this way a user-space scheduler can decide to apply its own internal
task ordering or rely on the BPF vtime priority DSQs (or both).
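
To illustrate the effect of vtime-based ordering, the sketch below models a vtime-ordered queue in plain Rust. It is not the BPF implementation: the reduced `DispatchedTask` (only `vtime` and `pid`) and the `VtimeQueue` type are hypothetical, and breaking ties by pid is an arbitrary choice made here.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical, reduced stand-in for DispatchedTask.
/// Field order matters: the derived ordering compares vtime first.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct DispatchedTask {
    vtime: u64,
    pid: i32,
}

/// Toy model of a priority queue ordered by vtime (lowest first).
struct VtimeQueue {
    heap: BinaryHeap<Reverse<DispatchedTask>>,
}

impl VtimeQueue {
    fn new() -> Self {
        VtimeQueue { heap: BinaryHeap::new() }
    }

    /// Queue a task; its position depends on vtime, not on arrival order.
    fn dispatch(&mut self, task: DispatchedTask) {
        self.heap.push(Reverse(task));
    }

    /// Take the task with the smallest vtime, if any.
    fn consume(&mut self) -> Option<DispatchedTask> {
        self.heap.pop().map(|Reverse(task)| task)
    }
}
```

A scheduler that prefers its own ordering can assign the same vtime to every task and dispatch in the order it wants; one that assigns distinct vtimes leaves the final ordering to the queue.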
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Temporarily drop the RL_PREEMPT_CPU flag. We need a better way to
implement preemption in scx_rustland_core, and the flag is not very
effective at the moment, so simply drop it for now (it will be re-added
later in a proper way).
This change does not affect any scheduler, since RL_PREEMPT_CPU is
currently unused.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
The meson build script was building each rust sub-project under rust/ and
scheds/rust/ separately. This means that each rust project is built
independently, which leads to a couple of problems:
1. There are a lot of shared dependencies, but they have to be built over
   and over again for each project.
2. Concurrency management becomes awkward: we either have to unleash
   multiple cargo builds at the same time, possibly thrashing the system, or
   build them one by one.
We've been trying to solve this from the meson side in vain. Thankfully, in
issue #546, @vimproved suggested using a cargo workspace, which makes the
sub-projects share the same target directory and be built together by the
same cargo instance while still allowing each project to behave
independently for development and publishing purposes.
Make the following changes:
- Create two cargo workspaces - one under rust/, the other under
  scheds/rust/. Each contains all rust projects underneath it.
- Don't let meson descend into rust/. These are libraries used by the rust
  schedulers. No need to build them from meson. Cargo will build them as
  needed.
- Change the rust_scheds build target to invoke `cargo build` in
  scheds/rust/ and let cargo do its thing.
- Remove per-scheduler meson.build files and instead generate custom_targets
  in scheds/rust/meson.build which invoke `cargo build -p $SCHED`.
- This changes the rust binary directory. Update README and
  meson-scripts/install_rust_user_scheds accordingly.
- Remove per-scheduler Cargo.lock files, as scheds/rust/Cargo.lock is shared
  by all schedulers now.
- Unify .gitignore handling.
The following are build times on a Ryzen 3975W:
Before:
________________________________________________________
Executed in 165.93 secs fish external
usr time 40.55 mins 2.71 millis 40.55 mins
sys time 3.34 mins 36.40 millis 3.34 mins
After:
________________________________________________________
Executed in 36.04 secs fish external
usr time 336.42 secs 0.00 millis 336.42 secs
sys time 36.65 secs 43.95 millis 36.61 secs
Wallclock time is reduced by roughly 4.6x and CPU time (usr + sys) by
roughly 7x.
Refactor the code to hide the shutdown handling inside BpfScheduler and
simply use the exited() method to check when the scheduler is stopped.
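
A minimal sketch of the resulting main-loop shape, assuming the shutdown state is kept inside the scheduler object. The `BpfScheduler` below is a mock built around an atomic flag, and `run()` is a hypothetical helper; only the `exited()` method name comes from the description above.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Mock scheduler handle: the shutdown state is private to the object.
struct BpfScheduler {
    shutdown: Arc<AtomicBool>,
}

impl BpfScheduler {
    fn new(shutdown: Arc<AtomicBool>) -> Self {
        BpfScheduler { shutdown }
    }

    /// True once the scheduler has been asked to stop.
    fn exited(&self) -> bool {
        self.shutdown.load(Ordering::Relaxed)
    }
}

/// Run scheduling steps until the scheduler reports that it has exited.
/// Returns the number of steps executed.
fn run(sched: &BpfScheduler, mut step: impl FnMut(u64)) -> u64 {
    let mut iterations = 0;
    while !sched.exited() {
        step(iterations);
        iterations += 1;
    }
    iterations
}
```

The caller no longer tracks a shutdown flag of its own; the loop condition is the only place where the stop state is consulted.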
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Let's make it a bit easier to use:
- Shorten exported names by changing the prefix from ScxStats to Stats. This
  should be distinctive enough and more in line with how most libraries name
  their exports.
- Importing the right set of traits can be tricky. Introduce a prelude module
  so that importing is a bit less painful.
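
The prelude pattern can be sketched as follows. The module layout, the `Describe` trait and the contents of `StatsServer` are placeholders invented for this example; the point is only that a single glob import brings the types and the traits needed to call their methods into scope.

```rust
mod stats {
    /// Placeholder trait; callers must have it in scope to call `describe()`.
    pub trait Describe {
        fn describe(&self) -> &'static str;
    }

    /// Placeholder type using the shortened `Stats` prefix.
    pub struct StatsServer;

    impl Describe for StatsServer {
        fn describe(&self) -> &'static str {
            "stats-server"
        }
    }

    /// Re-export everything a typical user needs from one place.
    pub mod prelude {
        pub use super::{Describe, StatsServer};
    }
}

// One import instead of listing each type and trait separately.
use self::stats::prelude::*;

fn server_description() -> &'static str {
    StatsServer.describe()
}
```

Without the prelude, the caller would have to import `Describe` explicitly before `describe()` resolves, which is the kind of friction the prelude is meant to remove.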
There is no reason to have two separate options for "verbose" and
"debug" mode. Just merge the two and always use "debug". If enabled,
increase verbosity to stdout and enable reporting BPF scheduling events
in debugfs (e.g., /sys/kernel/debug/tracing/trace_pipe).
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Since scx_rustland_core enables setting a time slice on a per-task basis
during task dispatch, there's no need to maintain a global time slice in
the BPF component. Instead, a global time slice can simply be managed in
user-space, achieving the same outcome.
Therefore, drop the global slice_us property from BpfScheduler to
simplify the API.
NOTE: if a time slice is not specified for a task, SCX_SLICE_DFL will be
used by default.
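
A small sketch of how a global slice can be kept in user-space while the per-task slice falls back to a default. The types are simplified stand-ins: the `slice_ns` field name, the use of 0 for "not specified", and the 20 ms value for the default are assumptions of this example.

```rust
/// Assumed default slice in nanoseconds (20 ms), standing in for SCX_SLICE_DFL.
const SCX_SLICE_DFL_NS: u64 = 20_000_000;

/// Reduced stand-in for a task about to be dispatched.
/// Assumption: slice_ns == 0 means "no slice specified".
struct DispatchedTask {
    pid: i32,
    slice_ns: u64,
}

/// Slice that would actually be applied to the task.
fn effective_slice_ns(task: &DispatchedTask) -> u64 {
    if task.slice_ns == 0 {
        SCX_SLICE_DFL_NS
    } else {
        task.slice_ns
    }
}

/// User-space scheduler that owns its own global time slice.
struct Scheduler {
    slice_ns: u64,
}

impl Scheduler {
    /// Stamp the scheduler-wide slice on each task at dispatch time.
    fn prepare(&self, pid: i32) -> DispatchedTask {
        DispatchedTask { pid, slice_ns: self.slice_ns }
    }
}
```

The global value never has to be known by the BPF side: it is simply copied into every task that the scheduler dispatches.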
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
The update_tasks() API is somewhat confusing, so replace it with a
clearer API, notify_complete().
This new API will return control to the BPF component and inform it
about the number of tasks still pending in the user-space scheduler.
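
The intended call pattern might look like the sketch below: dispatch what fits in this round, then report how many tasks are still queued. `BpfScheduler` is mocked here (it only records the last reported value), and `schedule_round()` with its `budget` parameter is a hypothetical helper; only the `notify_complete()` name comes from the description above.

```rust
use std::collections::VecDeque;

/// Mock of the BPF-side handle; it only remembers the last reported value.
struct BpfScheduler {
    nr_pending: u64,
}

impl BpfScheduler {
    /// Hand control back, reporting how many tasks user-space still holds.
    fn notify_complete(&mut self, nr_pending: u64) {
        self.nr_pending = nr_pending;
    }
}

/// Dispatch at most `budget` queued pids, then report what is left.
fn schedule_round(
    bpf: &mut BpfScheduler,
    queue: &mut VecDeque<i32>,
    budget: usize,
) -> Vec<i32> {
    let n = budget.min(queue.len());
    let dispatched: Vec<i32> = queue.drain(..n).collect();
    bpf.notify_complete(queue.len() as u64);
    dispatched
}
```

A non-zero pending count tells the other side that the user-space scheduler still has work to do and should run again.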
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
The low-power API is a bit of a hack implemented purely in the BPF
layer. It would be better to re-implement it on top of some notion of
topology awareness.
Therefore, get rid of this API for now.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
The current API used to notify the user-space scheduler when a task
exits is really confusing (setting a negative value in
queued_task_ctx.cpu), and it's also possible to detect task exiting
events from user-space (or check in procfs, even if it's slower).
In any case, a better API should be provided for this, so drop the
current one for now.
NOTE: this will cause additional memory usage for scx_rustland, but it
can be fixed/addressed later in a separate commit (i.e., providing a
periodic garbage collector for the unused task entries).
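
One possible shape for such a garbage collector is sketched below. It is not part of this change: the `TaskInfo` layout, the `last_seen` timestamp and the idle threshold are all assumptions, and a real implementation could instead check task liveness in procfs.

```rust
use std::collections::HashMap;

/// Hypothetical per-task bookkeeping kept by the user-space scheduler.
struct TaskInfo {
    /// Timestamp (arbitrary units) of the last time the task was queued.
    last_seen: u64,
}

/// Drop entries not seen for more than `max_idle` time units.
/// Returns the number of entries removed.
fn gc_tasks(tasks: &mut HashMap<i32, TaskInfo>, now: u64, max_idle: u64) -> usize {
    let before = tasks.len();
    tasks.retain(|_, info| now.saturating_sub(info.last_seen) <= max_idle);
    before - tasks.len()
}
```

Running this periodically bounds the map size without requiring an explicit exit notification; a task that becomes active again after being collected is simply re-inserted.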
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Instead of determining the task time slice in ops.enqueue(), refresh the
time slice immediately before the task is started on its assigned CPU in
ops.running().
This ensures that the exact time slice specified by the user-space
scheduler is applied and that the sched_ext core never implicitly
dispatches tasks using SCX_SLICE_DFL.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Allow the user-space scheduler to pick an idle CPU via
self.bpf.select_cpu(pid, prev_task, flags), mimicking the BPF's
select_cpu() interface.
Also remove the full_user option and always rely on the idle selection
logic from user-space.
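
The sketch below models one plausible idle-selection policy in plain Rust: keep the previously used CPU if it is idle, otherwise take any idle CPU, otherwise report that none is available. It does not call the real API; `pick_idle_cpu()`, the idle bitmap passed as a slice, and the use of -1 for "no idle CPU" are assumptions of this example.

```rust
/// Pick a CPU for a task given which CPUs are currently idle.
/// `idle[i]` is true when CPU i is idle; `prev_cpu` is the CPU the task
/// last ran on (negative if unknown). Returns -1 when no CPU is idle.
fn pick_idle_cpu(idle: &[bool], prev_cpu: i32) -> i32 {
    // Prefer the previous CPU to preserve cache locality.
    if prev_cpu >= 0 && idle.get(prev_cpu as usize).copied().unwrap_or(false) {
        return prev_cpu;
    }
    // Otherwise fall back to the first idle CPU, if any.
    idle.iter()
        .position(|&is_idle| is_idle)
        .map_or(-1, |cpu| cpu as i32)
}
```

A caller could treat a negative result as "no idle CPU found" and fall back to the previously used CPU or to a shared queue.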
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
And move related ops into it. This is a bit more natural and will also allow
doing other operations (e.g. describing stats) without launching the server.
OM labels() was called with an array, which was then incorrectly interpreted
as a single label. Unpack it into a list of arguments. While at it, make
error reporting a bit more robust.