commit 45d8b54eb9 ("scx_rustland: re-introduce per-CPU DSQ + a global shared DSQ")
Author: Andrea Righi <andrea.righi@canonical.com>
Date:   2024-02-01 00:33:35 +01:00

With commit c6ada25 ("scx_rustland: use custom pcpu DSQ instead of
SCX_DSQ_LOCAL{_ON}") we tried to introduce custom per-CPU DSQs, instead
of using SCX_DSQ_LOCAL and SCX_DSQ_LOCAL_ON to dispatch tasks.

This was required because, when dispatching tasks with
SCX_DSQ_LOCAL_ON, there is no guarantee that the cpumask checked at
dispatch time to determine the validity of the target CPU remains
valid.

This method solved the cpumask validity issue, but unfortunately it
introduced a noticeable performance regression and a potential
starvation issue (both likely caused by the same underlying problem):
if a task is assigned to a CPU in select_cpu() and the scheduler then
decides to dispatch it to a different CPU, the task is added to the
new CPU's DSQ, but if no dispatch event happens on that CPU, the task
may remain stuck in the per-CPU DSQ for a long time, triggering the
sched-ext watchdog timeout that kicks out the scheduler, for example:

  12:53:28 [WARN] FAIL: IPC:CSteamEngin[7217] failed to run for 6.482s (err=1026)
  12:53:28 [INFO] Unregister RustLand scheduler

Therefore, we reverted this change with 6d89ece ("scx_rustland: dispatch
tasks only on the global DSQ"), dispatching all tasks to the global DSQ
and leaving it entirely to the kernel to distribute them among the
available CPUs.

This is not ideal, however, because we still want to give the
user-space scheduler the ability to assign tasks to specific CPUs.

Therefore, re-introduce distinct per-CPU DSQs, but also provide a
global shared DSQ. Tasks dispatched to a per-CPU DSQ are consumed from
the dispatch() callback of the corresponding CPU, while tasks
dispatched to the global shared DSQ can be consumed from any CPU.
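
To make this scheme concrete, here is a minimal sketch (not the actual
scx_rustland code) of what such a dispatch() callback could look like,
assuming the usual sched-ext BPF headers and the scx_bpf_consume()
kfunc; SHARED_DSQ and cpu_to_dsq() are hypothetical names used only
for illustration:

  /*
   * Drain the CPU's own DSQ first, then the global shared DSQ. The DSQs
   * are assumed to have been created in ops.init() via scx_bpf_create_dsq().
   */
  #define SHARED_DSQ 0 /* hypothetical id reserved for the shared DSQ */

  static u64 cpu_to_dsq(s32 cpu)
  {
          /* hypothetical mapping: per-CPU DSQ ids follow the shared one */
          return (u64)cpu + 1;
  }

  void BPF_STRUCT_OPS(sketch_dispatch, s32 cpu, struct task_struct *prev)
  {
          /* Serve tasks explicitly bound to this CPU first... */
          if (scx_bpf_consume(cpu_to_dsq(cpu)))
                  return;

          /* ...then pick up tasks that any CPU is allowed to run. */
          scx_bpf_consume(SHARED_DSQ);
  }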

In this way the BPF layer provides an interface that gives user-space
the flexibility to dispatch a task either to a specific CPU or to the
first CPU available, depending on the needs of the particular
scheduler.

If an invalid CPU (according to the task's cpumask) is selected, the
BPF dispatcher transparently redirects the task to a valid CPU, chosen
using the built-in idle selection logic.
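
As an illustration of this fallback (again a sketch, not the actual
implementation), assuming the bpf_cpumask_test_cpu(),
scx_bpf_pick_idle_cpu() and scx_bpf_task_cpu() helpers, the redirect
could look roughly like this:

  /* Validate the CPU chosen by user-space; fall back to idle selection. */
  static s32 pick_valid_cpu(struct task_struct *p, s32 requested_cpu)
  {
          s32 cpu;

          if (bpf_cpumask_test_cpu(requested_cpu, p->cpus_ptr))
                  return requested_cpu;

          /* 0 flags: pick any idle CPU allowed by the task's cpumask */
          cpu = scx_bpf_pick_idle_cpu(p->cpus_ptr, 0);
          if (cpu >= 0)
                  return cpu;

          /* No idle CPU available: keep the task on its previous CPU. */
          return scx_bpf_task_cpu(p);
  }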

In the future we may want to improve this, giving user-space
visibility of the cpumask, so that it can pick a valid CPU in advance
in a properly synchronized way.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>

scx_rustland

This is a single user-defined scheduler used within sched_ext, a Linux kernel feature that enables implementing kernel thread schedulers in BPF and dynamically loading them. Read more about sched_ext.

Overview

scx_rustland is made of a BPF component (dispatcher) that implements the low-level sched-ext functionality and a user-space counterpart (scheduler), written in Rust, that implements the actual scheduling policy.

The BPF dispatcher is completely agnostic of the particular scheduling policy implemented in user-space. For this reason, developers who want to use this scheduler to experiment with scheduling policies should be able to simply modify the Rust component, without having to deal with any internal kernel / BPF details.
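
For a rough idea of the data exchanged between the two components, the sketch below shows a hypothetical pair of task descriptors (the real structures and field names are defined in the scheduler's own BPF interface headers and may differ):

  /* BPF -> user-space: a task that needs a scheduling decision. */
  struct queued_task {
          int pid;
          unsigned long long runtime_ns;   /* accumulated execution time */
  };

  /* user-space -> BPF: the decision taken by the Rust scheduler. */
  struct dispatched_task {
          int pid;
          int cpu;                         /* target CPU (or "any CPU") */
          unsigned long long slice_ns;     /* assigned time slice */
  };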

How To Install

Available as a Rust crate: cargo add scx_rustland

Typical Use Case

scx_rustland is designed to be an "easy to read" template that can be used by any developer to quickly experiment with more complex scheduling policies, which can be fully implemented in Rust.

Production Ready?

Not quite. For production scenarios, other schedulers are likely to exhibit better performance, as offloading all scheduling decisions to user-space comes with a certain cost.

However, a scheduler entirely implemented in user-space holds the potential for seamless integration with sophisticated libraries, tracing tools, external services (e.g., AI), etc. Hence, there might be situations where the benefits outweigh the overhead, justifying the use of this scheduler in a production environment.

Demo

[Demo video: scx_rustland running Terraria]

For this demo the scheduler includes an extra patch to impose a "time slice penalty" on new short-lived tasks. While this approach might not be suitable for general usage, it can yield significant advantages in this specific scenario.

The key takeaway is to demonstrate the ease and safety of conducting experiments like this: since we operate in user-space, we can accomplish everything simply by modifying the Rust code, which is completely abstracted from the underlying BPF/kernel internals.