scx-upstream/scheds/rust/scx_rustland
Andrea Righi 93dc615653 scx_rustland: use a ring buffer for queued tasks
Switch from a BPF_MAP_TYPE_QUEUE to a BPF_MAP_TYPE_RINGBUF to store the
tasks that need to be processed by the user-space scheduler.

A ring buffer saves a lot of memory copies and syscalls, since the
memory is directly shared between the BPF and the user-space
components.
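
For reference, here is a minimal sketch (not the actual scx_rustland
code) of what the user-space consumer side can look like with
libbpf-rs, assuming a ring buffer map named "queued" and treating each
item as raw bytes:

  // Hypothetical consumer sketch: the map name and callback body are
  // illustrative, not the exact scx_rustland implementation.
  use libbpf_rs::RingBufferBuilder;

  fn consume_queued(queued_map: &libbpf_rs::Map) -> Result<(), libbpf_rs::Error> {
      let mut builder = RingBufferBuilder::new();
      builder.add(queued_map, |data: &[u8]| {
          // `data` points into memory shared with the BPF side, so no
          // extra copy is needed to read each item.
          // Parse the raw bytes into a task descriptor here.
          0 // returning 0 keeps consuming
      })?;
      let rb = builder.build()?;
      rb.consume()?; // drain everything currently queued
      Ok(())
  }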

Performance profile before this change:

  2.44%  [kernel]  [k] __memset
  2.19%  [kernel]  [k] __sys_bpf
  1.59%  [kernel]  [k] __kmem_cache_alloc_node
  1.00%  [kernel]  [k] _copy_from_user

After this change:

  1.42%  [kernel]  [k] __memset
  0.14%  [kernel]  [k] __sys_bpf
  0.10%  [kernel]  [k] __kmem_cache_alloc_node
  0.07%  [kernel]  [k] _copy_from_user

The overhead of both sys_bpf() and copy_from_user() is reduced by a
factor of ~15x (only the dispatch path still uses sys_bpf()).

NOTE: despite being very effective, the current implementation is a bit
of a hack: the current ring buffer API only supports greedy
consumption, where multiple items can be consumed at once, and
libbpf-rs does not report the exact number of items consumed. A more
refined libbpf-rs API [1] may allow us to improve this code a bit.
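
As a concrete illustration of the workaround this forces (a minimal
sketch, not the exact scx_rustland code), the consumed count can be
recovered indirectly by buffering items from inside the callback:

  use std::cell::RefCell;
  use libbpf_rs::{Map, RingBufferBuilder};

  // Workaround sketch: consume() drains the ring buffer greedily and
  // libbpf-rs doesn't report how many items it consumed, so buffer the
  // items from inside the callback and count them afterwards.
  fn drain_queued(queued_map: &Map) -> Result<Vec<Vec<u8>>, libbpf_rs::Error> {
      let items: RefCell<Vec<Vec<u8>>> = RefCell::new(Vec::new());
      let mut builder = RingBufferBuilder::new();
      builder.add(queued_map, |data: &[u8]| {
          items.borrow_mut().push(data.to_vec());
          0
      })?;
      let rb = builder.build()?;
      rb.consume()?; // greedy: everything pending is consumed at once
      drop(rb); // release the borrow on `items` held by the callback
      Ok(items.into_inner())
  }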

Moreover, libbpf-rs doesn't provide an API for user_ring_buffer, so at
the moment there's no trivial way to apply the same change to the
dispatched tasks.

However, with just this change applied, the overhead of sys_bpf() and
copy_from_user() is already minimal, so we wouldn't gain much by
switching the dispatch path to a BPF ring buffer as well.
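
For contrast, a sketch of the dispatch path that still goes through
sys_bpf(), assuming a hypothetical BPF_MAP_TYPE_QUEUE map named
"dispatched" (an illustration, not the exact scx_rustland code):

  use libbpf_rs::{Map, MapFlags};

  // Each dispatched task goes through one bpf() syscall and one
  // copy_from_user(), which is the remaining overhead in the profile.
  fn dispatch_task(dispatched_map: &Map, task_bytes: &[u8]) -> Result<(), libbpf_rs::Error> {
      // Queue maps have no key, so an empty key slice is passed.
      dispatched_map.update(&[], task_bytes, MapFlags::ANY)
  }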

[1] https://github.com/libbpf/libbpf-rs/pull/680

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-02-20 12:30:22 +01:00

scx_rustland

This is a single user-defined scheduler used within sched_ext, a Linux kernel feature that enables implementing kernel thread schedulers in BPF and dynamically loading them. Read more about sched_ext.

Overview

scx_rustland is made of a BPF component (dispatcher) that implements the low-level sched-ext functionalities and a user-space counterpart (scheduler), written in Rust, that implements the actual scheduling policy.

The BPF dispatcher is completely agnostic of the particular scheduling policy implemented in user-space. For this reason, developers who want to use this scheduler to experiment with scheduling policies should be able to simply modify the Rust component, without having to deal with any internal kernel / BPF details.
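
As a rough illustration of that split (with hypothetical names, not the actual scx_rustland types), the policy side can be as simple as a plain Rust function over the queued tasks:

  // Conceptual sketch: the BPF dispatcher only moves tasks across the
  // kernel boundary; all the policy lives in ordinary Rust code.
  struct Task {
      pid: i32,
      vruntime: u64,
  }

  // Decide the order in which queued tasks should run: nothing here
  // requires any kernel or BPF knowledge to modify.
  fn schedule(mut queued: Vec<Task>) -> Vec<Task> {
      queued.sort_by_key(|task| task.vruntime);
      queued
  }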

How To Install

Available as a Rust crate: cargo add scx_rustland

Typical Use Case

scx_rustland is designed to be an "easy to read" template that any developer can use to quickly experiment with more complex scheduling policies, fully implemented in Rust.

Production Ready?

Not quite. For production scenarios, other schedulers are likely to exhibit better performance, as offloading all scheduling decisions to user-space comes with a certain cost.

However, a scheduler entirely implemented in user-space holds the potential for seamless integration with sophisticated libraries, tracing tools, external services (e.g., AI), etc. Hence, there might be situations where the benefits outweigh the overhead, justifying the use of this scheduler in a production environment.

Demo

[Demo video: scx_rustland-terraria]

For this demo, the scheduler includes an extra patch that imposes a "time slice penalty" on new short-lived tasks. While this approach might not be suitable for general usage, it can yield significant advantages in this specific scenario.
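
As a rough idea of what such a patch could look like (a hypothetical sketch; the threshold, names, and baseline slice are invented for illustration):

  // Hypothetical "time slice penalty" sketch: tasks that have barely
  // run so far get a reduced slice; established tasks keep the full one.
  const SLICE_NS: u64 = 5_000_000; // assumed 5 ms baseline slice

  fn task_slice_ns(total_runtime_ns: u64) -> u64 {
      if total_runtime_ns < 1_000_000 {
          SLICE_NS / 10 // penalize new short-lived tasks
      } else {
          SLICE_NS
      }
  }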

The key takeaway is to demonstrate the ease and safety of conducting experiments like this: everything runs in user-space, and everything can be accomplished simply by modifying the Rust code, which is completely abstracted from the underlying BPF/kernel internals.