In order to use the new consume_raw() API we need to depend on a version
of libbpf-rs that is not released yet.
Apparently adding such dependency may introduce a potential dependency
conflict with libbpf-sys.
Therefore, revert this change and go back to the previous consume() API.
One a new version of libbpf-rs will be out we can update all our
dependencies to use the new libbpf-rs and re-apply this patch to
scx_rustland_core.
Fixes: 7c8c5fd ("scx_rustland_core: use new consume_raw() libbpf-rs API")
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Use the new consume_raw() API provided by libbpf-rs with
https://github.com/libbpf/libbpf-rs/pull/680.
This allows to be more precise and efficient at processing tasks
consumed from the BPF ring buffer.
NOTE: the new consume_raw() API is not available yet in any official
release of the libbpf-rs crate, but cargo allows to pick versions
directly from git. This slightly increases the build time of
scx_rustland_core and the schedulers based on this crate (since we need
to recompile libbpf-rs from source), but we can re-add a proper
versioned dependency once the libbpf-rs is out.
TODO: this new API also offers the possibility to consume multiple items
from the BPF ring buffer with a single call to consume_raw(). This could
be investigated and implemented as a potential future enhancement.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Introduce a wrapper to scx_utils::BpfBuilder that can be used to build
the BPF component provided by scx_rustland_core.
The source of the BPF components (main.bpf.c) is included in the crate
as an array of bytes, the content is then unpacked in a temporary file
to perform the build.
The RustLandBuilder() helper is also used to generate bpf.rs (that
implements the low-level user-space Rust connector to the BPF
commponent).
Schedulers based on scx_rustland_core can simply use RustLandBuilder(),
to build the backend provided by scx_rustland_core.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Introduce a separate crate (scx_rustland_core) that can be used to
implement sched-ext schedulers in Rust that run in user-space.
This commit only provides the basic layout for the new crate and the
abstraction to the custom allocator.
In general, any scheduler that has a user-space component needs to use
the custom allocator to prevent potential deadlock conditions, caused by
page faults (a kthread needs to run to resolve the page fault, but the
scheduler is blocked waiting for the user-space page fault to be
resolved => deadlock).
However, we don't want to necessarily enforce this constraint to all the
existing Rust schedulers, some of them may do all user-space allocations
in safe paths, hence the separate scx_rustland_core crate.
Merging this code in scx_utils would force all the Rust schedulers to
use the custom allocator.
In a future commit the scx_rustland backend will be moved to
scx_rustland_core, making it a totally generic BPF scheduler framework
that can be used to implement user-space schedulers in Rust.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Currently, the primary bottleneck in scx_rustland lies within its custom
memory allocator, which is used to prevent page faults in the user-space
scheduler.
This is pretty evident looking at perf top:
39.95% scx_rustland [.] <scx_rustland::bpf::alloc::RustLandAllocator as core::alloc::global::GlobalAlloc>::alloc
3.41% [kernel] [k] _copy_from_user
3.20% [kernel] [k] __kmem_cache_alloc_node
2.59% [kernel] [k] __sys_bpf
2.30% [kernel] [k] __kmem_cache_free
1.48% libc.so.6 [.] syscall
1.45% [kernel] [k] __virt_addr_valid
1.42% scx_rustland [.] <scx_rustland::bpf::alloc::RustLandAllocator as core::alloc::global::GlobalAlloc>::dealloc
1.31% [kernel] [k] _copy_to_user
1.23% [kernel] [k] entry_SYSRETQ_unsafe_stack
However, there's no need to reinvent the wheel here, rather than relying
on an overly simplistic and inefficient allocator, we can rely on
buddy-alloc [1], which is also capable of operating on a preallocated
memory buffer.
After switching to buddy-alloc, the performance profile under the same
workload conditions looks like the following:
6.01% [kernel] [k] _copy_from_user
5.21% [kernel] [k] __kmem_cache_alloc_node
4.45% [kernel] [k] __sys_bpf
3.80% [kernel] [k] __kmem_cache_free
2.79% libc.so.6 [.] syscall
2.34% [kernel] [k] __virt_addr_valid
2.26% [kernel] [k] _copy_to_user
2.14% [kernel] [k] __check_heap_object
2.10% [kernel] [k] __check_object_size.part.0
2.02% [kernel] [k] entry_SYSRETQ_unsafe_stack
With this change in place, the primary overhead is now moved to the
bpf() syscall and the copies between kernel and user-space (this could
potentially be optimized in the future using BPF ring buffers, instead
of BPF FIFO queues).
A better focus at the allocator overhead before vs after this change:
[before]
39.95% scx_rustland [.] core::alloc::global::GlobalAlloc>::alloc
1.42% scx_rustland [.] core::alloc::global::GlobalAlloc>::dealloc
[after]
1.50% scx_rustland [.] core::alloc::global::GlobalAlloc>::alloc
0.76% scx_rustland [.] core::alloc::global::GlobalAlloc>::dealloc
[1] https://crates.io/crates/buddy-alloc
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
After updates to reflect the updated init and direct dispatch API, the
schedulers aren't compatible with older kernels. Bump versions and publish
releases.
Rename scx_rustlite to scx_rustland to better represent the mirroring of
scx_userland (in C), but implemented in Rust.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>