scx/scheds/rust/scx_rustland/Cargo.toml
Andrea Righi fc889c6995 scx_rustland: replace custom allocator with buddy-alloc
Currently, the primary bottleneck in scx_rustland lies within its custom
memory allocator, which is used to prevent page faults in the user-space
scheduler.

This is pretty evident looking at perf top:

  39.95%  scx_rustland             [.] <scx_rustland::bpf::alloc::RustLandAllocator as core::alloc::global::GlobalAlloc>::alloc
   3.41%  [kernel]                 [k] _copy_from_user
   3.20%  [kernel]                 [k] __kmem_cache_alloc_node
   2.59%  [kernel]                 [k] __sys_bpf
   2.30%  [kernel]                 [k] __kmem_cache_free
   1.48%  libc.so.6                [.] syscall
   1.45%  [kernel]                 [k] __virt_addr_valid
   1.42%  scx_rustland             [.] <scx_rustland::bpf::alloc::RustLandAllocator as core::alloc::global::GlobalAlloc>::dealloc
   1.31%  [kernel]                 [k] _copy_to_user
   1.23%  [kernel]                 [k] entry_SYSRETQ_unsafe_stack

However, there's no need to reinvent the wheel here, rather than relying
on an overly simplistic and inefficient allocator, we can rely on
buddy-alloc [1], which is also capable of operating on a preallocated
memory buffer.

After switching to buddy-alloc, the performance profile under the same
workload conditions looks like the following:

   6.01%  [kernel]                 [k] _copy_from_user
   5.21%  [kernel]                 [k] __kmem_cache_alloc_node
   4.45%  [kernel]                 [k] __sys_bpf
   3.80%  [kernel]                 [k] __kmem_cache_free
   2.79%  libc.so.6                [.] syscall
   2.34%  [kernel]                 [k] __virt_addr_valid
   2.26%  [kernel]                 [k] _copy_to_user
   2.14%  [kernel]                 [k] __check_heap_object
   2.10%  [kernel]                 [k] __check_object_size.part.0
   2.02%  [kernel]                 [k] entry_SYSRETQ_unsafe_stack

With this change in place, the primary overhead is now moved to the
bpf() syscall and the copies between kernel and user-space (this could
potentially be optimized in the future using BPF ring buffers, instead
of BPF FIFO queues).

A better focus at the allocator overhead before vs after this change:

 [before]
 39.95%  scx_rustland  [.] core::alloc::global::GlobalAlloc>::alloc
  1.42%  scx_rustland  [.] core::alloc::global::GlobalAlloc>::dealloc

 [after]
  1.50%  scx_rustland  [.] core::alloc::global::GlobalAlloc>::alloc
  0.76%  scx_rustland  [.] core::alloc::global::GlobalAlloc>::dealloc

[1] https://crates.io/crates/buddy-alloc

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-02-11 14:33:39 +01:00

29 lines
1.1 KiB
TOML

[package]
name = "scx_rustland"
version = "0.0.3"
authors = ["Andrea Righi <andrea.righi@canonical.com>", "Canonical"]
edition = "2021"
description = "A BPF component (dispatcher) that implements the low level sched-ext functionalities and a user-space counterpart (scheduler), written in Rust, that implements the actual scheduling policy. This is used within sched_ext, which is a Linux kernel feature which enables implementing kernel thread schedulers in BPF and dynamically loading them. https://github.com/sched-ext/scx/tree/main"
license = "GPL-2.0-only"
[dependencies]
anyhow = "1.0.65"
bitvec = { version = "1.0", features = ["serde"] }
clap = { version = "4.1", features = ["derive", "env", "unicode", "wrap_help"] }
ctrlc = { version = "3.1", features = ["termination"] }
fb_procfs = "0.7.0"
hex = "0.4.3"
libbpf-rs = "0.22.0"
libc = "0.2.137"
buddy-alloc = "0.5.1"
log = "0.4.17"
ordered-float = "3.4.0"
scx_utils = { path = "../../../rust/scx_utils", version = "0.6" }
simplelog = "0.12.0"
[build-dependencies]
scx_utils = { path = "../../../rust/scx_utils", version = "0.6" }
[features]
enable_backtrace = []