scx-upstream/scheds
Andrea Righi fc889c6995 scx_rustland: replace custom allocator with buddy-alloc
Currently, the primary bottleneck in scx_rustland lies within its custom
memory allocator, which is used to prevent page faults in the user-space
scheduler.

This is pretty evident looking at perf top:

  39.95%  scx_rustland             [.] <scx_rustland::bpf::alloc::RustLandAllocator as core::alloc::global::GlobalAlloc>::alloc
   3.41%  [kernel]                 [k] _copy_from_user
   3.20%  [kernel]                 [k] __kmem_cache_alloc_node
   2.59%  [kernel]                 [k] __sys_bpf
   2.30%  [kernel]                 [k] __kmem_cache_free
   1.48%  libc.so.6                [.] syscall
   1.45%  [kernel]                 [k] __virt_addr_valid
   1.42%  scx_rustland             [.] <scx_rustland::bpf::alloc::RustLandAllocator as core::alloc::global::GlobalAlloc>::dealloc
   1.31%  [kernel]                 [k] _copy_to_user
   1.23%  [kernel]                 [k] entry_SYSRETQ_unsafe_stack

However, there's no need to reinvent the wheel here, rather than relying
on an overly simplistic and inefficient allocator, we can rely on
buddy-alloc [1], which is also capable of operating on a preallocated
memory buffer.

After switching to buddy-alloc, the performance profile under the same
workload conditions looks like the following:

   6.01%  [kernel]                 [k] _copy_from_user
   5.21%  [kernel]                 [k] __kmem_cache_alloc_node
   4.45%  [kernel]                 [k] __sys_bpf
   3.80%  [kernel]                 [k] __kmem_cache_free
   2.79%  libc.so.6                [.] syscall
   2.34%  [kernel]                 [k] __virt_addr_valid
   2.26%  [kernel]                 [k] _copy_to_user
   2.14%  [kernel]                 [k] __check_heap_object
   2.10%  [kernel]                 [k] __check_object_size.part.0
   2.02%  [kernel]                 [k] entry_SYSRETQ_unsafe_stack

With this change in place, the primary overhead is now moved to the
bpf() syscall and the copies between kernel and user-space (this could
potentially be optimized in the future using BPF ring buffers, instead
of BPF FIFO queues).

A better focus at the allocator overhead before vs after this change:

 [before]
 39.95%  scx_rustland  [.] core::alloc::global::GlobalAlloc>::alloc
  1.42%  scx_rustland  [.] core::alloc::global::GlobalAlloc>::dealloc

 [after]
  1.50%  scx_rustland  [.] core::alloc::global::GlobalAlloc>::alloc
  0.76%  scx_rustland  [.] core::alloc::global::GlobalAlloc>::dealloc

[1] https://crates.io/crates/buddy-alloc

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-02-11 14:33:39 +01:00
..
c scx: Add compat support for SCX_KICK_IDLE and use it for idle CPU wakeups 2024-02-06 15:28:40 -10:00
include scx_rusty: Fix typos 2024-02-07 23:38:26 -06:00
rust scx_rustland: replace custom allocator with buddy-alloc 2024-02-11 14:33:39 +01:00
meson.build Restructure scheds folder names 2023-12-17 13:14:31 -08:00
README.md Restructure scheds folder names 2023-12-17 13:14:31 -08:00
sync-to-kernel.sh scheds/sync-to-kernel.sh: Drop most schedulers from sync 2024-02-02 09:08:30 -10:00

SCHED_EXT SCHEDULERS

Introduction

This directory contains the repo's schedulers.

Some of these schedulers are simply examples of different types of schedulers that can be built using sched_ext. They can be loaded and used to schedule on your system, but their primary purpose is to illustrate how various features of sched_ext can be used.

Other schedulers are actually performant, production-ready schedulers. That is, for the correct workload and with the correct tuning, they may be deployed in a production environment with acceptable or possibly even improved performance. Some of the examples could be improved to become production schedulers.

Please see the following README files for details on each of the various types of schedulers:

  • rust describes all of the schedulers with rust user space components. All of these schedulers are production ready.
  • c describes all of the schedulers with C user space components. All of these schedulers are production ready.

Note on syncing

Note that there is a sync-to-kernel.sh script in this directory. This is used to sync any changes to the specific schedulers with the Linux kernel tree. If you've made any changes to a scheduler in please use the script to synchronize with the sched_ext Linux kernel tree:

$ ./sync-to-kernel.sh /path/to/kernel/tree