scx/scheds
Andrea Righi 45d8b54eb9 scx_rustland: re-introduce per-CPU DSQ + a global shared DSQ
With commit c6ada25 ("scx_rustland: use custom pcpu DSQ instead of
SCX_DSQ_LOCAL{_ON}") we tried to introduce custom per-CPU DSQs, instead
of using SCX_DSQ_LOCAL and SCX_DSQ_LOCAL_ON to dispatch tasks.

This was required, because dispatching tasks using SCX_DSQ_LOCAL_ON
doesn't provide a guarantee that the cpumask, checked at dispatch time
to determine the validity of a target CPU, remains valid.

This method solved the cpumask validity issue, but unfortunately it
introduced a noticeable performance regression and a potential
starvation issue, both probably caused by the same problem: if a task
is assigned to a CPU in select_cpu() and the scheduler decides to
dispatch it on a different CPU, the task is added to the new CPU's DSQ.
If no dispatch event happens there, the task may remain stuck in the
per-CPU DSQ for a long time, triggering the sched-ext watchdog timeout
that kicks out the scheduler, for example:

  12:53:28 [WARN] FAIL: IPC:CSteamEngin[7217] failed to run for 6.482s (err=1026)
  12:53:28 [INFO] Unregister RustLand scheduler

Therefore, we reverted this change with 6d89ece ("scx_rustland: dispatch
tasks only on the global DSQ"), dispatching all the tasks to the global
DSQ and fully delegating the distribution of tasks among the available
CPUs to the kernel.

This is not the ideal solution, because we still want to allow the
user-space scheduler to assign tasks to specific CPUs.

Therefore, re-introduce distinct per-CPU DSQs, but also provide a global
shared DSQ. Tasks dispatched to a per-CPU DSQ are consumed from the
dispatch() callback of the corresponding CPU, while tasks dispatched to
the global shared DSQ are consumed from any CPU.
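
The consumption rule above can be modeled in plain user-space Rust. This is
only an illustrative sketch, not the scx_rustland BPF code: the `Dsqs` type
and the `consume` method are hypothetical, and the order in which the two
queues are checked (per-CPU first, then shared) is an assumption that the
commit message does not state.

```rust
use std::collections::VecDeque;

/// Hypothetical user-space model of the DSQ layout (not the real BPF code).
struct Dsqs {
    /// One queue of task pids per CPU.
    per_cpu: Vec<VecDeque<u32>>,
    /// The global shared queue, consumable from any CPU.
    shared: VecDeque<u32>,
}

impl Dsqs {
    fn new(nr_cpus: usize) -> Self {
        Dsqs {
            per_cpu: vec![VecDeque::new(); nr_cpus],
            shared: VecDeque::new(),
        }
    }

    /// Models what the dispatch() callback of `cpu` may consume: only its own
    /// per-CPU queue and the shared queue, never another CPU's queue.
    /// Assumption: the per-CPU queue is checked before the shared one.
    fn consume(&mut self, cpu: usize) -> Option<u32> {
        self.per_cpu[cpu]
            .pop_front()
            .or_else(|| self.shared.pop_front())
    }
}
```

In this model a task queued on CPU 1's queue is never returned by
`consume(0)`, which mirrors the starvation risk described earlier when no
dispatch event happens on the target CPU.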

In this way the BPF layer provides an interface that lets user space
dispatch a task either on a specific CPU or on the first CPU available,
depending on the needs of the particular scheduler.

If an invalid CPU (according to the cpumask) is selected, the BPF
dispatcher will transparently redirect the task to a valid CPU, selected
using the built-in idle selection logic.
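
As a rough illustration of this routing decision, here is a self-contained
Rust model. Everything in it is hypothetical: the `Target` enum, the `route`
function, and the use of `u64` bit sets for the cpumask and the idle CPUs are
stand-ins chosen for the sketch, and the "lowest idle allowed CPU" pick is
only a placeholder for the kernel's built-in idle selection logic.

```rust
/// Where a task ends up (hypothetical names, for illustration only).
#[derive(Debug, PartialEq)]
enum Target {
    /// The per-CPU DSQ of the given CPU.
    PerCpu(usize),
    /// The global shared DSQ.
    Shared,
}

/// `requested`: Some(cpu) to target a specific CPU, None for "first CPU
/// available". `allowed`: the task's cpumask as a bit set (bit n set means
/// CPU n is allowed). `idle`: currently idle CPUs, same encoding.
fn route(requested: Option<usize>, allowed: u64, idle: u64) -> Target {
    match requested {
        // No specific CPU requested: use the shared queue.
        None => Target::Shared,
        // Requested CPU is valid according to the cpumask: honor it.
        Some(cpu) if cpu < 64 && (allowed & (1u64 << cpu)) != 0 => Target::PerCpu(cpu),
        // Requested CPU is invalid: redirect to a valid one.
        Some(_) => {
            // Prefer an allowed CPU that is idle; otherwise any allowed CPU.
            let idle_allowed = allowed & idle;
            let pick = if idle_allowed != 0 { idle_allowed } else { allowed };
            if pick == 0 {
                // Empty cpumask: nothing valid to pick in this model.
                Target::Shared
            } else {
                Target::PerCpu(pick.trailing_zeros() as usize)
            }
        }
    }
}
```

For example, `route(Some(0), 0b0110, 0b0100)` yields `Target::PerCpu(2)`:
CPU 0 is not in the cpumask, so the task is redirected to the only allowed
CPU that is idle.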

In the future we may want to improve this part by giving user space
visibility of the cpumask, so that it can pick a valid CPU in advance
and in a properly synchronized way.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-02-01 00:33:35 +01:00
c scx_nest: Remove -D option for eager compaction 2024-01-16 14:08:36 -06:00
include scx: Clean up user_exit_info.h 2024-01-31 10:47:24 -10:00
rust scx_rustland: re-introduce per-CPU DSQ + a global shared DSQ 2024-02-01 00:33:35 +01:00
meson.build Restructure scheds folder names 2023-12-17 13:14:31 -08:00
README.md Restructure scheds folder names 2023-12-17 13:14:31 -08:00
sync-to-kernel.sh Restructure scheds folder names 2023-12-17 13:14:31 -08:00

SCHED_EXT SCHEDULERS

Introduction

This directory contains the repo's schedulers.

Some of these schedulers are simply examples of different types of schedulers that can be built using sched_ext. They can be loaded and used to schedule on your system, but their primary purpose is to illustrate how various features of sched_ext can be used.

Other schedulers are actually performant, production-ready schedulers. That is, for the correct workload and with the correct tuning, they may be deployed in a production environment with acceptable or possibly even improved performance. Some of the examples could be improved to become production schedulers.

Please see the following README files for details on each of the various types of schedulers:

  • rust describes all of the schedulers with Rust user space components. All of these schedulers are production ready.
  • c describes all of the schedulers with C user space components. All of these schedulers are production ready.

Note on syncing

This directory contains a sync-to-kernel.sh script, which is used to sync changes to the schedulers with the Linux kernel tree. If you've made any changes to a scheduler, please use the script to synchronize with the sched_ext Linux kernel tree:

$ ./sync-to-kernel.sh /path/to/kernel/tree