scx-upstream/scheds/rust/scx_rustland
Andrea Righi 6f4cd853f9 scx_rustland: introduce virtual time slice
Overview
========

Currently, a task's time slice is determined based on the total number
of tasks waiting to be scheduled: the more overloaded the system, the
shorter the time slice.

This approach can help to reduce the average wait time of all tasks,
allowing them to progress more slowly, but uniformly, thus providing a
smoother overall system performance.

However, under heavy system load, this approach can lead to very short
time slices distributed among all tasks, causing excessive context
switches that can badly affect soft real-time workloads.

Moreover, the scheduler tends to operate in a bursty manner (tasks are
queued and dispatched in bursts). This can also result in fluctuations
of longer and shorter time slices, depending on the number of tasks
still waiting in the scheduler's queue.

Such behavior can also negatively impact on soft real-time workloads,
such as real-time audio processing.

Virtual time slice
==================

To mitigate this problem, introduce the concept of virtual time slice:
the idea is to evaluate the optimal time slice of a task, considering
the vruntime as a deadline for the task to complete its work before
releasing the CPU.

This is accomplished by calculating the difference between the task's
vruntime and the global current vruntime and use this value as the task
time slice:

  task_slice = task_vruntime - min_vruntime

In this way, tasks that "promise" to release the CPU quickly (based on
their previous work pattern) get a much higher priority (due to
vruntime-based scheduling and the additional priority boost for being
classified as interactive), but they are also given a shorter time slice
to complete their work and fulfill their promise of rapidity.

At the same time tasks that are more CPU-intensive get de-prioritized,
but they will tend to have a longer time slice available, reducing in
this way the amount of context switches that can negatively affect their
performance.

In conclusion, latency-sensitive tasks get a high priority and a short
time slice (and they can preempt other tasks), CPU-intensive tasks get
low priority and a long time slice.

Example
=======

Let's consider the following theoretical scenario:

 task | time
 -----+-----
   A  | 1
   B  | 3
   C  | 6
   D  | 6

In this case task A represents a short interactive task, task C and D
are CPU-intensive tasks and task B is mainly interactive, but it also
requires some CPU time.

With a uniform time slice, scaled based on the amount of tasks, the
scheduling looks like this (assuming the time slice is 2):

 A B B C C D D A B C C D D C C D D
  |   |   |   | | |   |   |   |
  `---`---`---`-`-`---`---`---`----> 9 context switches

With the virtual time slice the scheduling changes to this:

 A B B C C C D A B C C C D D D D D
  |   |     | | | |     |
  `---`-----`-`-`-`-----`----------> 7 context switches

In the latter scenario, tasks do not receive the same time slice scaled
by the total number of tasks waiting to be scheduled. Instead, their
time slice is adjusted based on their previous CPU usage. Tasks that
used more CPU time are given longer slices and their processing time
tends to be packed together, reducing the amount of context switches.

Meanwhile, latency-sensitive tasks can still be processed as soon as
they need to, because they get a higher priority and they can preempt
other tasks. However, they will get a short time slice, so tasks that
were incorrectly classified as interactive will still be forced to
release the CPU quickly.

Experimental results
====================

This patch has been tested on a on a 8-cores AMD Ryzen 7 5800X 8-Core
Processor (16 threads with SMT), 16GB RAM, NVIDIA GeForce RTX 3070.

The test case involves the usual benchmark of playing a video game while
simultaneously overloading the system with a parallel kernel build
(`make -j32`).

The average frames per second (fps) reported by Steam is used as a
metric for measuring system responsiveness (the higher the better):

 Game                       |  before |  after  | delta  |
 ---------------------------+---------+---------+--------+
 Baldur's Gate 3            |  40 fps |  48 fps | +20.0% |
 Counter-Strike 2           |   8 fps |  15 fps | +87.5% |
 Cyberpunk 2077             |  41 fps |  46 fps | +12.2% |
 Terraria                   |  98 fps | 108 fps | +10.2% |
 Team Fortress 2            |  81 fps |  92 fps | +13.6% |
 WebGL demo (firefox) [1]   |  32 fps |  42 fps | +31.2% |
 ---------------------------+---------+---------+--------+

Apart from the massive boost with Counter-Strike 2 (that should be taken
with a grain of salt, considering the overall poor performance in both
cases), the virtual time slice seems to systematically provide a boost
in responsiveness of around +10-20% fps.

It also seems to significantly prevent potential audio cracking issues
when the system is massively overloaded: no audio cracking was detected
during the entire run of these tests with the virtual deadline change
applied.

[1] https://webglsamples.org/aquarium/aquarium.html

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-04 23:01:13 +02:00
..
src scx_rustland: introduce virtual time slice 2024-06-04 23:01:13 +02:00
.gitignore scx_rustland_core: generate source files in-tree 2024-02-28 17:49:44 +01:00
build.rs scx_rustland_core: introduce RustLandBuilder() 2024-02-28 17:49:44 +01:00
Cargo.toml Bump versions for a release 2024-06-03 08:35:21 -10:00
LICENSE scx_rustland: rename from scx_rustlite 2023-12-22 00:20:14 +01:00
meson.build scheds-rust: build rust schedulers in sequence 2024-04-23 08:06:27 +08:00
README.md scx_rustland: update documentation 2024-02-28 17:49:44 +01:00
rustfmt.toml scx_rustland: rename from scx_rustlite 2023-12-22 00:20:14 +01:00

scx_rustland

This is a single user-defined scheduler used within sched_ext, which is a Linux kernel feature which enables implementing kernel thread schedulers in BPF and dynamically loading them. Read more about sched_ext.

Overview

scx_rustland is made of a BPF component (scx_rustland_core) that implements the low level sched-ext functionalities and a user-space counterpart (scheduler), written in Rust, that implements the actual scheduling policy.

How To Install

Available as a Rust crate: cargo add scx_rustland

Typical Use Case

scx_rustland is designed to prioritize interactive workloads over background CPU-intensive workloads. For this reason the typical use case of this scheduler involves low-latency interactive applications, such as gaming, video conferencing and live streaming.

scx_rustland is also designed to be an "easy to read" template that can be used by any developer to quickly experiment more complex scheduling policies fully implemented in Rust.

Production Ready?

Not quite. For production scenarios, other schedulers are likely to exhibit better performance, as offloading all scheduling decisions to user-space comes with a certain cost.

However, a scheduler entirely implemented in user-space holds the potential for seamless integration with sophisticated libraries, tracing tools, external services (e.g., AI), etc.

Hence, there might be situations where the benefits outweigh the overhead, justifying the use of this scheduler in a production environment.

Demo

scx_rustland-terraria

The key takeaway of this demo is to demonstrate that , despite the overhead of running a scheduler in user-space, we can still obtain interesting results and, in this particular case, even outperform the default Linux scheduler (EEVDF) in terms of application responsiveness (fps), while a CPU intensive workload (parallel kernel build) is running in the background.