6f4cd853f9
Overview ======== Currently, a task's time slice is determined based on the total number of tasks waiting to be scheduled: the more overloaded the system, the shorter the time slice. This approach can help to reduce the average wait time of all tasks, allowing them to progress more slowly, but uniformly, thus providing a smoother overall system performance. However, under heavy system load, this approach can lead to very short time slices distributed among all tasks, causing excessive context switches that can badly affect soft real-time workloads. Moreover, the scheduler tends to operate in a bursty manner (tasks are queued and dispatched in bursts). This can also result in fluctuations of longer and shorter time slices, depending on the number of tasks still waiting in the scheduler's queue. Such behavior can also negatively impact on soft real-time workloads, such as real-time audio processing. Virtual time slice ================== To mitigate this problem, introduce the concept of virtual time slice: the idea is to evaluate the optimal time slice of a task, considering the vruntime as a deadline for the task to complete its work before releasing the CPU. This is accomplished by calculating the difference between the task's vruntime and the global current vruntime and use this value as the task time slice: task_slice = task_vruntime - min_vruntime In this way, tasks that "promise" to release the CPU quickly (based on their previous work pattern) get a much higher priority (due to vruntime-based scheduling and the additional priority boost for being classified as interactive), but they are also given a shorter time slice to complete their work and fulfill their promise of rapidity. At the same time tasks that are more CPU-intensive get de-prioritized, but they will tend to have a longer time slice available, reducing in this way the amount of context switches that can negatively affect their performance. In conclusion, latency-sensitive tasks get a high priority and a short time slice (and they can preempt other tasks), CPU-intensive tasks get low priority and a long time slice. Example ======= Let's consider the following theoretical scenario: task | time -----+----- A | 1 B | 3 C | 6 D | 6 In this case task A represents a short interactive task, task C and D are CPU-intensive tasks and task B is mainly interactive, but it also requires some CPU time. With a uniform time slice, scaled based on the amount of tasks, the scheduling looks like this (assuming the time slice is 2): A B B C C D D A B C C D D C C D D | | | | | | | | | `---`---`---`-`-`---`---`---`----> 9 context switches With the virtual time slice the scheduling changes to this: A B B C C C D A B C C C D D D D D | | | | | | | `---`-----`-`-`-`-----`----------> 7 context switches In the latter scenario, tasks do not receive the same time slice scaled by the total number of tasks waiting to be scheduled. Instead, their time slice is adjusted based on their previous CPU usage. Tasks that used more CPU time are given longer slices and their processing time tends to be packed together, reducing the amount of context switches. Meanwhile, latency-sensitive tasks can still be processed as soon as they need to, because they get a higher priority and they can preempt other tasks. However, they will get a short time slice, so tasks that were incorrectly classified as interactive will still be forced to release the CPU quickly. Experimental results ==================== This patch has been tested on a on a 8-cores AMD Ryzen 7 5800X 8-Core Processor (16 threads with SMT), 16GB RAM, NVIDIA GeForce RTX 3070. The test case involves the usual benchmark of playing a video game while simultaneously overloading the system with a parallel kernel build (`make -j32`). The average frames per second (fps) reported by Steam is used as a metric for measuring system responsiveness (the higher the better): Game | before | after | delta | ---------------------------+---------+---------+--------+ Baldur's Gate 3 | 40 fps | 48 fps | +20.0% | Counter-Strike 2 | 8 fps | 15 fps | +87.5% | Cyberpunk 2077 | 41 fps | 46 fps | +12.2% | Terraria | 98 fps | 108 fps | +10.2% | Team Fortress 2 | 81 fps | 92 fps | +13.6% | WebGL demo (firefox) [1] | 32 fps | 42 fps | +31.2% | ---------------------------+---------+---------+--------+ Apart from the massive boost with Counter-Strike 2 (that should be taken with a grain of salt, considering the overall poor performance in both cases), the virtual time slice seems to systematically provide a boost in responsiveness of around +10-20% fps. It also seems to significantly prevent potential audio cracking issues when the system is massively overloaded: no audio cracking was detected during the entire run of these tests with the virtual deadline change applied. [1] https://webglsamples.org/aquarium/aquarium.html Signed-off-by: Andrea Righi <andrea.righi@canonical.com> |
||
---|---|---|
.. | ||
src | ||
.gitignore | ||
build.rs | ||
Cargo.toml | ||
LICENSE | ||
meson.build | ||
README.md | ||
rustfmt.toml |
scx_rustland
This is a single user-defined scheduler used within sched_ext, which is a Linux kernel feature which enables implementing kernel thread schedulers in BPF and dynamically loading them. Read more about sched_ext.
Overview
scx_rustland is made of a BPF component (scx_rustland_core) that implements the low level sched-ext functionalities and a user-space counterpart (scheduler), written in Rust, that implements the actual scheduling policy.
How To Install
Available as a Rust crate: cargo add scx_rustland
Typical Use Case
scx_rustland is designed to prioritize interactive workloads over background CPU-intensive workloads. For this reason the typical use case of this scheduler involves low-latency interactive applications, such as gaming, video conferencing and live streaming.
scx_rustland is also designed to be an "easy to read" template that can be used by any developer to quickly experiment more complex scheduling policies fully implemented in Rust.
Production Ready?
Not quite. For production scenarios, other schedulers are likely to exhibit better performance, as offloading all scheduling decisions to user-space comes with a certain cost.
However, a scheduler entirely implemented in user-space holds the potential for seamless integration with sophisticated libraries, tracing tools, external services (e.g., AI), etc.
Hence, there might be situations where the benefits outweigh the overhead, justifying the use of this scheduler in a production environment.
Demo
The key takeaway of this demo is to demonstrate that , despite the overhead of running a scheduler in user-space, we can still obtain interesting results and, in this particular case, even outperform the default Linux scheduler (EEVDF) in terms of application responsiveness (fps), while a CPU intensive workload (parallel kernel build) is running in the background.