f9a994412d
Allow specifying a primary scheduling domain via the new command-line option `--primary-domain CPUMASK`, where CPUMASK is a hex number of arbitrary length representing the CPUs assigned to the domain.

If this option is not specified, the scheduler uses all the available CPUs in the system as the primary domain (no behavior change). Otherwise, if a primary scheduling domain is defined, the scheduler tries to dispatch tasks only to the CPUs assigned to the primary domain until these CPUs are saturated, at which point tasks may overflow to other available CPUs.

This feature can be used to prioritize certain cores over others, and it can be really effective on systems with heterogeneous cores (e.g., hybrid systems with P-cores and E-cores).

== Example (hybrid architecture) ==

Hardware:
 - Dell Precision 5480 with 13th Gen Intel(R) Core(TM) i7-13800H
 - 6 P-cores 0..5 with 2 CPUs each (CPUs 0..11)
 - 8 E-cores 6..13 with 1 CPU each (CPUs 12..19)

== Test ==

WebGL application (https://webglsamples.org/aquarium/aquarium.html): this generates a steady workload in the system without over-saturating the CPUs.

Use different scheduler configurations:
 - EEVDF (default)
 - scx_bpfland using P-cores only (--primary-domain 0x00fff)
 - scx_bpfland using E-cores only (--primary-domain 0xff000)

Measure performance (fps) and power consumption (W).

== Result ==

                  +-----+-----+------+-------+-------+
                  | min | max | avg  |       |       |
                  | fps | fps | fps  | stdev | power |
+-----------------+-----+-----+------+-------+-------+
| EEVDF           |  28 |  34 | 31.0 |  1.73 |  3.5W |
| bpfland-p-cores |  33 |  34 | 33.5 |  0.29 |  3.5W |
| bpfland-e-cores |  25 |  26 | 25.5 |  0.29 |  2.2W |
+-----------------+-----+-----+------+-------+-------+

Using a primary scheduling domain of only P-cores with scx_bpfland achieves a more stable and predictable level of performance, with an average of 33.5 fps and an error of ±0.5 fps.

In contrast, EEVDF yields an average frame rate of 31.0 fps with an error of ±3.0 fps, indicating slightly less consistency, because tasks are evenly distributed across all the cores in the system (both slow and fast cores).

Using a scheduling domain of only E-cores with scx_bpfland results in a lower average frame rate (25.5 fps) while maintaining stable performance (error of ±0.5 fps), and power consumption drops to an average of 2.2W, compared to 3.5W with either of the other configurations.

== Conclusion ==

In summary, on hybrid architectures this change gives users the flexibility to prioritize scheduling on performance cores for better performance and consistency, or on energy-efficient cores for reduced power consumption.

Moreover, this feature can also be used to minimize the number of cores used by the scheduler until they reach full capacity. This can be useful for reducing power consumption even on homogeneous systems, or for conducting scheduling experiments with smaller sets of cores, provided the system is not overcommitted.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
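To make the CPUMASK format concrete, here is a minimal user-space sketch in Rust. It assumes that bit N of the hex number selects CPU N, which is consistent with the example masks in the commit message (0x00fff covers the P-core CPUs 0..11, 0xff000 covers the E-core CPUs 12..19). It models "primary domain first, then overflow" as a simple idle-CPU search. The function names `parse_cpumask` and `pick_idle_cpu` are hypothetical. scx_bpfland makes these decisions in BPF, and its actual parsing and CPU-selection code may differ.

```rust
/// Parse a hex CPU mask of arbitrary length (e.g. "0x00fff") into a sorted
/// list of CPU ids. Assumption: bit N set means CPU N is in the domain.
fn parse_cpumask(mask: &str) -> Result<Vec<usize>, String> {
    let hex = mask
        .strip_prefix("0x")
        .or_else(|| mask.strip_prefix("0X"))
        .unwrap_or(mask);
    if hex.is_empty() {
        return Err("empty cpumask".to_string());
    }
    let mut cpus = Vec::new();
    // Walk digits from least significant (rightmost) to most significant,
    // so the number of CPUs is not limited by the width of an integer type.
    for (i, c) in hex.chars().rev().enumerate() {
        let nibble = c
            .to_digit(16)
            .ok_or_else(|| format!("invalid hex digit '{}' in cpumask", c))?;
        for bit in 0..4usize {
            if (nibble >> bit) & 1 == 1 {
                cpus.push(i * 4 + bit);
            }
        }
    }
    Ok(cpus)
}

/// Pick an idle CPU, preferring the primary domain, and fall back
/// ("overflow") to any other idle CPU only when every primary CPU is busy.
/// `idle[cpu]` is true if `cpu` is currently idle (hypothetical model).
fn pick_idle_cpu(idle: &[bool], primary: &[usize]) -> Option<usize> {
    // First pass: idle CPUs inside the primary domain.
    if let Some(&cpu) = primary
        .iter()
        .find(|&&cpu| idle.get(cpu).copied().unwrap_or(false))
    {
        return Some(cpu);
    }
    // Overflow: the lowest-numbered idle CPU anywhere in the system.
    idle.iter().position(|&is_idle| is_idle)
}
```

For example, with the P-core mask 0x00fff, an idle E-core such as CPU 15 is chosen only when none of CPUs 0..11 is idle.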
scx_bpfland
This is a single user-defined scheduler used within sched_ext, a Linux kernel feature that enables implementing kernel thread schedulers in BPF and loading them dynamically. Read more about sched_ext.
Overview
scx_bpfland: a vruntime-based sched_ext scheduler that prioritizes interactive workloads.
This scheduler is derived from scx_rustland, but it is fully implemented in BPF, with a minimal user-space Rust part that processes command-line options, collects metrics and logs scheduling statistics. The BPF part makes all the scheduling decisions.
Tasks are categorized as either interactive or regular based on their average rate of voluntary context switches per second. Tasks that exceed a specific voluntary context switch threshold are classified as interactive. Interactive tasks are placed in a higher-priority queue, while regular tasks are placed in a lower-priority queue. Within each queue, tasks are sorted by their weighted runtime: tasks with a higher weight (priority) or that have used the CPU for less time (smaller runtime) are scheduled sooner, due to their higher position in the queue.
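The queueing policy described above can be sketched in Rust as two min-heaps keyed on weighted runtime. This is a simplified user-space model, not the BPF code. The following are illustrative assumptions:

* the threshold `NVCSW_THRESHOLD`
* the `Task` fields
* normalizing runtime to a base weight of 100
* strict priority of the interactive queue over the regular one

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical threshold: average voluntary context switches per second
/// above which a task is classified as interactive (illustrative value).
const NVCSW_THRESHOLD: u64 = 10;

struct Task {
    pid: u32,
    weight: u64,        // scheduling weight, must be non-zero; higher = higher priority
    runtime_ns: u64,    // CPU time consumed by the task
    nvcsw_per_sec: u64, // average rate of voluntary context switches
}

impl Task {
    fn is_interactive(&self) -> bool {
        self.nvcsw_per_sec > NVCSW_THRESHOLD
    }

    /// Runtime normalized to a base weight of 100: a higher weight or a
    /// smaller runtime yields a smaller key, i.e. an earlier dispatch.
    fn weighted_runtime(&self) -> u64 {
        self.runtime_ns * 100 / self.weight
    }
}

/// Two priority queues; Reverse turns Rust's max-heap into a min-heap.
#[derive(Default)]
struct Queues {
    interactive: BinaryHeap<Reverse<(u64, u32)>>, // (weighted runtime, pid)
    regular: BinaryHeap<Reverse<(u64, u32)>>,
}

impl Queues {
    fn enqueue(&mut self, t: &Task) {
        let key = Reverse((t.weighted_runtime(), t.pid));
        if t.is_interactive() {
            self.interactive.push(key);
        } else {
            self.regular.push(key);
        }
    }

    /// Dispatch from the interactive queue first; regular tasks run only
    /// when no interactive task is waiting (assumed strict priority).
    fn dispatch(&mut self) -> Option<u32> {
        self.interactive
            .pop()
            .or_else(|| self.regular.pop())
            .map(|Reverse((_, pid))| pid)
    }
}
```

Ties in weighted runtime are broken by PID here only to keep the ordering deterministic.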
Moreover, each task gets a time slice budget. When a task is dispatched, it receives a time slice equivalent to the remaining unused portion of its previously allocated time slice (with a minimum threshold applied). This gives latency-sensitive workloads more chances to exceed their time slice when they need to perform short bursts of CPU activity without being interrupted (e.g., real-time audio encoding/decoding workloads).
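Here is a minimal sketch of the slice-budget rule. It assumes slices are tracked in nanoseconds and uses a hypothetical `SLICE_MIN_NS` value for the minimum threshold. The actual threshold used by scx_bpfland is not stated here.

```rust
/// Minimum slice granted at dispatch (hypothetical value: 0.5 ms).
const SLICE_MIN_NS: u64 = 500_000;

/// Slice assigned when a task is dispatched: the unused remainder of its
/// previous slice, clamped from below by SLICE_MIN_NS.
fn next_slice(prev_slice_ns: u64, used_ns: u64) -> u64 {
    // saturating_sub avoids underflow if the task overran its slice.
    prev_slice_ns.saturating_sub(used_ns).max(SLICE_MIN_NS)
}
```

Under these assumptions, a task that used only 1 ms of a 5 ms slice is dispatched next with the remaining 4 ms. A task that exhausted (or overran) its slice still gets the 0.5 ms minimum.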
Typical Use Case
Interactive workloads, such as gaming, live streaming, multimedia, and real-time audio encoding/decoding, especially when these workloads are running alongside CPU-intensive background tasks.
In this scenario, scx_bpfland ensures that interactive workloads maintain a high level of responsiveness.
Production Ready?
The scheduler is based on scx_rustland and implements nearly the same scheduling algorithm, with minor changes and optimizations so that it can be fully implemented in BPF.
Given that the scx_rustland scheduling algorithm has been extensively tested, this scheduler can be considered ready for production use.