Introduce the new `--lowlatency` option, which enables switching between
the default pure vruntime-based scheduling (optimized for server
workloads) and deadline-based scheduling (better suited for low-latency
workloads).
When the low-latency mode is activated, a task's deadline is calculated
as its vruntime, adjusted by a bonus proportional to the task's average
number of voluntary context switches (the more voluntary context
switches, the shorter the deadline).
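In pseudo-code, the deadline computation looks roughly like this (the
NSEC_PER_MSEC scaling factor and the cap on the bonus are illustrative
assumptions, not the exact values used by the scheduler):

  static u64 task_deadline(u64 vruntime, u64 avg_nvcsw)
  {
          /*
           * The bonus grows with the task's average number of voluntary
           * context switches; capping it at vruntime avoids underflow.
           */
          u64 bonus = avg_nvcsw * NSEC_PER_MSEC;

          if (bonus > vruntime)
                  bonus = vruntime;

          return vruntime - bonus;
  }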
This feature further enhances the prioritization of interactive tasks,
proportionally to their average number of voluntary context switches,
also within the two main global queues (priority / shared), and it helps
keep interactive workloads responsive even in the presence of heavy
non-interactive background work.
Low-latency mode helps prevent audio crackling even in the presence of a
large number of short-lived tasks with pseudo-interactive behavior
(e.g., hackbench) and it achieves approximately a +33% improvement in
average frames-per-second (FPS) in the typical "gaming while building
the kernel" benchmark.
However, it can also amplify the de-prioritization of CPU-intensive
tasks, making this option more suitable for specific low-latency
scenarios. Therefore, low-latency mode is disabled by default and can
only be enabled via the `--lowlatency` option.
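Example usage (the scheduler binary name is assumed to be scx_bpfland):

  $ scx_bpfland --lowlatency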
Tested-by: Piotr Gorski (piotrgorski@cachyos.org)
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Explicitly replenish the task's time slice from ops.dispatch() if the
task still wants to run and no other task is selected. In this way the
sched_ext core won't automatically re-schedule the task on the same CPU
with an implicitly assigned time slice of SCX_SLICE_DFL.
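A minimal sketch of the ops.dispatch() side (SHARED_DSQ and slice_max
are illustrative names, not the scheduler's actual symbols):

  void BPF_STRUCT_OPS(sched_dispatch, s32 cpu, struct task_struct *prev)
  {
          /* Dispatch from the scheduler's queue(s) first. */
          if (scx_bpf_consume(SHARED_DSQ))
                  return;

          /*
           * No other task was selected: if @prev still wants to run,
           * explicitly replenish its time slice, so the sched_ext core
           * doesn't fall back to SCX_SLICE_DFL when it keeps the task
           * running on the same CPU.
           */
          if (prev && (prev->scx.flags & SCX_TASK_QUEUED))
                  prev->scx.slice = slice_max;
  }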
Moreover, instead of determining the task time slice in ops.enqueue(),
refresh the time slice immediately before the task is started on its
assigned CPU in ops.running().
This allows using a more precise time slice, adjusted based on the
actual number of tasks that are currently waiting to be scheduled.
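The ops.running() side could then look roughly like this (again,
SHARED_DSQ and slice_max are illustrative names):

  void BPF_STRUCT_OPS(sched_running, struct task_struct *p)
  {
          /* Tasks currently waiting to be scheduled. */
          s32 nr_waiting = scx_bpf_dsq_nr_queued(SHARED_DSQ);

          /* Scale the time slice down as the wait queue grows. */
          p->scx.slice = slice_max / (nr_waiting + 1);
  }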
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
The meaning of SCX_OPS_ENQ_LAST will change with future kernel updates and
enqueueing on the local DSQ will no longer be sufficient to avoid stalls.
No reason to do it anyway. Just drop it.
With the global scx_utils::NR_CPU_IDS we don't need Topology anymore in
init_primary_domain(), so drop the variable to fix the following build
warning:
warning: unused variable: `topo`
--> src/main.rs:385:9
|
385 | topo: &Topology,
| ^^^^ help: if this is intentional, prefix it with an underscore: `_topo`
|
= note: `#[warn(unused_variables)]` on by default
Fixes: 1da249f ("scx_utils::topology: Always use NR_CPU_IDS and NR_CPUS_POSSIBLE")
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Use the base frequency, instead of maximum frequency, to classify fast
and slow CPUs. This ensures accurate distinction between Intel Turbo
Boost CPUs and genuinely faster CPUs when auto-detecting the primary
scheduling domain.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
- Update scx_utils/build.rs so that a 12-character SHA1 is generated
  instead of the full one.
- Add --version to scx_rusty. Use a custom one, as we don't want to use
  the default cargo version.
Tasks enqueued with SCX_ENQ_WAKEUP are immediately classified as
interactive. However, if interactive task classification is disabled
(via `-c 0`), we should avoid promoting them to interactive.
This is particularly important because, with the nvcsw logic disabled,
tasks can remain classified as interactive indefinitely and will never
be demoted to regular tasks.
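A sketch of the intended guard, assuming a per-task context with an
is_interactive flag and a global nvcsw_max_thresh that is zero when
classification is disabled (all names illustrative):

  void BPF_STRUCT_OPS(sched_enqueue, struct task_struct *p, u64 enq_flags)
  {
          struct task_ctx *tctx = lookup_task_ctx(p);

          /*
           * Promote wakeups to the interactive class only when the nvcsw
           * classification logic is enabled, otherwise the flag would
           * never be cleared by the (disabled) demotion path.
           */
          if (tctx && nvcsw_max_thresh && (enq_flags & SCX_ENQ_WAKEUP))
                  tctx->is_interactive = true;

          scx_bpf_dispatch(p, SHARED_DSQ, SCX_SLICE_DFL, enq_flags);
  }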
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Rely on scx_utils::Cpumask instead of re-implementing a custom struct to
parse and manage CPU masks.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Fix a bug introduced in #510 that assumed core IDs are incremental.
This refactors the core ordering for layers to be much simpler and
leaves some room for layer core isolation at low utilization.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Rely on scx_utils::Topology to get CPU and cache information, instead of
re-implementing custom methods.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Currently the core selection logic in scx_layered uses the first
available core in the bitmask. This is suboptimal when the scheduler is
configured with specific NUMA/LLC restrictions. The ideal core selection
logic should try to find the least used cores within the preferred
scheduling domain and allocate new CPUs from shared cores within that
domain.
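A generic sketch of that policy (plain C for illustration only, not the
actual scx_layered code; the per-core utilization metric and the domain
mask representation are assumptions):

  #include <stdbool.h>

  /* Pick the least used core among those in the preferred domain. */
  static int pick_least_used_core(const bool *in_domain,
                                  const unsigned int *core_util,
                                  int nr_cores)
  {
          unsigned int best_util = (unsigned int)-1;
          int core, best = -1;

          for (core = 0; core < nr_cores; core++) {
                  if (!in_domain[core])
                          continue;
                  if (core_util[core] < best_util) {
                          best_util = core_util[core];
                          best = core;
                  }
          }
          return best;
  }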
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
- If --monitor is specified along with layer specs, the scheduler also
  starts stats monitoring in a separate thread.
- Standalone monitoring mode no longer exits when the scheduler isn't
  running.
Instead of keeping one copy of sched_stats, each stats server session
carries its own so that stats can be generated independently by each
client at any interval. CPU allocation min/max tracking is broken for now.