The shared DSQ is typically used to prioritize tasks and dispatch them
on the first CPU available, so consume from the shared DSQ before the
local CPU DSQ.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Provide a knob in scx_rustland_core to automatically turn the scheduler
into a simple FIFO when the system is underutilized.
This choice is based on the assumption that, in the case of system
underutilization (less tasks running than the amount of available CPUs),
the best scheduling policy is FIFO.
With this option enabled the scheduler starts in FIFO mode. If most of
the CPUs are busy (nr_running >= num_cpus - 1), the scheduler
immediately exits from FIFO mode and starts to apply the logic
implemented by the user-space component. Then the scheduler can switch
back to FIFO if there are no tasks waiting to be scheduled (evaluated
using a moving average).
This option can be enabled/disabled by the user-space scheduler using
the fifo_sched parameter in BpfScheduler: if set, the BPF component will
periodically check for system utilization and switch back and forth to
FIFO mode based on that.
This allows to improve performance of workloads that are using a small
amount of the available CPUs in the system, while still maintaining the
same good level of performance for interactive tasks when the system is
over commissioned.
In certain video games, such as Baldur's Gate 3 or Counter-Strike 2,
running in "normal" system conditions, we can experience a boost in fps
of approximately 4-8% with this change applied.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Simplify the CPU idle selection logic relying on the built-in logic.
If something can be improved in this logic it should be done in the
backend, changing the default idle selection logic, rustland doesn't
need to do anything special here for now.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
scx_rustland has a function called get_cpu_owner() in BPF which
currently has no callers. There's nothing wrong with the function, but
it causes a warning due to an unused function. Let's just annotate it
with __maybe_unused to tell the compiler that it's not a problem.
Signed-off-by: David Vernet <void@manifault.com>
Introduce a low-power mode to force the scheduler to operate in a very
non-work conserving way, causing a significant saving in terms of power
consumption, while still providing a good level of responsiveness in the
system.
This option can be enabled in scx_rustland via the --low_power / -l
option.
The idea is to not immediately re-kick a CPU when it enters an idle
state, but do that only if there are no other tasks running in the
system.
In this way, latency-critical tasks can be still dispatched immediately
on the other active CPUs, while CPU-bound tasks will be forced to spend
more time waiting to be scheduled, basically enforcing a special CPU
throttling mechanism that affects only the tasks that are not latency
critical.
The consequence is a reduction in the overall system throughput, but
also a significant reduction of power consumption, that can be useful
for mobile / battery-powered devices.
Test case (using `scx_rustland -l`):
- play a video game (Terraria) while recompiling the kernel
- measure game performance (fps) and core power consumption (W)
- compare the result of normal mode vs low-power mode
Result:
Game performance | Power consumption |
------------+-----------------+-------------------+
normal mode | 60 fps | 6W |
low-power mode | 60 fps | 3W |
As we can see from the result the reduction of power consumption is
quite significant (50%), while the responsiveness of the game (fps)
remains the same, that means battery life can be potentially doubled
without significantly affecting system responsiveness.
The overall throughput of the system is, of course, affected in a
negative way (kernel build is approximately 50% slower during this
test), but the goal here is to save power while still maintaining a good
level of responsiveness in the system.
For this reason the low-power mode should be considered only in
emergency conditions, for example when the system is close to completely
run out of power or simply to extend the battery life of a mobile device
without compromising its responsiveness.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
The comment that describes rustland_update_idle() is still incorrectly
reporting an old implemention detail. Update its description for better
clarity.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Change the BPF CPU selection logic as following:
- if the previously used CPU is idle, keep using it
- if the task is not coming from a wait state, try to stick as much as
possible to the same CPU (for better cache usage)
- if the task is waking up from a wait state rely on the sched_ext
built-int idle selection logic
This logic can be completely disabled when the full user-space mode is
enabled. In this case tasks will always be assigned to the previously
used CPU and the user-space scheduler should take care of distributing
them among the available CPUs.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Drop the global effective time-slice and use the more fine-grained
per-task time-slice to implement the dynamic time-slice capability.
This allows to reduce the scheduler's overhead (dropping the global time
slice volatile variable shared between user-space and BPF) and it
provides a more fine-grained control on the per-task time slice.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Replace the BPF_MAP_TYPE_QUEUE with a BPF_MAP_TYPE_USER_RINGBUF to store
the tasks dispatched from the user-space scheduler to the BPF component.
This eliminates the need of the bpf() syscalls, significantly reducing
the overhead of the user-space->kernel communication and delivering a
notable performance boost in the overall system throughput.
Based on experimental results, this change allows to reduces the scheduling
overhead by approximately 30-35% when the system is overcommitted.
This improvement has the potential to make user-space schedulers based
on scx_rustland_core viable options for real production systems.
Link: https://github.com/libbpf/libbpf-rs/pull/776
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Do not encode dispatch flags in the cpu field, but simply use a separate
"flags" field.
This makes the code much simpler and it increases the size of
dispatched_task_ctx from 24 to 32, that is probably better in terms of
cacheline allocation / performance.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Introduce the new dispatch flag RL_PREEMPT_CPU that can be used to
dispatch tasks that can preempt others.
Tasks with this flag set will be dispatched by the BPF part using
SCX_ENQ_PREEMPT, so they can potentially preempt any other task running
on the target CPU.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Reserve some bits of the `cpu` attribute of a task to store special
dispatch flags.
Initially, let's introduce just RL_CPU_ANY to replace the special value
NO_CPU, indicating that the task can be dispatched on any CPU,
specifically the first CPU that becomes available.
This allows to keep the CPU value assigned by the builtin idle selection
logic, that can potentially be used later for further optimizations.
Moreover, having the possibility to specify dispatch flags gives more
flexibility and it allows to map new scheduling features to such flags.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
scx_rustland_core needs to ship both a binary part and a source code
part, which will be used to build schedulers based on it.
To effectively publish the scx_rustland_core crate on crates.io we need
to properly separate the source code assets from the crate's main source
code.
To achieve this, move the assets into a separate directory and declare
them inside a [lib] section in Cargo.toml.
This allows to publish the crate on crates.io, providing also a clear
separation between source code and assets.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>