Pass enqueue flags to user-space: flags will be passed via
QueuedTask.flags and can be forwarded back to BPF via
DispatchedTask.flags.
These flags can be also passed to BpfScheduler.select_cpu() to apply a
more refined CPU selection policy.
Moreover, avoid to prioritize the user-space scheduler too much and
dispatch it only if there are no other tasks that needs to be dispatched
in ops.dispatch().
This improves CPU utilization and enhances the fairness, robustness, and
resilience of schedulers based on scx_rustland_core, particularly under
stress test conditions.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Many kernel threads performs latency critical tasks (e.g., net, gpu). In
particular, AMD GPU driver runs the most part in the kernel space using
kworker. Hence, treat kernel threads as if a woken up task.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Initialize the node cpumask, which was previously uninitialized causing
metric calculations to be wrong when attempting to lookup CPUs in the
node cpumask.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Use `cargo fmt` with a specific nightly branch in the CI to enforce formatting. Globally format these files while the diff is still small so we can stay on top of it.
Test plan:
- CI lint check passes.
The domains are added to the aggregator when load is added (and
duty_cycle is not 0.0f64).
This commit makes sure that all domains are added to the aggregator even
when the calculated duty_cycle is 0.
Signed-off-by: Fredrik Lönnegren <fredrik@frelon.se>
Pass in the layer spec when determining the layer core growth algo. This
should make it easier to implement layer growth algos that are spec
specific.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Using p->scx.slice to evaluate the consumed time slice can be a bit
imprecise, because the sched_ext core implements yielding by setting
p->scx.slice to 0.
When the task's vruntime is evaluated this is considered as the task has
exhausted its entire allocated time slice, even though it voluntarily
released the CPU before the slice fully expired.
To avoid this inaccuracy and prevent penalizing tasks that voluntarily
release the CPU, always evaluate the used time slice based on the
difference in the task's total execution time (p->se.sum_exec_runtime).
This method provides a more precise calculation of vruntime and results
in a fairer task's deadline evaluation.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Rust build was using two separate workspaces - rust/ and scheds/rust.
There's no reason to separate them and it makes doc generation tricky. Use
single top level workspace so that we can drive all rust building from
cargo.
split build and test jobs to reduce ci turnaround time
and make it clear what is failing when something fails.
also add virtiofsd to deps to make test compilation faster
(most test time is compliation) and remove all force 9ps.
Simplify scx_rlfifo code, add detailed documentation of the
scx_rustland_core API and get rid of the additional task queue, since it
just makes the code bigger, slower and it doesn't really provide any
benefit (considering that we are dispatching the tasks in FIFO order
anyway).
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Pass the enqueue flags to the user-space scheduler through the
QueuedTask struct.
These flags allow the user-space scheduler to make more informed
scheduling decisions.
Also bump up scx_rustland_core minor version to reflect the new API (we
are not breaking the old API, so we don't need to bump the major version
in this case).
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Unexpectedly, little cores, which have relative short time slices, have
more chance to schedule performance-critical tasks. Hence it is better
to keep the time slice same regardless the core types.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
When selecting an idle CPU for a task that has been woken up, prioritize
reusing the same CPU if the waker and wakee share the same L3 cache.
Otherwise, attempt to migrate the wakee to the waker's CPU, provided it
is allowed by the wakee's scheduling domain.
This seems to consistently improve FPS performance when the system is
not operating over its full capacity.
Example:
$ __GL_SYNC_TO_VBLANK=0 vblank_mode=0 glxgears -geometry 800x600
- before: ~18305.77 FPS
- after: ~19060.62 FPS
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Rename "turbo domain" to "preferred domain", that conceptually is more
generic and introduce the new option `--preferred-domain CPUMASK`, which
allows users to define the preferred domain, specifying a cpumask as a
hex number. By default ("auto") the scheduler will always try to detect
and use the fastest CPUs in the system.
Moreover, adjust the cpufreq logic to use "auto" both with the
"balance_power" and "balance_performance" EPP profiles.
Then, enable "auto" mode by default: the scheduler will try to
automatically determine the optimal primary domain, preferred domain and
cpufreq level, based on the selected scheduler and energy profiles.
Tested-by: Piotr Gorski < piotr.gorski@cachyos.org >
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Fix formatting precision of stats to have lower precision for
readability. The existing formatting is hard to read:
tot= 1538 local=31.27 open_idle= 2.73 affn_viol=23.80 proc=4ms
busy= 1.1 util= 16.6 load= 32.7 fallback_cpu= 6
excl_coll=0.06501950585175553 excl_preempt=0.26007802340702213 excl_idle=0.16384915474642392 excl_wakeup=0.25097529258777634
With this fix stats are far more readable formatting:
tot= 441 local=33.56 open_idle= 0.00 affn_viol=20.63 proc=3ms
busy= 0.4 util= 6.3 load= 33.6 fallback_cpu= 6
excl_coll=0.454 excl_preempt=0.000 excl_idle=0.132 excl_wakeup=0.200
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
When a pinned task cannot run on either active or overflow sets, we try
to stay on the previous CPU which is still okay to run on.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Bump up scx_rustland_core version to include this critical fix that
allows to prevent scheduler stalls:
94a3594 ("scx_rustland_core: always dispatch per-cpu kthreads directly")
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>