`#stat_doc` extends a struct's documentation from its stat `desc`
property.
Add this attribute macro to the remaining Stats structs.
Signed-off-by: Ming Yang <minos.future@gmail.com>
task_avg_nvcsw() was incorrectly returning a bool instead of u64,
limiting the impact of the lowlatency boost.
Fix it by returning the proper type (u64).
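A minimal sketch of the fix (the task-context lookup and the
avg_nvcsw field name are assumptions, not the exact scx_bpfland
code):

    /* Before: the bool return type clamped the average to 0/1 */
    static bool task_avg_nvcsw(struct task_struct *p)
    {
        struct task_ctx *tctx = try_lookup_task_ctx(p);

        return tctx ? tctx->avg_nvcsw : 0;
    }

    /* After: return the full u64 average so the boost can scale */
    static u64 task_avg_nvcsw(struct task_struct *p)
    {
        struct task_ctx *tctx = try_lookup_task_ctx(p);

        return tctx ? tctx->avg_nvcsw : 0;
    }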
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
When a task is the last one running on a CPU and still wants to
continue, allow it to run and replenish its time only if the used CPU is
part of a fully idle SMT core.
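A rough sketch of how the sched_ext idle SMT mask can be consulted
for this check (the helper name and exact call site are assumptions):

    #include <scx/common.bpf.h>

    /*
     * Return true if @cpu belongs to a fully idle SMT core, i.e. it
     * appears in the idle SMT cpumask exported by sched_ext (sketch).
     */
    static bool is_fully_idle_smt(s32 cpu)
    {
        const struct cpumask *idle_smtmask = scx_bpf_get_idle_smtmask();
        bool idle = bpf_cpumask_test_cpu(cpu, idle_smtmask);

        scx_bpf_put_idle_cpumask(idle_smtmask);

        return idle;
    }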
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
During ttwu, the kernel may decide to skip ->select_task_rq() (e.g.,
when only one CPU is allowed or migration is disabled). In that case
ops.enqueue() is called directly, without ever getting a chance to
call ops.select_cpu().
Therefore, introduce a new flag (select_cpu_done) in the local task
context to determine whether ops.select_cpu() was bypassed and, in
that case, attempt to find an idle CPU directly from ops.enqueue().
In the future this information will be supplied by the kernel through
a special enqueue flag (SCX_ENQ_CPU_SELECTED) [1]. However, the custom
flag in the local task context allows us to reliably determine the
same information, even on older kernels where this flag is not
available.
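A simplified sketch of the idea (the task-context layout, the
try_lookup_task_ctx()/pick_idle_cpu() helpers and the dispatch slice
are assumptions; the actual code differs in details):

    #include <scx/common.bpf.h>

    struct task_ctx {
        /* ... */
        bool select_cpu_done;
    };

    s32 BPF_STRUCT_OPS(bpfland_select_cpu, struct task_struct *p,
                       s32 prev_cpu, u64 wake_flags)
    {
        struct task_ctx *tctx = try_lookup_task_ctx(p);
        s32 cpu = pick_idle_cpu(p, prev_cpu, wake_flags);

        if (tctx)
            tctx->select_cpu_done = true;

        return cpu >= 0 ? cpu : prev_cpu;
    }

    void BPF_STRUCT_OPS(bpfland_enqueue, struct task_struct *p,
                        u64 enq_flags)
    {
        struct task_ctx *tctx = try_lookup_task_ctx(p);

        /*
         * If ops.select_cpu() was bypassed (single-CPU affinity,
         * migration disabled, ...), look for an idle CPU here.
         */
        if (tctx && !tctx->select_cpu_done) {
            s32 cpu = scx_bpf_pick_idle_cpu(p->cpus_ptr, 0);

            if (cpu >= 0) {
                scx_bpf_dispatch(p, SCX_DSQ_LOCAL_ON | cpu,
                                 SCX_SLICE_DFL, enq_flags);
                return;
            }
        }
        if (tctx)
            tctx->select_cpu_done = false;

        /* ... regular enqueue path ... */
    }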
[1] https://lore.kernel.org/lkml/20240928003840.GA2717@maniforge/T
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Fix a bug in cache initialization where the first node would
repeatedly get all CPUs added to its mask. Refactor some consts to be
clearer.
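For illustration only, the fixed loop has roughly this shape; the
per-cache id check is what was effectively missing (all names below
are assumptions, not the actual scx_layered code):

    static void init_cache_masks(void)
    {
        s32 cpu;

        bpf_for(cpu, 0, nr_possible_cpus) {
            u32 id = cpu_to_cache_id(cpu);
            struct cache_ctx *cachec = lookup_cache_ctx(id);

            if (!cachec || !cachec->cpumask)
                continue;
            /*
             * Add the CPU only to the mask of its own cache,
             * instead of always hitting the first one.
             */
            bpf_cpumask_set_cpu(cpu, cachec->cpumask);
        }
    }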
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
When finding a victim candidate for preemption, a randomly chosen
candidate could be outside the valid CPU range (e.g., due to CPU
offlining). In this case, retry with another randomly chosen CPU.
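A sketch of the bounded retry (nr_cpu_ids is assumed to be filled in
from userspace; the helper name is illustrative):

    #include <scx/common.bpf.h>

    #define MAX_VICTIM_RETRIES 4

    const volatile u32 nr_cpu_ids = 1;    /* overridden by userspace */

    /* Pick a random victim CPU, retrying when the pick is unusable. */
    static s32 pick_random_victim(void)
    {
        const struct cpumask *online = scx_bpf_get_online_cpumask();
        s32 cpu = -1;
        int i;

        bpf_for(i, 0, MAX_VICTIM_RETRIES) {
            s32 cand = bpf_get_prandom_u32() % nr_cpu_ids;

            if (bpf_cpumask_test_cpu(cand, online)) {
                cpu = cand;
                break;
            }
        }
        scx_bpf_put_cpumask(online);

        return cpu;
    }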
Signed-off-by: Changwoo Min <changwoo@igalia.com>
The doc of the scx_layered `Opt` struct is out of sync.
Implement the #stat_doc attribute macro to generate documentation
from the `desc` property.
Apply #stat_doc to `LayerStats` and `SysStats` in scx_layered.
Signed-off-by: Ming Yang <minos.future@gmail.com>
We used the average performance criticality of tasks as a threshold
to determine the proper core type (big or little). However, if the
big cores' compute capacity is not half of the total compute
capacity, such an average-based determination becomes suboptimal. If
too few tasks are classified as performance-critical and requested to
run on big cores, the big cores end up stealing arbitrary
non-performance-critical tasks, wasting their capacity. That could
result in performance instability.
Hence, determine the threshold more accurately by considering the
(active) big cores' compute capacity and the (approximated)
distribution of tasks' performance criticality.
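One way to picture the new calculation (purely illustrative; the
histogram, bucket count and scaling constants are assumptions): keep
an approximate histogram of tasks' performance criticality and pick
the threshold so that roughly the fraction of tasks matching the big
cores' share of total compute capacity falls above it.

    #define PC_NR_BUCKETS 16
    #define PC_MAX        1024

    static u32 calc_thr_perf_cri(const u32 hist[PC_NR_BUCKETS],
                                 u64 nr_tasks, u64 big_capacity,
                                 u64 total_capacity)
    {
        u64 want, acc = 0;
        int i;

        if (!total_capacity || !nr_tasks)
            return PC_MAX / 2;

        /* Number of tasks the big cores can reasonably absorb. */
        want = nr_tasks * big_capacity / total_capacity;

        /* Walk from the most performance-critical bucket downwards. */
        for (i = PC_NR_BUCKETS - 1; i > 0; i--) {
            acc += hist[i];
            if (acc >= want)
                break;
        }

        /* Map the bucket index back to a perf-criticality value. */
        return (u32)i * (PC_MAX / PC_NR_BUCKETS);
    }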
Signed-off-by: Changwoo Min <changwoo@igalia.com>
In preparation for improving the performance criticality logic, first
rename "avg_perf_cri" to "thr_perf_cri", since the average will no
longer be used as the threshold.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Add an enum for the layer growth algo to the BPF layer config. This
will be useful for implementing topology-aware layer growth
algorithms.
When selecting an idle CPU, the current logic tries to keep tasks
local to the LLC/NUMA node. However, for certain growth algorithms
(e.g., RoundRobin) this is suboptimal. Adding the layer growth
algorithm to the config allows taking different CPU-selection paths
in the idle and preemption logic.
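A rough sketch of the shape of the addition (the enum values, struct
fields and helper are illustrative, not the actual intf.h layout):

    #include <scx/common.bpf.h>

    enum layer_growth_algo {
        GROWTH_ALGO_STICKY,
        GROWTH_ALGO_LINEAR,
        GROWTH_ALGO_ROUND_ROBIN,
        /* ... */
    };

    struct layer {
        /* existing fields ... */
        u32 growth_algo;    /* enum layer_growth_algo */
    };

    /* Only keep idle-CPU selection LLC-local when it helps the algo. */
    static bool keep_llc_local(const struct layer *layer)
    {
        return layer->growth_algo != GROWTH_ALGO_ROUND_ROBIN;
    }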
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
* enable IDEs etc. to work on the bpf.c files
  This makes it so that clangd, and IDE tools which use clangd, can
  work on the bpf.c code.
  Nothing should actually change outside of that IDE/editor
  environment; all the changes are ifdef'ed on LSP, which is set in
  the added .clangd file (see the sketch below).
* move intf include out of both sides of ifdef toggle
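A hypothetical illustration of the pattern (the actual include paths
and .clangd contents differ):

    /* LSP is only defined via the added .clangd file, never in real builds. */
    #ifdef LSP
    #include "../../../include/scx/common.bpf.h"  /* editor-friendly path */
    #else
    #include <scx/common.bpf.h>
    #endif

    /* shared by both sides, hence moved out of the #ifdef toggle */
    #include "intf.h"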
When preempting, restrict preemption to the current layer's cpumask.
This may reduce the amount of preemption, but it improves cache
locality for preempted tasks.
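Roughly, the candidate filter becomes something like the following
sketch (the struct layout and kptr handling are assumptions):

    #include <scx/common.bpf.h>

    struct layer {
        struct bpf_cpumask __kptr *cpumask;
        /* ... */
    };

    /* Skip preemption candidates outside the layer's cpumask. */
    static bool cpu_in_layer(s32 cpu, struct layer *layer)
    {
        struct bpf_cpumask *mask = layer->cpumask;

        if (!mask)
            return false;

        return bpf_cpumask_test_cpu(cpu, cast_mask(mask));
    }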
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Remove the cast_mask() copies distributed throughout the different
schedulers and add the function to common.bpf.h, so every scheduler
can reference it whenever needed.
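For reference, the helper is essentially a one-line cast along these
lines (sketch of the common.bpf.h addition):

    /* Cast a verifier-tracked bpf_cpumask to a plain cpumask pointer. */
    static __always_inline const struct cpumask *
    cast_mask(struct bpf_cpumask *mask)
    {
        return (const struct cpumask *)mask;
    }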
Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
If a waker is more latency-critical than a wakee, inherit the waker's
latency criticality for the wakee. This allows the wakee to consider
the context of the task that woke it up. For now, we limit such
inheritance to one hop and one schedule.
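Conceptually, the inheritance looks like this sketch in the wakeup
path, where ops.runnable() runs in the waker's context (the
task-context fields and the try_lookup_task_ctx() helper are
assumptions):

    #include <scx/common.bpf.h>

    void BPF_STRUCT_OPS(lavd_runnable, struct task_struct *p, u64 enq_flags)
    {
        struct task_struct *waker;
        struct task_ctx *taskc, *wakerc;

        if (!(enq_flags & SCX_ENQ_WAKEUP))
            return;

        /* ops.runnable() is called from the waker's context. */
        waker = bpf_get_current_task_btf();
        taskc = try_lookup_task_ctx(p);
        wakerc = try_lookup_task_ctx(waker);
        if (!taskc || !wakerc)
            return;

        /*
         * Inherit the waker's latency criticality when it is higher;
         * the inherited value is consumed after one schedule and is
         * not propagated further (one hop only).
         */
        if (wakerc->lat_cri > taskc->lat_cri)
            taskc->inherited_lat_cri = wakerc->lat_cri;
    }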
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Use the cast_mask helper to clean up some of the bpf cpumask conversion
code for preemption.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Add topology-aware preemption that begins in the local LLC and
attempts to preempt from the CPUs nearest in the topology.
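The search order becomes roughly "same LLC, then same node, then
everything else", along these lines (the topology lookups and
pick_preemptible_cpu() are assumptions):

    /* Sketch: look for a preemption victim starting from the local LLC. */
    static s32 find_preempt_victim(struct task_struct *p, s32 this_cpu)
    {
        const struct cpumask *masks[] = {
            llc_cpumask_of(this_cpu),   /* same LLC       */
            node_cpumask_of(this_cpu),  /* same NUMA node */
            all_cpumask,                /* whole system   */
        };
        int i;

        for (i = 0; i < 3; i++) {
            s32 cpu = pick_preemptible_cpu(p, masks[i]);

            if (cpu >= 0)
                return cpu;
        }

        return -1;
    }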
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Previously, we searched for a victim across all CPUs, including
remote or non-compatible ones. Now we limit the victim search to a
task's compute domain.
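A minimal sketch of the restriction (the compute-domain cpumask and
the nr_cpu_ids plumbing are assumptions):

    const volatile u32 nr_cpu_ids = 1;    /* overridden by userspace */

    /* Pick a victim only among the CPUs of the task's compute domain. */
    static s32 pick_victim_in_cpdom(const struct cpumask *cpdom_mask)
    {
        u32 cpu = bpf_cpumask_any_distribute(cpdom_mask);

        return cpu < nr_cpu_ids ? (s32)cpu : -1;
    }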
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Add core growth algos for big/little core support. The algos allow
layers to grow by preferring either big or little cores first.
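Conceptually, the new algos just order the candidate CPUs by core
type first, e.g. (plain illustrative C, not the actual
implementation):

    /* Order candidate CPUs preferring one core type, then the other. */
    static void order_growth_cpus(const int *cpus, int nr,
                                  const bool *cpu_is_big, bool prefer_big,
                                  int *out)
    {
        int n = 0, i;

        for (i = 0; i < nr; i++)
            if (cpu_is_big[cpus[i]] == prefer_big)
                out[n++] = cpus[i];
        for (i = 0; i < nr; i++)
            if (cpu_is_big[cpus[i]] != prefer_big)
                out[n++] = cpus[i];
    }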
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
The usage of cast_mask() within bpfland_enqueue aims to cast the type of
"p->cpus_ptr" from "struct bpf_cpumask *" to "const struct cpumask *".
However, the type of "p->cpus_ptr" is already "const cpumask_t *" aka
"const struct cpumask *", so no conversion is needed.
Moreover, passing a value of type "struct cpumask *" where a "struct
bpf_cpumask *" is expected also leads to a compile error.
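As a minimal illustration (assuming an idle-CPU lookup in the enqueue
path), the pointer can be used directly:

    /* p->cpus_ptr is already 'const struct cpumask *'; no cast needed. */
    static s32 pick_any_idle_cpu(struct task_struct *p)
    {
        return scx_bpf_pick_idle_cpu(p->cpus_ptr, 0);
    }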
Signed-off-by: I Hsin Cheng <richard120310@gmail.com>