We don't need to send the number of voluntary context switches (nvcsw)
from BPF to user-space, as this information is already accessible in
user-space via procfs. Sending this data would only create unnecessary
overhead for schedulers that don't require it, and those that do can
easily retrieve it through procfs.
Therefore, drop this metric from scx_rustland_core and change
scx_rustland to implement its interactive task classifier fully in the
user-space part of the scheduler.
Also drop some options that do not provide any significant benefit
(also in preparation for a bigger refactoring to define a better API for
the user-space framework).
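
For reference, a minimal sketch of how a user-space scheduler could
retrieve nvcsw from procfs. The helper names parse_nvcsw() and
read_nvcsw() are hypothetical and not part of scx_rustland_core; the
only thing relied upon is the voluntary_ctxt_switches field of
/proc/<pid>/status:

```rust
use std::fs;
use std::io;

/// Extract the `voluntary_ctxt_switches` counter from the text of a
/// /proc/<pid>/status file. Returns None if the field is missing or
/// cannot be parsed. (Hypothetical helper, for illustration only.)
fn parse_nvcsw(status: &str) -> Option<u64> {
    status
        .lines()
        .find_map(|line| line.strip_prefix("voluntary_ctxt_switches:"))
        .and_then(|value| value.trim().parse().ok())
}

/// Read the voluntary context switch count of a task from procfs.
/// (Hypothetical helper, for illustration only.)
fn read_nvcsw(pid: i32) -> io::Result<Option<u64>> {
    let status = fs::read_to_string(format!("/proc/{}/status", pid))?;
    Ok(parse_nvcsw(&status))
}
```

A classifier could sample this counter periodically and use the delta
between two samples as its interactivity signal.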
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Explicitly import timespec to fix the following potential build error:
error[E0422]: cannot find struct, variant or union type `timespec` in this scope
   --> src/bpf.rs:365:35
    |
365 |             sched_ss_repl_period: timespec {
    |                                   ^^^^^^^^ not found in this scope
    |
help: consider importing this struct
    |
6   + use libc::timespec;
    |
This fixes issue #469.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
- Use .enumerate() consistently while building the cpu_fids vector.
- Use .then_with() to chain .cmp() when sorting cpu_fids.
Both reduce visual clutter.
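
A rough sketch of the two idioms, using a simplified, hypothetical
CpuFid struct and build_cpu_fids() helper (the real cpu_fids entries
carry more fields than shown here):

```rust
/// Simplified stand-in for an entry of the cpu_fids vector.
/// (Hypothetical: field names are for illustration only.)
#[derive(Debug, Clone, PartialEq, Eq)]
struct CpuFid {
    node_id: usize,
    core_pos: usize,
    cpu_id: usize,
}

/// Flatten a nodes -> cores -> CPU ids nesting into a sorted vector.
fn build_cpu_fids(nodes: &[Vec<Vec<usize>>]) -> Vec<CpuFid> {
    let mut cpu_fids = Vec::new();

    // .enumerate() yields the position of each node/core without a
    // manually maintained counter.
    for (node_id, cores) in nodes.iter().enumerate() {
        for (core_pos, cpus) in cores.iter().enumerate() {
            for &cpu_id in cpus {
                cpu_fids.push(CpuFid { node_id, core_pos, cpu_id });
            }
        }
    }

    // .then_with() chains comparisons: a later key is only evaluated
    // when all earlier keys compare equal.
    cpu_fids.sort_by(|a, b| {
        a.node_id
            .cmp(&b.node_id)
            .then_with(|| a.core_pos.cmp(&b.core_pos))
            .then_with(|| a.cpu_id.cmp(&b.cpu_id))
    });

    cpu_fids
}
```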
Add a parameter to disable topology awareness. This is useful when
comparing the performance of topology-aware scheduling against the
previous scheduling strategy.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
With the optimized calculation of the ineligibility duration, the
scheduler now works well under heavy load without 2-level scheduling, so
we drop it for simplicity.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
This commit includes a few changes:
- treat a newly forked task more conservatively
- defer the execution of more tasks for a longer time using the ineligibility duration
- consider whether a task has been woken up when calculating the ineligibility duration
Immediately re-align p->scx.dsq_vtime to the global vruntime (+/- slice
lag) as soon as we are evaluating the task's vruntime.
This allows tasks to rapidly catch up with the minimum global vruntime,
ensuring that tasks with a predominantly sleeping behavior pattern are
not over-prioritized.
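
The re-alignment can be pictured roughly as a clamp around the global
vruntime. This is only a user-space sketch in Rust, not the BPF code:
realign_vtime() and its parameters are hypothetical names, "+/- slice
lag" is interpreted here as a symmetric window, and vruntime
wrap-around is ignored for simplicity:

```rust
/// Clamp a task's vruntime to within `slice_lag` of the global
/// vruntime. (Hypothetical helper; ignores counter wrap-around.)
fn realign_vtime(task_vtime: u64, global_vtime: u64, slice_lag: u64) -> u64 {
    let lo = global_vtime.saturating_sub(slice_lag);
    let hi = global_vtime.saturating_add(slice_lag);

    // A task that slept for a long time has a stale (small) vruntime;
    // pulling it up to `lo` limits the credit it has accumulated.
    task_vtime.clamp(lo, hi)
}
```

With a global vruntime of 1000 and a slice lag of 50, a long-sleeping
task with vruntime 100 is moved to 950 instead of keeping its stale
value.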
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
When the previous CPU for a task is not known, do not fall back to
dispatching to CPU 0; use the current CPU instead.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
L or R: Latency-critical, Regular
H or I: performance-Hungry, performance-Insensitive
B or T: Big, liTtle
E or G: Eligible, Greedy
P or N: Preemption, Not
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Add a per-CPU counter offset to round robin when iterating over layers.
This makes selection from different layers fairer.
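
The idea can be sketched as follows (in Rust rather than BPF C, with a
hypothetical layer_order() helper; how and when the per-CPU counter is
advanced is an assumption left out of the sketch):

```rust
/// Return the order in which a CPU visits the layers when its
/// round-robin counter currently holds `offset`.
/// (Hypothetical helper, for illustration only.)
fn layer_order(nr_layers: usize, offset: usize) -> Vec<usize> {
    (0..nr_layers)
        // Rotate the starting point so that layer 0 is not always
        // checked first.
        .map(|i| (i + offset % nr_layers) % nr_layers)
        .collect()
}
```

If each CPU bumps its own counter between iterations, successive
passes start from different layers instead of always favoring the
first one.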
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Tune the time slice under high load and make the kick/tick margins for
preemption more conservative. In particular, aggressive IPI-based
preemption (kick) causes performance instability.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Instead of using the coarse-grained log(), let's directly use the ratio
of the task's service time. The virtual deadline equation is also
updated to reflect this change.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
The max_entries parameter in BPF_MAP_TYPE_PERCPU_ARRAY defines the
number of values per CPU and for cpu_ctx_stor we only need one item: the
CPU context.
Set max_entries to 1 to avoid allocating unnecessary memory and slightly
reduce the memory footprint.
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
We introduce two-level scheduling similar to scx_bpfland. The two-level
scheduling consists of two DSQs: 1) latency-critical run queue and 2)
regular run queue. The scheduler prioritizes scheduling tasks on the
latency-critical queue but makes its best effort to schedule tasks on
the regular queue. The scheduler could be more resilient under heavy
load by segregating regular, non-latency-critical tasks from
latency-critical tasks.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
The max frequency information from the topology (from sysfs) is not
always reliable. In some installations, it returns zero for all CPUs. In
this case, let's just consider all CPUs to have the same capacity (1024),
hoping the kernel can give more precise information.
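
A minimal sketch of the fallback, assuming a hypothetical
cpu_capacities() helper. The all-zero case follows the description
above; the proportional scaling used when frequencies are available is
an assumption made for illustration:

```rust
/// Capacity assigned to every CPU when no frequency data is available.
const DEFAULT_CAPACITY: usize = 1024;

/// Derive per-CPU capacities from the max frequencies read from sysfs.
/// (Hypothetical helper, for illustration only.)
fn cpu_capacities(max_freqs: &[usize]) -> Vec<usize> {
    let top = max_freqs.iter().copied().max().unwrap_or(0);

    // sysfs reported zero for all CPUs: treat them as equal.
    if top == 0 {
        return vec![DEFAULT_CAPACITY; max_freqs.len()];
    }

    // Assumption: scale each CPU relative to the fastest one.
    max_freqs
        .iter()
        .map(|&freq| freq * DEFAULT_CAPACITY / top)
        .collect()
}
```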
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Latency criticality is a task's inherent property, but the starvation
factor is its dynamic status for the urgency of scheduling. Hence, we
segregate the starvation factor out. Also, clean up the unnecessary
arguments and struct fields related to it.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
When a task is running on a more performant core, the scheduler gives it
a longer time slice. Conversely, on a less performant core, a shorter
time slice is assigned. The longer time slice helps boost the clock
frequency of a performant core. Also, the shorter time slice gives more
chances for the performant core to be utilized.
Regarding the CPU capacity, we first check whether the kernel-provided
capacity values are trustworthy. If not (i.e., all the same values), we
rely on the user-provided value, based on each CPU's maximum clock
frequency.
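
Both steps can be sketched as below. The helper names, the linear
scaling against a full capacity of 1024, and the min/max clamping are
assumptions made for illustration, not the exact formula used by the
scheduler:

```rust
/// Kernel-provided capacities are considered usable only if at least
/// two CPUs report different values.
/// (Hypothetical helper, for illustration only.)
fn kernel_caps_usable(caps: &[u64]) -> bool {
    caps.windows(2).any(|pair| pair[0] != pair[1])
}

/// Scale a base time slice by the capacity of the CPU (1024 = full
/// capacity), keeping the result within [min_ns, max_ns].
/// (Hypothetical helper; assumes min_ns <= max_ns.)
fn scaled_slice_ns(base_slice_ns: u64, cpu_cap: u64, min_ns: u64, max_ns: u64) -> u64 {
    (base_slice_ns * cpu_cap / 1024).clamp(min_ns, max_ns)
}
```

For example, with a 3 ms base slice, a CPU with capacity 512 would get
a 1.5 ms slice under this scaling.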
Signed-off-by: Changwoo Min <changwoo@igalia.com>
When the --prefer-smt-core option is on, core compaction prefers to
utilize the hyper-twin first before utilizing other physical CPUs. By
default, the option is off.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Previously, the core compaction assumed that each core's capacity was
the same. Now, we additionally consider each core's max clock frequency.
So, it always tries to use the higher-frequency cores first.
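
The resulting preference order can be sketched as a sort by descending
max frequency. CoreInfo and compaction_order() are hypothetical names,
and breaking ties by CPU id is an assumption made to keep the order
deterministic:

```rust
/// Minimal per-core information needed for the ordering.
/// (Hypothetical struct, for illustration only.)
#[derive(Debug, Clone, Copy)]
struct CoreInfo {
    cpu_id: usize,
    max_freq: usize,
}

/// Return CPU ids in the order core compaction would prefer them:
/// higher max frequency first, lower CPU id first on ties.
fn compaction_order(cores: &[CoreInfo]) -> Vec<usize> {
    let mut sorted = cores.to_vec();
    sorted.sort_by(|a, b| {
        b.max_freq
            .cmp(&a.max_freq)
            .then_with(|| a.cpu_id.cmp(&b.cpu_id))
    });
    sorted.iter().map(|core| core.cpu_id).collect()
}
```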
Signed-off-by: Changwoo Min <changwoo@igalia.com>