Re-add the partial mode option that was dropped during the refactoring.
The partial option allows to apply the scheduler only to the tasks which
have their scheduling policy set to SCHED_EXT via sched_setscheduler().
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
The API for determining which PID is running on a specific CPU is racy
and is unnecessary since this information can be obtained from user
space.
Additionally, it's not reliable for identifying idle CPUs. Therefore,
it's better to remove this API and, in the future, provide a cpumask
alternative that can export the idle state of the CPUs to user space.
As a consequence also change scx_rustland to dispatch one task a time,
instead of dispatching tasks in batches of idle cores (that are usually
not accurate due to the racy nature of the CPU ownership interaface).
Dispatching one task at a time even makes the scheduler more performant,
due to the vruntime scheduling being applied to more tasks sitting in
the scheduler's queue.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Drop the slice boost logic and apply a vruntime and task time slice
evaluation approach similar to scx_bpfland (but implement this in the
user-space component instead of the BPF part).
Additionally, introduce a slice_us_min parameter to define the minimum
time slice that can be assigned to a task, also similar to scx_bpfland.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Use the same idle selection logic used in scx_bpfland also in
scx_rustland_core.
Also drop fifo_mode and always use the BPF idle selection logic by
default as long as the system is not saturated, unless full_user is
specified.
This approach allows user-space schedulers aiming for maximum
performance to leverage the BPF idle selection logic (bypassing
user-space), while those seeking full control can enable full_user to
bypass the BPF CPU idle selection logic and choose the target CPU for
each task from user-space.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Fix args to bpf_builder, existing warnings display for
"-Wno-compare-distinct-pointer-types'".
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
We don't need to send the number of voluntary context switches (nvcsw)
from BPF to user-space, as this information is already accessible in
user-space via procfs. Sending this data would only create unnecessary
overhead for schedulers that don't require it, and those that do can
easily retrieve it through procfs.
Therefore, drop this metric from scx_rustland_core and change
scx_rustland implementing an interactive task classifier fully in the
user-space part of the scheduler.
Also drop some options that are not provide any significant benefit
(also in preparation of a bigger refactoring to define a better API for
the user-space framework).
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Explicitly import timespec to fix the following potential build error:
error[E0422]: cannot find struct, variant or union type `timespec` in this scope
--> src/bpf.rs:365:35
|
365 | sched_ss_repl_period: timespec {
| ^^^^^^^^ not found in this scope
|
help: consider importing this struct
|
6 + use libc::timespec;
|
This fixes issue #469.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
- Use .enumerate() consistently while building the cpu_fids vector.
- Use .then_with() to chain .cmp() when sorting cpu_fids.
Both reduce visual clutter.
Add a parameter to disable topology awareness. This is useful when
trying to compare the scheduling performance of topology aware
scheduling compared to the previous scheduling strategy.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
With optimizations of calculatring ineligibility duration, now the
scheduler works well under heavy load without 2-level scheduling, so we
drop it for simplicitiy.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
This commit include a few changes:
- treat a new forked task more conservatively
- defer the execution of more tasks for longer time using ineligibility duration
- consider if a task is waken up in calculating ineligibility duration
Immediately re-align p->scx.dsq_vtime to the global vruntime (+/- slice
lag) as soon as we are evaluating the task's vruntime.
This allows rapidly chase the minimum global vruntime, ensuring to not
over prioritize tasks tasks with a predominantly sleeping behavior
pattern.
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
When the previous CPU for a task is not known do not fall back to
dispatching to CPU 0, use the current CPU.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
L or R: Latency-critical, Regular
H or I: performance-Hungry, performance-Insensitive
B or T: Big, liTtle
E or G: Eligible, Greedy
P or N: Preemption, Not
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Add a per cpu counter offset to round robin when iterating on layers.
This is to make selection from different layers more fair.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Tuning the time slice under high load and change the kick/tick margins
for preemption more conservative. Especially, aggressive IPI-based
preemption (kick) causes performance unstability.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Instead of using coarse-grained log(), let's directly use the ratio of
task's service time. Also, the virtual dealine equation is also updated
to reflect this change.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
The max_entries parameter in BPF_MAP_TYPE_PERCPU_ARRAY defines the
number of values per CPU and for cpu_ctx_stor we only need one item: the
CPU context.
Set max_entries to 1 to avoid allocating unnecessary memory and slightly
reduce the memory footprint.
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>