Commit Graph

1214 Commits

Author SHA1 Message Date
Andrea Righi
9a0e7755df scx_rustland_core: export counter of online CPUs
Introduce a helper to get the amount of online CPUs tracked by the BPF
part.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-07 08:10:53 +02:00
Andrea Righi
d9c9f78e3e scx_rustland: re-align vruntime and time slice evaluation to scx_bpfland
Drop the slice boost logic and apply a vruntime and task time slice
evaluation approach similar to scx_bpfland (but implement this in the
user-space component instead of the BPF part).

Additionally, introduce a slice_us_min parameter to define the minimum
time slice that can be assigned to a task, also similar to scx_bpfland.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-07 08:10:53 +02:00
Andrea Righi
38a725ea34 scx_rlfifo: update copyright info
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-07 08:10:53 +02:00
Andrea Righi
c963d5eb05 scx_rustland: update copyright info
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-07 08:10:53 +02:00
Andrea Righi
e1e6e31208 scx_rustland_core: update copyright info
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-07 08:10:53 +02:00
Andrea Righi
b87541a26e scx_rustland_core: refactor idle CPU selection logic
Use the same idle selection logic used in scx_bpfland also in
scx_rustland_core.

Also drop fifo_mode and always use the BPF idle selection logic by
default as long as the system is not saturated, unless full_user is
specified.

This approach allows user-space schedulers aiming for maximum
performance to leverage the BPF idle selection logic (bypassing
user-space), while those seeking full control can enable full_user to
bypass the BPF CPU idle selection logic and choose the target CPU for
each task from user-space.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-07 08:10:53 +02:00
Andrea Righi
d8985306f4 scx_rustland: user-space interactive task classifier
We don't need to send the number of voluntary context switches (nvcsw)
from BPF to user-space, as this information is already accessible in
user-space via procfs. Sending this data would only create unnecessary
overhead for schedulers that don't require it, and those that do can
easily retrieve it through procfs.
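
As an illustration of the procfs route mentioned above, here is a minimal sketch that reads the `voluntary_ctxt_switches` field from `/proc/<pid>/status`. The helper names `parse_nvcsw` and `read_nvcsw` are hypothetical and are not part of scx_rustland_core.

```rust
use std::fs;
use std::io;

/// Extract `voluntary_ctxt_switches` from the text of a /proc/<pid>/status file.
/// Hypothetical helper, for illustration only.
fn parse_nvcsw(status: &str) -> Option<u64> {
    status
        .lines()
        // The "nonvoluntary_ctxt_switches:" line does not match this prefix.
        .find_map(|line| line.strip_prefix("voluntary_ctxt_switches:"))
        .and_then(|value| value.trim().parse().ok())
}

/// Read the voluntary context switch counter of `pid` from procfs.
fn read_nvcsw(pid: u32) -> io::Result<Option<u64>> {
    let status = fs::read_to_string(format!("/proc/{pid}/status"))?;
    Ok(parse_nvcsw(&status))
}
```

A user-space scheduler that needs this metric could poll `read_nvcsw()` for the tasks it cares about, while schedulers that do not need it pay no cost.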

Therefore, drop this metric from scx_rustland_core and change
scx_rustland to implement an interactive task classifier entirely in
the user-space part of the scheduler.

Also drop some options that do not provide any significant benefit
(also in preparation for a bigger refactoring to define a better API for
the user-space framework).

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-06 17:56:58 +02:00
Andrea Righi
a8d14fc0c4
Merge pull request #471 from sched-ext/rustland-core-musl
scx_rustland_core: add support for musl
2024-08-06 17:56:19 +02:00
Andrea Righi
4c67e336e4
Merge pull request #473 from Kawanaao/rustland-fix-mutex-alloc
scx_rustland_core alloc: Replaced RefCell with Mutex
2024-08-06 16:26:05 +02:00
Kawanaao
c3109ebeed
scx_rustland_core alloc: Replaced RefCell with Mutex
Necessary for some multi-threaded cases
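
For context, `RefCell` is not `Sync`, so it cannot be shared between threads, whereas `Mutex` can. The sketch below shows the general pattern only. The `Stats` type and `count_allocations` function are hypothetical and do not reflect the actual allocator code.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for bookkeeping state shared between threads.
struct Stats {
    allocations: u64,
}

/// Increment the shared counter from `threads` threads, `per_thread` times each.
fn count_allocations(threads: usize, per_thread: u64) -> u64 {
    // Arc<Mutex<_>> is Send + Sync; the same code with RefCell would not compile.
    let stats = Arc::new(Mutex::new(Stats { allocations: 0 }));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let stats = Arc::clone(&stats);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    stats.lock().unwrap().allocations += 1;
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let total = stats.lock().unwrap().allocations;
    total
}
```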
2024-08-06 13:10:21 +00:00
Andrea Righi
4c7fb5cdbd scx_rustland_core: add support for musl
It seems that musl libc is not POSIX.1-2001, POSIX.1-2008 compliant and
using sched_setscheduler() just returns -ENOSYS:
https://git.musl-libc.org/cgit/musl/commit/src/sched/sched_setscheduler.c?id=1e21e78bf7a5c24c217446d8760be7b7188711c2

Switch to pthread_setschedparam() to properly support building
scx_rustland_core with musl.

This fixes #469.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-06 07:32:38 +02:00
Andrea Righi
493de4f5e8
Merge pull request #470 from sched-ext/rustland-core-import-timespec
scx_rustland_core: fix missing import (timespec)
2024-08-05 23:45:47 +02:00
Andrea Righi
d4005dd186 scx_rustland_core: fix missing import (timespec)
Explicitly import timespec to fix the following potential build error:

error[E0422]: cannot find struct, variant or union type `timespec` in this scope
 --> src/bpf.rs:365:35
    |
365 | sched_ss_repl_period: timespec {
    |                       ^^^^^^^^ not found in this scope
    |
help: consider importing this struct
    |
6 + use libc::timespec;
    |

This fixes issue #469.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-05 23:15:48 +02:00
Changwoo Min
38efa974a8
Merge pull request #467 from sched-ext/htejun/lavd-misc
scx_lavd: Make FlatTopology::new() a bit prettier
2024-08-05 10:43:48 +09:00
Tejun Heo
b226865b96 scx_lavd: Make FlatTopology::new() a bit prettier
- Use .enumerate() consistently while building the cpu_fids vector.

- Use .then_with() to chain .cmp() when sorting cpu_fids.

Both reduce visual clutter.
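
A minimal sketch of the two idioms, assuming a simplified `CpuFlatId` struct whose fields are illustrative only and not the actual FlatTopology definition:

```rust
/// Hypothetical flattened CPU identifier; the fields are illustrative only.
#[derive(Debug, Clone, PartialEq, Eq)]
struct CpuFlatId {
    node_id: usize,
    llc_id: usize,
    cpu_id: usize,
}

/// Build a sorted vector from (node_id, llc_id) pairs, one pair per CPU.
fn build_sorted(cpus: &[(usize, usize)]) -> Vec<CpuFlatId> {
    // .enumerate() supplies the index, used here as the CPU id.
    let mut cpu_fids: Vec<CpuFlatId> = cpus
        .iter()
        .enumerate()
        .map(|(cpu_id, &(node_id, llc_id))| CpuFlatId { node_id, llc_id, cpu_id })
        .collect();

    // .then_with() chains comparisons: later keys only break ties.
    cpu_fids.sort_by(|a, b| {
        a.node_id
            .cmp(&b.node_id)
            .then_with(|| a.llc_id.cmp(&b.llc_id))
            .then_with(|| a.cpu_id.cmp(&b.cpu_id))
    });
    cpu_fids
}
```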
2024-08-04 11:16:19 -10:00
Changwoo Min
130ea97fbf
Merge pull request #464 from multics69/lavd-amp-v3
scx_lavd: improve the calculation of ineligibility duration
2024-08-03 09:57:41 +09:00
Daniel Hodges
d0355c4a88
Merge pull request #466 from hodgesds/dsq-lat
scripts: Add dsq latency script
2024-08-02 16:45:03 -04:00
Andrea Righi
3ad2875240
Merge pull request #463 from sched-ext/bpfland-update-dsq-vtime
scx_bpfland: always re-align task's vruntime to the global vruntime
2024-08-02 22:13:12 +02:00
Daniel Hodges
0197649af5
Merge pull request #465 from hodgesds/disable-numa-llc
scx_layered: Add support for disabling topology awareness
2024-08-02 15:28:54 -04:00
Daniel Hodges
b6054fb5f2 scripts: Add dsq latency script
This change adds a bpftrace script to monitor runq latency as well as
dsq latency.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-02 12:19:41 -07:00
Daniel Hodges
1f922b9a73 scx_layered: Add support for disabling topology awareness
Add a parameter to disable topology awareness. This is useful when
comparing the performance of topology-aware scheduling against the
previous scheduling strategy.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-02 08:07:19 -07:00
Changwoo Min
f3fd6e9cb3 scx_lavd: drop 2-level-scheduling
With the optimized calculation of the ineligibility duration, the
scheduler now works well under heavy load without 2-level scheduling, so
we drop it for simplicity.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-02 21:46:07 +09:00
Changwoo Min
c38e749c36 scx_lavd: improve the equation for calculating ineligibility duration
This commit includes a few changes:
- treat a newly forked task more conservatively
- defer the execution of more tasks for a longer time using the ineligibility duration
- consider whether a task has been woken up when calculating the ineligibility duration
2024-08-02 21:08:29 +09:00
Andrea Righi
bee0d699ef scx_bpfland: always re-align task's vruntime to the global vruntime
Immediately re-align p->scx.dsq_vtime to the global vruntime (+/- slice
lag) as soon as we are evaluating the task's vruntime.

This allows the task to rapidly chase the minimum global vruntime,
ensuring that tasks with a predominantly sleeping behavior pattern are
not over-prioritized.
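
One possible reading of "global vruntime (+/- slice lag)" is a clamp of the task's vruntime into a window around the global value. The sketch below models only that reading in plain Rust. The function name and the symmetric window are assumptions, and this is not the BPF code of scx_bpfland.

```rust
/// Clamp a task's vruntime into [global - slice_lag, global + slice_lag].
/// Hypothetical helper; the symmetric window is an assumption.
fn realign_vruntime(task_vtime: u64, global_vtime: u64, slice_lag: u64) -> u64 {
    // Saturating arithmetic keeps the bounds valid near 0 and u64::MAX.
    let min = global_vtime.saturating_sub(slice_lag);
    let max = global_vtime.saturating_add(slice_lag);
    task_vtime.clamp(min, max)
}
```

Under this reading, a task that slept for a long time accumulates at most `slice_lag` of credit relative to the global vruntime.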

Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
2024-08-02 13:11:25 +02:00
Changwoo Min
5e194330f0 scx_lavd: consider task's wakeup and vruntime (starvation) more aggressively
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-02 12:25:29 +09:00
Daniel Hodges
c3ce7f154e
Merge pull request #461 from hodgesds/layered-dsq-cleanup
scx_layered: Fix enqueue fallback CPU selection
2024-07-31 16:10:02 -04:00
Daniel Hodges
de7b5fe190 scx_layered: Fix dispatch fallback CPU selection
When the previous CPU for a task is not known, do not fall back to
dispatching to CPU 0; use the current CPU instead.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-07-31 12:35:22 -07:00
Changwoo Min
fc0ffeb45b scx_lavd: print the overall status of a scheduled task
L or R: Latency-critical, Regular
H or I: performance-Hungry, performance-Insensitive
B or T: Big, liTtle
E or G: Eligible, Greedy
P or N: Preemption, Not
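
The letter pairs above can be rendered as a five-character status string. The following is a minimal sketch of that encoding. The `TaskStatus` struct and its field names are hypothetical and are not taken from scx_lavd.

```rust
/// Hypothetical per-task flags; names are illustrative, not from scx_lavd.
struct TaskStatus {
    latency_critical: bool,
    perf_hungry: bool,
    big: bool,
    eligible: bool,
    preemption: bool,
}

/// Encode the flags as one letter per pair, in the order listed above.
fn status_string(s: &TaskStatus) -> String {
    let pick = |flag: bool, yes: char, no: char| if flag { yes } else { no };
    [
        pick(s.latency_critical, 'L', 'R'),
        pick(s.perf_hungry, 'H', 'I'),
        pick(s.big, 'B', 'T'),
        pick(s.eligible, 'E', 'G'),
        pick(s.preemption, 'P', 'N'),
    ]
    .iter()
    .collect()
}
```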

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-31 19:00:35 +09:00
Changwoo Min
22d4b13e8e scx_lavd: classify CPUs into BIG and little ones based on their average capacity
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-31 19:00:35 +09:00
Changwoo Min
0ad2f30fa8
Merge pull request #460 from multics69/lavd-misc
scx_lavd: misc updates
2024-07-31 08:55:04 +09:00
Daniel Hodges
c224154866
Merge pull request #459 from hodgesds/layer-cpu-counter
scx_layered: Add per cpu layer iterator offset
2024-07-30 16:00:37 -04:00
Daniel Hodges
4f12bebaa5 scx_layered: Add per cpu layer iterator offset
Add a per cpu counter offset to round robin when iterating on layers.
This is to make selection from different layers more fair.
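
The idea can be sketched in plain Rust as follows: each CPU keeps its own starting offset and advances it after every pass, so no layer is always visited first. `CpuState`, `layer_order` and `next_order` are hypothetical names, and this is not the scx_layered BPF code.

```rust
/// Order in which `nr_layers` layers are visited when starting at `offset`.
fn layer_order(nr_layers: usize, offset: usize) -> Vec<usize> {
    (0..nr_layers)
        .map(|i| (offset % nr_layers + i) % nr_layers)
        .collect()
}

/// Hypothetical per-CPU state holding the round-robin offset.
struct CpuState {
    layer_offset: usize,
}

impl CpuState {
    /// Return the visiting order for this pass, then rotate the start by one.
    fn next_order(&mut self, nr_layers: usize) -> Vec<usize> {
        let order = layer_order(nr_layers, self.layer_offset);
        // max(1) avoids a modulo by zero when there are no layers.
        self.layer_offset = (self.layer_offset + 1) % nr_layers.max(1);
        order
    }
}
```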

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-07-30 10:44:41 -07:00
Changwoo Min
9b455cf010
Merge pull request #458 from sched-ext/lavd-fix-cpu-ctx-size
scx_lavd: set correct size for cpu_ctx_stor
2024-07-31 00:39:13 +09:00
Changwoo Min
6136cbee65 scx_lavd: tuning the time slice and preemption margins
Tune the time slice under high load and make the kick/tick margins for
preemption more conservative. In particular, aggressive IPI-based
preemption (kick) causes performance instability.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-31 00:30:59 +09:00
Changwoo Min
35b0d9f3c2 scx_lavd: improve starvation factor equation
Instead of using coarse-grained log(), let's directly use the ratio of
task's service time. The virtual deadline equation is also updated to
reflect this change.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-31 00:27:17 +09:00
Changwoo Min
f9657a549f scx_lavd: fix bpf verification error in old kernel versions
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-31 00:22:43 +09:00
Changwoo Min
d2615b4975 scx_lavd: fix warnings from the rust code
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-31 00:21:32 +09:00
Andrea Righi
2015faa745 scx_lavd: set correct size for cpu_ctx_stor
The max_entries parameter in BPF_MAP_TYPE_PERCPU_ARRAY defines the
number of values per CPU and for cpu_ctx_stor we only need one item: the
CPU context.

Set max_entries to 1 to avoid allocating unnecessary memory and slightly
reduce the memory footprint.

Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
2024-07-30 09:32:55 +02:00
Changwoo Min
643edb5431
Merge pull request #457 from multics69/lavd-amp-v2
scx_lavd: support two-level scheduling for heavy-loaded cases (like bpfland)
2024-07-30 10:39:06 +09:00
Changwoo Min
4c3fcfe2ea
Merge pull request #456 from multics69/lavd-div-zero-fix
scx_lavd: fix div by zero error in some installations
2024-07-30 10:35:21 +09:00
Changwoo Min
b91c1e4759 scx_lavd: add more comments on no_2_level_scheduling implementation
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-29 12:22:28 +09:00
Changwoo Min
f71fff9bbe scx_lavd: print a warning message when system does not provide a proper freq info
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-28 15:53:02 +09:00
Changwoo Min
4449d8e31c scx_lavd: incorporate a task's static priority in calculating its latency criticality
That's because static (nice) priority is a strong hint to distinguish
latency-critical tasks.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-28 15:41:43 +09:00
Changwoo Min
221f1fe12a scx_lavd: further prioritize producers over consumers
That is because many latency-critical tasks are producers.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-28 15:38:54 +09:00
Changwoo Min
7106e8cdca scx_lavd: support two-level scheduling for heavy-loaded cases
We introduce two-level scheduling similar to scx_bpfland. The two-level
scheduling consists of two DSQs: 1) latency-critical run queue and 2)
regular run queue. The scheduler prioritizes scheduling tasks on the
latency-critical queue but makes its best effort to schedule tasks on
the regular queue. The scheduler could be more resilient under heavy
load by segregating regular, non-latency-critical tasks from
latency-critical tasks.
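
A minimal user-space model of the two-queue idea, assuming plain FIFO queues and a strict preference for the latency-critical queue. The type and method names are hypothetical, and the real scheduler uses DSQs in BPF rather than this structure.

```rust
use std::collections::VecDeque;

/// Hypothetical model of the two run queues, holding task pids.
#[derive(Default)]
struct TwoLevelQueues {
    latency_critical: VecDeque<u32>,
    regular: VecDeque<u32>,
}

impl TwoLevelQueues {
    /// Place a task on the queue matching its classification.
    fn enqueue(&mut self, pid: u32, is_latency_critical: bool) {
        if is_latency_critical {
            self.latency_critical.push_back(pid);
        } else {
            self.regular.push_back(pid);
        }
    }

    /// Serve the latency-critical queue first, then fall back to the regular one.
    fn dispatch(&mut self) -> Option<u32> {
        self.latency_critical
            .pop_front()
            .or_else(|| self.regular.pop_front())
    }
}
```

This sketch does not model how the scheduler keeps regular tasks from starving; it only shows the segregation and the priority between the two queues.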

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-28 15:33:17 +09:00
Changwoo Min
9236c3e57c scx_lavd: increase the targeted latency for heavy loaded cases
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-28 15:30:01 +09:00
Changwoo Min
230512208d scx_lavd: fix div by zero error in some installations
The max frequency information from topology (from sysfs) is not always
reliable. In some installations, it returns zero for all CPUs. In this
case, let's just assume all CPUs have the same capacity (1024), hoping
the kernel can give more precise information.
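
The fallback can be sketched as below. The function name and the linear scaling of capacity by maximum frequency are assumptions made for illustration; only the "all zero, so use 1024 for every CPU" fallback comes from the description above.

```rust
/// Capacity assigned to every CPU when no usable frequency info exists.
const DEFAULT_CAPACITY: u64 = 1024;

/// Derive per-CPU capacities from max frequencies (hypothetical helper).
fn cpu_capacities(max_freqs: &[u64]) -> Vec<u64> {
    let top = max_freqs.iter().copied().max().unwrap_or(0);
    if top == 0 {
        // No frequency reported: treat all CPUs as equal and avoid dividing by zero.
        return vec![DEFAULT_CAPACITY; max_freqs.len()];
    }
    // Assumed scaling: the fastest CPU gets DEFAULT_CAPACITY, others proportionally less.
    max_freqs
        .iter()
        .map(|&freq| freq * DEFAULT_CAPACITY / top)
        .collect()
}
```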

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-28 12:47:00 +09:00
Changwoo Min
59e54f4972 scx_lavd: print how to disable logging
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-28 12:31:51 +09:00
Changwoo Min
df1108ec6c scx_lavd: segregate starvation factor from the latency criticality (refactoring)
Latency criticality is a task's inherent property, but the starvation
factor is its dynamic status for the urgency of scheduling. Hence, we
segregate the starvation factor out. Also, clean up the related
unnecessary arguments and struct fields.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-27 17:25:39 +09:00
Changwoo Min
d4a5a629ff
Merge pull request #452 from multics69/lavd-core-compaction-v2
lavd_lavd: initial support for AMP (asymmetric multi-processor) architecture
2024-07-27 16:22:27 +09:00