Commit Graph

1957 Commits

Author SHA1 Message Date
Ming Yang
f7cdf08754 scx_mitosis: Fix static assertion of scx_bpf_task_cgroup failing __weak check
It failed the static assertion in the bpf_ksym_exists macro.

Signed-off-by: Ming Yang <minos.future@gmail.com>
2024-10-13 07:57:12 -07:00
Ming Yang
7067092555 Add vmlinux.h for multiple arch
Following the change to use per-arch vmlinux.h, add it for the
remaining archs.

Signed-off-by: Ming Yang <minos.future@gmail.com>
2024-10-13 07:57:12 -07:00
Ming Yang
a23f3566e3 Use per-arch vmlinux.h
vmlinux.h is not compatible across archs.

Handle this compatibility issue by:
* Adding arch info to the real file name of vmlinux.h
* Linking vmlinux.h to the target-arch real file at build time
* Using the target-arch real file for scx_utils bindgen.

Also refactor the clang-related logic into a new clang_info mod, which is
shared by bpf_builder.rs and builder.rs.

Signed-off-by: Ming Yang <minos.future@gmail.com>
2024-10-13 07:57:12 -07:00
Daniel Hodges
03f078ac74
Merge pull request #792 from hodgesds/ftrace-perfetto
scripts: Add ftrace perfetto helper scripts
2024-10-13 13:03:47 +00:00
Daniel Hodges
cc3fede8e0
scripts: Add ftrace helper scripts
Add a set of ftrace helper scripts for making perfetto compatible ftrace
scheduler profiles.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-10-13 09:00:07 -04:00
Changwoo Min
ba9d75a6ab
Merge pull request #791 from multics69/lavd-sched-sample
scx_lavd: misc updates
2024-10-13 08:52:08 +00:00
Changwoo Min
c1f4051a14 scx_lavd: fix int overflow in calculating avg_lat_cri
u32 is not big enough to hold the sum of lat_cri in a period,
so sum_lat_cri (u32) overflowed, resulting in an incorrect
avg_lat_cri. Change the type from u32 to u64 to avoid the
integer overflow. Note that {sum/avg}_lat_cri is only for
debugging, so it does not affect scheduling decisions.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-10-13 00:58:36 +09:00
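
The overflow described in c1f4051a14 can be illustrated with a minimal sketch. The function names below are hypothetical and do not come from scx_lavd; `wrapping_add` is used only to mimic how an unsigned 32-bit counter behaves in C when it overflows.

```rust
/// Sum samples in a 32-bit accumulator, wrapping on overflow
/// (this mimics an unsigned 32-bit counter in C).
fn sum_u32(samples: &[u32]) -> u32 {
    samples.iter().fold(0u32, |acc, &s| acc.wrapping_add(s))
}

/// Sum the same samples in a 64-bit accumulator.
fn sum_u64(samples: &[u32]) -> u64 {
    samples.iter().map(|&s| u64::from(s)).sum()
}

/// Average derived from the 64-bit sum; returns 0 for an empty slice.
fn avg_u64(samples: &[u32]) -> u64 {
    if samples.is_empty() {
        0
    } else {
        sum_u64(samples) / samples.len() as u64
    }
}
```

With two samples of 3,000,000,000 each, the 32-bit sum wraps around while the 64-bit sum and the average stay correct.
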
Changwoo Min
6c9bbe66dc scx_lavd: remove unnecessary downscaling in deadline calculation
Downscaling is not necessary when calculating a task's virtual
deadline because the virtual deadline represents only the relative
order of tasks in scheduling. Hence downscaling only introduces
inaccuracy caused by truncation.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-10-13 00:41:23 +09:00
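
A small sketch of why truncation hurts ordering, as 6c9bbe66dc argues. The scale factor and function names are assumptions made for illustration; the commit message does not state the factor scx_lavd used.

```rust
use std::cmp::Ordering;

/// Hypothetical down-scaling factor (not taken from scx_lavd).
const SCALE: u64 = 1000;

/// Compare two virtual deadlines at full precision.
fn cmp_raw(a: u64, b: u64) -> Ordering {
    a.cmp(&b)
}

/// Compare the same deadlines after integer down-scaling.
/// Integer division truncates, so nearby deadlines can collapse
/// into the same value and lose their relative order.
fn cmp_scaled(a: u64, b: u64) -> Ordering {
    (a / SCALE).cmp(&(b / SCALE))
}
```

For deadlines 10,100 and 10,900 the raw comparison still orders the first task earlier, while the scaled comparison reports them as equal.
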
Changwoo Min
dce67296bf
Merge pull request #790 from multics69/lavd-sched-sample
scx_lavd: do not inspect scx_lavd process itself
2024-10-12 14:18:59 +00:00
Daniel Hodges
a846fb3be6
Merge pull request #789 from hodgesds/vtime-dist
Add bpftrace script to print vtime distributions across DSQs
2024-10-12 12:46:59 +00:00
Daniel Hodges
f59b73b97c
scripts: Add vtime distribution script
Add bpftrace script to print the distribution of vtime across DSQs.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-10-12 08:40:16 -04:00
Changwoo Min
6ddc3f0a2b scx_lavd: do not inspect scx_lavd process itself
Printing the task status of scx_lavd itself is not useful,
so filter it out.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-10-12 17:21:08 +09:00
Andrea Righi
7f3b0cb739
Merge pull request #780 from sched-ext/bpfland-remove-pcpu-dsq
scx_bpfland: drop per-cpu DSQs
2024-10-12 06:25:35 +00:00
Andrea Righi
09fee68a6b
Merge pull request #785 from sched-ext/rustland-core-fix-mm-kprobe
scx_rustland_core: use handle_mm_fault kprobe
2024-10-12 06:20:15 +00:00
Andrea Righi
197dee93f4 scx_bpfland: get rid of per-CPU DSQs
Using per-CPU DSQs seems to introduce more issues than benefits
(potential stalls, etc.). Therefore, let's get rid of the per-CPU DSQs
and use SCX_DSQ_LOCAL for tasks directly dispatched to specific CPUs.

This change seems to also improve performance on 6.12 and it makes the
scheduler a lot more stable and consistent.

The issues will be investigated separately, using a dedicated stress
test scheduler designed to exercise per-CPU DSQs.

Tested-by: Piotr Gorski <piotrgorski@cachyos.org>
Tested-by: Eric Naim <dnaim@cachyos.org>
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-10-12 08:15:51 +02:00
Andrea Righi
198f22656c scx_bpfland: clarify error code returned by pick_idle_cpu()
Return more meaningful error codes from pick_idle_cpu(). No functional
change, just improved code readability.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-10-12 08:08:48 +02:00
Andrea Righi
ceb4f1755f scx_bpfland: always refill task timeslice in ops.dispatch()
When a task exhausts its timeslice and no other tasks are ready to run,
we automatically refill its timeslice, but only if the current CPU is a
fully idle SMT core.

If we don’t handle the refill, the sched_ext core will default to
refilling using SCX_SLICE_DFL, which may not be optimal.

To ensure better control over the task’s timeslice, always refill it
when no other tasks are available to run.

Fixes: 6e24fcc ("scx_bpfland: keep tasks running on full-idle SMT cores")
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-10-12 08:08:48 +02:00
Andrea Righi
54d704ceda scx_bpfland: pick a random idle CPU when prev_cpu is not valid
Pick any random idle CPU when the previous CPU isn't valid anymore
according to the task's cpumask.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-10-12 08:08:48 +02:00
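
The fallback described in 54d704ceda could be sketched as follows. This is not the scx_bpfland code: CPUs are modelled as bits in a `u64` (bit N is CPU N), the function name is hypothetical, and "any" idle CPU is taken here to mean the lowest-numbered one.

```rust
/// Choose a CPU for a task.
/// `allowed` is the task's cpumask and `idle` the set of idle CPUs.
fn pick_cpu(prev_cpu: u32, allowed: u64, idle: u64) -> Option<u32> {
    // Keep prev_cpu as long as the task's cpumask still permits it.
    if prev_cpu < 64 && (allowed & (1u64 << prev_cpu)) != 0 {
        return Some(prev_cpu);
    }
    // Otherwise fall back to any idle CPU the task may run on.
    let candidates = allowed & idle;
    if candidates == 0 {
        None
    } else {
        Some(candidates.trailing_zeros())
    }
}
```
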
Changwoo Min
836cf9faa4
Merge pull request #779 from multics69/lavd-futex-v2
scx_lavd: mitigate the lock holder preemption problem
2024-10-12 02:42:33 +00:00
Daniel Hodges
a950f96353
Merge pull request #787 from hodgesds/layered-clean
scx_layered: Cleanup non topology path
2024-10-11 18:26:23 +00:00
Daniel Hodges
a08a76ccd6 scx_layered: Cleanup non topology path
More cleanup in the non topology path to remove copy/pasta declarations.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-10-11 10:18:34 -07:00
eb59085e61
Merge pull request #781 from JakeHillion/pr781
layered: move configuration into library component
2024-10-11 16:39:23 +00:00
caff46e864
Merge pull request #786 from JakeHillion/pr786
layered: make default value for disable_topology dynamic
2024-10-11 16:25:45 +00:00
Jake Hillion
52c279a469 layered: make default value for disable_topology dynamic
Disable topology currently defaults to `false` (topology enabled...). Change
this so that topology is enabled by default on hardware that may benefit from
it (multiple NUMA nodes or LLCs) and disabled on hardware that does not benefit
from it.

This is a slightly noisy change as we have to move ownership of the newly
mutable layer specs into the `Scheduler` object (previously they were a
borrow). We don't have a `Topology` object to make the default decision from
until `Scheduler::init`, and I think this is because of the possibility of hot
plugs. We therefore have to clone the `Vec<LayerSpec>` each time as it is
potentially mutable.

Test plan:
- CI. Updated to be explicit about topology in both cases.

Single NUMA multi-LLC machine:
```
$ scx_layered --run-example
...
13:34:01 [INFO] Topology awareness not specified, selecting enabled based on
hardware
...
$ scx_layered --run-example --disable-topology=true
...
13:33:41 [INFO] Disabling topology awareness
...
$ scx_layered --run-example -t
...
13:33:15 [INFO] Disabling topology awareness
...
$ scx_layered --run-example --disable-topology=false
# none of the above messages present
```

Single NUMA single LLC machine:
```
$ scx_layered --run-example
15:33:10 [INFO] Topology awareness not specified, selecting disabled based on
hardware
```
2024-10-11 17:09:07 +01:00
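
The default selection described in 52c279a469 might look roughly like the sketch below. `HwSummary` and `resolve_disable_topology` are hypothetical names; the real scheduler derives this from its own `Topology` object, whose API is not shown in the commit message.

```rust
/// Hypothetical summary of the machine, standing in for `Topology`.
struct HwSummary {
    numa_nodes: usize,
    llcs: usize,
}

/// Resolve the effective `disable_topology` value.
/// `None` means the user did not specify the option.
fn resolve_disable_topology(user: Option<bool>, hw: &HwSummary) -> bool {
    match user {
        // An explicit choice always wins.
        Some(value) => value,
        // Otherwise enable topology awareness only where it may help:
        // multiple NUMA nodes or multiple LLCs.
        None => !(hw.numa_nodes > 1 || hw.llcs > 1),
    }
}
```

On a single-NUMA, multi-LLC machine this resolves to topology enabled when unspecified, and on a single-NUMA, single-LLC machine to disabled, matching the log output in the test plan.
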
Jake Hillion
143a55cda1 layered: move configuration into library component
Move the LayerConfig and its children from `main.rs` into `lib.rs`. This allows
other tooling, such as config managers or test executors, to modify layered
configs programmatically.

The end goal is to move everything in `layered` except for the argument parsing
into a `run_layered` function, but I haven't done it in this diff because it's
a larger change. This is a common pattern in Rust projects to do as little as
possible in `main.rs` for extensibility.

The only change here, other than visibility and where things are located, is the
signature of `CpuPool::alloc_cpus`. It previously relied on `&Layer`, and this
changes it to the two elements of `Layer` it uses. This allows `Layer` to stay
confined to `main.rs` (for now) to prevent scope creep in this PR.

This may be inconvenient in the short term for WIPs and anyone doing non-Cargo
builds (cough me), but having things split into more files should make
rebases/merges easier in the long run.

Test plan:
- `cargo build --release`
- CI.
2024-10-11 15:55:29 +01:00
Andrea Righi
b3c5a23693 scx_rustland_core: use handle_mm_fault kprobe
The symbol __handle_mm_fault is no longer available in 6.12, so rely
on handle_mm_fault, which is available on both 6.12 and older kernels.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-10-11 15:39:34 +02:00
Changwoo Min
648c95be9e scx_lavd: fix incorrect task comparison for preemption
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-10-11 21:53:24 +09:00
likewhatevs
b88f567e25
Merge pull request #782 from likewhatevs/lsp-nice-util
layered -- make lsp work nice on util include file
2024-10-11 12:30:19 +00:00
Pat Somaru
2b309dbbb4
make lsp work nice on util include
2024-10-11 08:06:29 -04:00
Pat Somaru
7627e1cc42
scx_layered: fix lsp etc on util.bpf.c
2024-10-11 08:02:23 -04:00
Changwoo Min
5b4b255cbb scx_lavd: do not preempt while holding a lock
When a task holds a lock, it should neither yield its time slice nor
be preempted. In this way, we can mitigate harmful preemption of lock
holders and reduce the total preemption count.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-10-11 18:49:09 +09:00
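
A minimal sketch of the rule in 5b4b255cbb, under stated assumptions: the struct, its fields, and the comparison on latency criticality are illustrative only and are not taken from the scx_lavd source.

```rust
/// Hypothetical per-task state.
struct TaskState {
    /// Latency criticality; higher means more urgent.
    lat_cri: u32,
    /// True while the task holds a traced lock.
    lock_holder: bool,
}

/// Decide whether `candidate` may preempt `running`.
fn may_preempt(running: &TaskState, candidate: &TaskState) -> bool {
    // Never preempt a lock holder: other tasks may be waiting on that lock.
    if running.lock_holder {
        return false;
    }
    // Assumed baseline rule: a more latency-critical task may preempt.
    candidate.lat_cri > running.lat_cri
}
```
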
Changwoo Min
bd17589a6e scx_lavd: boost latency criticality when a task holds a lock
When a lock holder exhausts its time slice, it is re-enqueued to a
DSQ and waits for scheduling while still holding the lock. In this
case, boost its latency criticality proportionally so that the lock
holder is not stuck in a DSQ for a long time, improving system-wide
progress.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-10-11 18:48:56 +09:00
Changwoo Min
77b8e65571 scx_lavd: tracing all blocking locks and futexes
Trace the acquisition and release of blocking locks in the kernel and
futexes in user space. This is necessary to boost a lock holder
task in terms of latency and time slice. We do not boost shared
lock holders (e.g., read lock in rw_semaphore) since the kernel
already prioritizes readers over writers.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-10-11 17:03:48 +09:00
Ryan Wilson
b6f09edc00
Merge pull request #777 from ryantimwilson/reverse-weight-dsq-layered
[layered] Implement reverse weight DSQ algorithm
2024-10-10 20:18:14 +00:00
Ryan Wilson
8c8250b1e2 [layered] Implement reverse weight DSQ algorithm
2024-10-10 12:53:25 -07:00
Daniel Hodges
9f60053312
Merge pull request #775 from hodgesds/layered-idle-cleanup
scx_layered: Cleanup topology preempt path
2024-10-10 18:34:08 +00:00
Daniel Hodges
9940e48cdd
Merge pull request #776 from hodgesds/layered-default-iter-algo
scx_layered: Change default DSQ iter algo
2024-10-10 18:15:07 +00:00
Daniel Hodges
fb4dcf91eb scx_layered: Change default DSQ iter algo
Change the default DSQ iter algo from round robin to linear.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-10-10 11:10:27 -07:00
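
To make the difference between the two iteration orders in fb4dcf91eb concrete, here is a sketch under the assumption that "linear" visits DSQs in a fixed index order and "round robin" starts from a rotating offset and wraps around. The function names are hypothetical and the actual scx_layered implementation may differ.

```rust
/// Linear order: always visit DSQ indices 0, 1, ..., n-1.
fn linear_order(n: usize) -> Vec<usize> {
    (0..n).collect()
}

/// Round-robin order: start at `start` and wrap around to cover all n DSQs.
fn round_robin_order(n: usize, start: usize) -> Vec<usize> {
    (0..n).map(|i| (start + i) % n).collect()
}
```
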
Daniel Hodges
b22e83d4d5 scx_layered: Cleanup topology preempt path
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-10-10 09:56:42 -07:00
Andrea Righi
59cfd4060c
Merge pull request #772 from sched-ext/bpfland-fix-l3-cpumask
scx_bpfland: fix cpumask initialization error
2024-10-10 08:38:34 +00:00
Andrea Righi
d62989e462 scx_bpfland: fix cpumask initialization error
In the WAKE_SYNC path, if L3 cache awareness is disabled (--disable-l3),
we may hit the following error:

  Error: EXIT: scx_bpf_error (CPU L3 cpumask not initialized)

Fix this by setting the L3 cpumask to the whole primary domain if L3
cache awareness is disabled.

Tested-by: Eric Naim <dnaim@cachyos.org>
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-10-10 09:30:54 +02:00
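
The fix in d62989e462 amounts to always having a usable L3 mask. A rough sketch, assuming cpumasks are modelled as `u64` bitmasks and using a hypothetical function name:

```rust
/// Return the L3 cpumask to use for a CPU.
/// When L3 awareness is disabled, fall back to the whole primary domain
/// so that the mask is never left uninitialized.
fn effective_l3_mask(l3_aware: bool, l3_mask: u64, primary_mask: u64) -> u64 {
    if l3_aware {
        l3_mask
    } else {
        primary_mask
    }
}
```
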
Daniel Hodges
772d89f731
Merge pull request #771 from hodgesds/layered-preempt-no-topo-refactor
scx_layered: Refactor topo preemption
2024-10-10 02:10:22 +00:00
likewhatevs
9c80ebf88a
Merge pull request #770 from likewhatevs/ci-update-tests
scx_layered: setup matrix job to run key paths of layered through verifier/stress
2024-10-10 02:05:20 +00:00
Daniel Hodges
fe00e2c7be
scx_layered: Refactor topo preemption
Refactor topology preemption logic so the non topology aware code is
contained in a separate function. This should make maintaining the non
topology aware code path far easier.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-10-09 21:24:07 -04:00
Pat Somaru
5e4a7ac655
setup matrix job to run key paths of layered through verifier/stress test
2024-10-09 21:09:41 -04:00
Daniel Hodges
e7b1feed5a
Merge pull request #769 from hodgesds/layered-debug-cleanup
scx_layered: Cleanup debug messages
2024-10-09 23:55:11 +00:00
Daniel Hodges
451c68b44e
scx_layered: Cleanup debug messages
Cleanup debug messages to use a common prefix when the scheduler is
initialized.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-10-09 19:06:28 -04:00
Daniel Hodges
4b817dc54b
Merge pull request #768 from hodgesds/layered-verifier-fix
scx_layered: Fix verifier errors
2024-10-09 21:52:02 +00:00
Daniel Hodges
81a5250d49 scx_layered: Fix verifier errors
Fix verifier errors when using different DSQ iteration algorithms and
cleanup some code.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-10-09 14:36:12 -07:00
Dan Schatzberg
12cf482487
Merge pull request #767 from dschatzberg/mitosis-build
mitosis: Fix build
2024-10-09 19:32:35 +00:00