1774 Commits

Author SHA1 Message Date
Jake Hillion
b89f3fe70b
ci: enable on merge_group action (#698)
Enable the `caching-build` workflow on `merge_group` actions. This is required for merge queues, which should make it possible to merge many PRs while still getting full CI testing. Currently `lint` is the only required check, but more will be added in the future.
2024-09-26 17:40:47 +01:00
Daniel Hodges
326ccb3385
Merge pull request #696 from hodgesds/layered-growth-fixes
scx_layered: Fix idle core selection
2024-09-26 11:59:19 -04:00
likewhatevs
2f0424acf3
Merge pull request #697 from likewhatevs/cache-ci-less-2
do not cache fast jobs dependencies (to maintain slow dependencies caches)
2024-09-26 11:33:21 -04:00
Pat Somaru
e6e91f8755
do not cache fast jobs dependencies
2024-09-26 11:27:08 -04:00
likewhatevs
bf68679d35
Setup "debugging" and misc cleanup (#695)
* Fix a couple of misc errors in build scripts.
* Tweak scripts/kconfigs to make bpftrace work.
* Update how CI caching works to make builds faster (6 minute turnaround
  time)
* Update CI config to generate per-scheduler debug archives w/ guest
  dmesg/scheduler stdout, guest stdout, bpftrace script output,
  veristat output.

* Update build scripts to accept the following:
** VNG RW -- write to host filesystem (better caching, logging).
* For stress tests in particular (via ini config):
** QEMU Opts -- to facilitate reproducing bugs (i.e. high core count).
** bpftrace scripts -- specify bpftrace scripts to run during stress
tests.
2024-09-26 11:11:10 -04:00
Daniel Hodges
d1b425d1fa scx_layered: Fix idle core selection
Fix idle core selection to correctly use pick_idle_cpu_from.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-09-26 07:46:26 -07:00
Tejun Heo
818e829e01
Merge pull request #692 from sched-ext/htejun/sync-kernel
Sync from kernel and re-enable scx_flatcg and scx_pair
2024-09-25 13:04:23 -10:00
Tejun Heo
540576ac30 scheds/c: Re-enable scx_flatcg and scx_pair
cgroup support is available again; re-enable scx_flatcg and scx_pair.
2024-09-25 12:38:45 -10:00
Tejun Heo
466b7984c2 Sync from kernel - a748db0c8c6a ("tools/sched_ext: Receive misc updates from SCX repo")
Sync from sched_ext/for-6.12-fixes a748db0c8c6a ("tools/sched_ext: Receive
misc updates from SCX repo")

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git for-6.12-fixes

- cgroup support + scx_flatcg updates.

- scx_bpf_dispatch[_vtime]_from_dsq() + scx_qmap updates.

- COMPAT macros.
2024-09-25 12:34:03 -10:00
likewhatevs
bd2e90b0b6
run cargo +nightly-2024-09-10 fmt to fix lint err (#691)
run cargo +nightly-2024-09-10 fmt to fix lint err
2024-09-25 17:34:53 -04:00
likewhatevs
2282a0af37
enable bpftrace when using stress tests (#688)
* enable bpftrace when using stress tests

update meson/stress test runner to enable
running bpftrace scripts while running
stress tests.

* disable layered stats output on ci
2024-09-25 17:20:36 -04:00
Daniel Hodges
f9b39244cc
Merge pull request #687 from hodgesds/layered-growth-enum-refactor
scx_layered: Add layer growth algo to layer bpf config
2024-09-25 15:10:00 -04:00
Daniel Hodges
73d33e86e8
Merge pull request #686 from frelon/scx_utils-gpu-topo
scx_utils: Add gpu-topology crate feature
2024-09-25 15:09:27 -04:00
Daniel Hodges
bce840d9e5 scx_layered: Add layer growth algo to layer bpf config
Add an enum for the layer growth algo to the bpf layer config. This will
be useful for implementing topology aware layer growth algorithms.
When selecting an idle CPU the current logic tries to keep tasks
local to LLC/NUMA node. However, for certain growth algorithms (ex:
RoundRobin) this is suboptimal. Adding the layer growth algorithm
will allow for different paths for CPU selection in the idle/preemption
paths.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-09-25 12:00:24 -07:00
Tejun Heo
64b815b6a6
Merge pull request #676 from MitchellAugustin/scx_loader_automatic
scx_loader: Add initial automatic scheduler switching via --monitor-no-dbus
2024-09-25 08:03:51 -10:00
likewhatevs
8ed7777ecc
enable build and test outside src dir (#685)
enable build and test outside src dir

add an option `build_outside_src`, defaulting to
false, to undo a tweak that lets meson work
well with builds inside the src dir.

this makes some tools work better and enables
moving builds etc. to /tmp (faster iteration if
/tmp is a ramdisk)
2024-09-25 12:00:15 -04:00
Fredrik Lönnegren
bbce932393 scx_utils: Add gpu-topology crate feature
The gpu-topology feature can be enabled to include GPUs when generating
a topology-map. Disabling the feature will remove the nvml-wrapper
dependency as well as GPU-specific code in topology.rs.

Most of the code was moved to a new module in rust/scx_utils/src/gpu.rs
but some of it was kept in topology.rs and hidden behind #[cfg(feature =
"gpu-topology")].

Signed-off-by: Fredrik Lönnegren <fredrik@frelon.se>
2024-09-25 14:54:52 +02:00
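The feature gating described in bbce932393 above can be illustrated with a minimal sketch. It assumes a crate that declares a `gpu-topology` feature; the names `TopologySummary`, `probe_gpus`, `count_gpus`, and `build_summary` are hypothetical and are not taken from scx_utils, and the GPU count is a placeholder rather than a real nvml-wrapper call.

```rust
/// Hypothetical summary type; not the real scx_utils Topology.
#[derive(Debug, PartialEq)]
pub struct TopologySummary {
    pub nr_cpus: usize,
    /// `None` when the crate is built without GPU support.
    pub nr_gpus: Option<usize>,
}

// Compiled only when the feature is enabled. In a real crate this is
// where the optional GPU dependency would be used.
#[cfg(feature = "gpu-topology")]
mod gpu {
    /// Placeholder: a real implementation would query the GPU library.
    pub fn count_gpus() -> usize {
        0
    }
}

// Exactly one of the two `probe_gpus` definitions is compiled.
#[cfg(feature = "gpu-topology")]
fn probe_gpus() -> Option<usize> {
    Some(gpu::count_gpus())
}

#[cfg(not(feature = "gpu-topology"))]
fn probe_gpus() -> Option<usize> {
    None
}

/// Build a summary; GPU information is present only with the feature.
pub fn build_summary(nr_cpus: usize) -> TopologySummary {
    TopologySummary {
        nr_cpus,
        nr_gpus: probe_gpus(),
    }
}
```

Providing a `cfg(not(...))` twin keeps callers free of feature checks: `build_summary` compiles identically in both configurations, and the optional dependency is only referenced from the gated module.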
Andrea Righi
aea431c0c6
Merge pull request #678 from sched-ext/rustland-core-fix-mm-stall
scx_rustland_core: fix mm stall
2024-09-24 23:26:24 +02:00
likewhatevs
99d1179866
enable ide's etc. to work on bpf.c files (#668)
* enable IDEs etc. to work on the bpf.c files
this lets clangd, and IDE tools which use clangd,
work on the bpf.c code.

nothing should change outside of that IDE/editor
environment; all the changes are ifdef'ed on LSP, which is set
in the added .clangd file.

* move intf include out of both sides of ifdef toggle
2024-09-24 16:55:02 -04:00
Daniel Hodges
2805bb77a5
Merge pull request #683 from hodgesds/layered-idle-topo
scx_layered: Make layered idle CPU selection topology aware
2024-09-24 16:37:23 -04:00
likewhatevs
d8246fdcd6
clean up ci/make ci nicer (#682)
* make ci nicer

Replace build scheds and merged with caching build, and rename
caching build to build-and-test.

This should make the CI reports on PRs be nice and specific
(i.e. at a glance, know what passes and what fails).

It also keeps PR CI jobs up to date (as folks edit things) and
has them all use one config/24.04 etc.

* prevent untar permission errors from causing cache misses
2024-09-24 16:36:56 -04:00
Daniel Hodges
0b4a2af87f
Merge pull request #681 from hodgesds/layered-preempt-fix
scx_layered: Restrict preemption to layer cpumask
2024-09-24 15:36:34 -04:00
Daniel Hodges
679dd5920c scx_layered: Fix comment
Fix comment to be more accurate.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-09-24 12:31:03 -07:00
Daniel Hodges
87c6e276d9 scx_layered: Restrict idle selection to layer cpus
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-09-24 09:40:38 -07:00
Daniel Hodges
e68fccd26c scx_layered: Update comments on layer preemption
Update comments on layer preemption to be more descriptive.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-09-24 08:24:04 -07:00
Mitchell Augustin
e4eaed07a8 Change monitor_no_dbus to auto, use enum-string conv, use log::info, use Tokio spawning
2024-09-24 09:36:27 -05:00
likewhatevs
652c923624
Merge pull request #680 from likewhatevs/main
add 'continue on error' to stress tests in ci jobs
2024-09-24 09:43:10 -04:00
Daniel Hodges
6b966cda0c scx_layered: Restrict preemption to layer cpumask
When preempting, restrict preemption to the current layer's cpumask. This
may reduce the amount of preemption, but may improve cache locality
for preempted tasks.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-09-24 06:23:01 -07:00
Pat Somaru
878eaf042e
add 'continue on error' to ci jobs
I think I see PRs being harder to write because all parts of a CI job
are cancelled when one fails.

I think I am also starting to see that we have enough largely disjoint
moving pieces that there will often be one that is failing stress
tests at any time.

Make CI run all stress tests always to address this.
2024-09-24 09:17:55 -04:00
Mitchell Augustin
b7e82bdeba
Merge branch 'sched-ext:main' into scx_loader_automatic
2024-09-24 08:16:37 -05:00
Daniel Hodges
39c06d092c
Merge pull request #679 from vax-r/move_cast_mask
scx_common_bpf: Append cast_mask()
2024-09-24 07:40:27 -04:00
I Hsin Cheng
61cb3f7fc5 scx_common_bpf: Append cast_mask()
Remove the copies of cast_mask() scattered across different schedulers
and define it once in common.bpf.h so that any scheduler can use it when
needed.

Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
2024-09-24 16:01:19 +08:00
Andrea Righi
0a57b93846 scx_rustland_core: prevent mm stall
Bypass user-space scheduling for tasks currently handling a page fault,
preventing potential deadlock conditions involving VMA lock / mmap_lock
during user-space scheduling.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-09-24 08:46:14 +02:00
Andrea Righi
34bc6a2b64 Revert "scx_rustland_core: dispatch all kthreads directly from BPF"
This reverts commit 809d39aa7f.

Dispatching all kthreads directly doesn't really help much at preventing
stalls with the stress-ng fork stressor, so revert this commit. A better
workaround will be provided in the next commit.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-09-24 08:24:01 +02:00
Changwoo Min
b1bc4033b4
Merge pull request #673 from multics69/lavd-prop-lat-cri
scx_lavd: propagate waker's latency criticality to its wakee
2024-09-24 07:34:07 +09:00
Mitchell Augustin
ab1c737e9e
Merge branch 'sched-ext:main' into scx_loader_automatic
2024-09-23 17:19:13 -05:00
Mitchell Augustin
d434ab4266 scx_loader: Add initial automatic scheduler switching via --monitor-no-dbus
Exposes an option --monitor-no-dbus in scx_loader that will monitor CPU
utilization and start scx_lavd when any CPU exceeds 90% for more than 5
seconds. scx_lavd will be terminated if all CPUs are below 90% for
more than 30 seconds. When this flag is specified, scx_loader's
dbus functionality is not utilized.
2024-09-23 17:07:43 -05:00
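The switching policy described in d434ab4266 above (start when any CPU exceeds 90% for more than 5 seconds, stop when all CPUs are below 90% for more than 30 seconds) can be sketched as a small state machine. This is an illustration under assumptions, not scx_loader's code: `AutoSwitch`, `Action`, and `sample` are hypothetical names, and the sketch treats "any CPU" as not necessarily the same CPU across samples.

```rust
use std::time::Duration;

/// What the caller should do after a sample (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    None,
    Start,
    Stop,
}

/// Hypothetical hysteresis tracker for automatic scheduler switching.
pub struct AutoSwitch {
    threshold: f64,        // per-CPU utilization threshold, in percent
    start_after: Duration, // time above threshold before starting
    stop_after: Duration,  // time below threshold before stopping
    running: bool,
    above_for: Duration,
    below_for: Duration,
}

impl AutoSwitch {
    /// Uses the values from the commit message: 90%, 5s, 30s.
    pub fn new() -> Self {
        Self {
            threshold: 90.0,
            start_after: Duration::from_secs(5),
            stop_after: Duration::from_secs(30),
            running: false,
            above_for: Duration::ZERO,
            below_for: Duration::ZERO,
        }
    }

    /// Feed one per-CPU utilization sample covering `elapsed` time.
    pub fn sample(&mut self, per_cpu_util: &[f64], elapsed: Duration) -> Action {
        let any_above = per_cpu_util.iter().any(|&u| u > self.threshold);
        if any_above {
            self.above_for += elapsed;
            self.below_for = Duration::ZERO;
        } else {
            self.below_for += elapsed;
            self.above_for = Duration::ZERO;
        }

        if !self.running && self.above_for > self.start_after {
            self.running = true;
            return Action::Start;
        }
        if self.running && self.below_for > self.stop_after {
            self.running = false;
            return Action::Stop;
        }
        Action::None
    }
}
```

The asymmetric windows (5 seconds to start, 30 seconds to stop) give the policy hysteresis, so a brief dip in load does not immediately tear down the scheduler.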
Daniel Hodges
29fb647c93 scx_layered: Refactor idle core selection
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-09-23 12:01:42 -07:00
Daniel Hodges
380fd1f3b3 scx_layered: Make idle select topology aware
Make idle CPU selection topology aware.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-09-23 10:10:43 -07:00
Daniel Hodges
1b5d23dfe1
Merge pull request #675 from hodgesds/layered-dsq-dump-cleanup
scx_layered: Cleanup dump format
2024-09-23 13:09:21 -04:00
Daniel Hodges
35477970bd scx_layered: Cleanup dump format
Cleanup the dump format for topology aware dumps in scx_layered.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-09-23 10:02:49 -07:00
Daniel Hodges
8b14e48994
Merge pull request #671 from hodgesds/layered-last-waker
scx_layered: Add waker stats per layer
2024-09-23 10:58:54 -04:00
Changwoo Min
71fa92cf1c scx_lavd: propagate waker's latency criticality to its wakee
If a waker is more latency critical than its wakee, the wakee inherits
the waker's latency criticality. This allows the wakee to take into
account the context of the task that woke it up. For now, we limit such
inheritance to one hop and one schedule.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-09-23 12:56:16 +09:00
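The inheritance rule from 71fa92cf1c above can be sketched as follows. This is a simplified user-space model under assumptions, not the scx_lavd BPF code: `TaskCtx`, its fields, and the method names are hypothetical, and latency criticality is modelled as a plain integer where larger means more critical.

```rust
/// Hypothetical per-task context; not the scx_lavd data structure.
#[derive(Debug, Clone, Copy)]
pub struct TaskCtx {
    /// The task's own latency criticality.
    pub lat_cri: u32,
    /// Criticality inherited from the last waker (0 = nothing inherited).
    pub lat_cri_waker: u32,
}

impl TaskCtx {
    pub fn new(lat_cri: u32) -> Self {
        Self { lat_cri, lat_cri_waker: 0 }
    }

    /// On wakeup, record the waker's *own* criticality. Ignoring the
    /// waker's inherited value is what limits inheritance to one hop.
    pub fn on_wakeup(&mut self, waker: &TaskCtx) {
        self.lat_cri_waker = waker.lat_cri;
    }

    /// The value used for scheduling: the inherited criticality only
    /// matters when the waker is more critical than the wakee.
    pub fn effective_lat_cri(&self) -> u32 {
        self.lat_cri.max(self.lat_cri_waker)
    }

    /// After the task has been scheduled once, drop the inherited
    /// value so the boost lasts for one schedule only.
    pub fn on_scheduled(&mut self) {
        self.lat_cri_waker = 0;
    }
}
```

In this model a chain A wakes B wakes C boosts B from A, but C only sees B's own criticality, and B's boost disappears once B has run.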
Changwoo Min
ad8536b4a4
Merge pull request #670 from multics69/lavd-opt-preemption
scx_lavd: find a victim cpu for preemption within task's compute domain
2024-09-23 10:22:08 +09:00
Daniel Hodges
91d32663bd
scx_layered: Refactor waker tracking to only use last waker
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-09-22 18:05:54 -04:00
Daniel Hodges
1a2f82b91c
Merge pull request #666 from hodgesds/layered-local-llc
scx_layered: Add topology aware preemption
2024-09-22 17:36:32 -04:00
Daniel Hodges
326f3b7988
Merge pull request #667 from hodgesds/layered-pcore-grow
scx_layered: Add Big/Little core growth algos
2024-09-22 16:59:42 -04:00
Daniel Hodges
1ac9712d2e
scx_layered: Refactor preemption into a separate function
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-09-22 16:54:11 -04:00
Daniel Hodges
bc34bd867b
scx_layered: Add option to enable XNUMA preemption
Disable XNUMA preemption by default and add an option to enable it.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-09-22 16:52:57 -04:00
Daniel Hodges
55b185313a
Remove unneeded Cargo lock file
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-09-22 16:52:57 -04:00