Commit Graph

1508 Commits

Author SHA1 Message Date
Avraham Hollander
7a43801d76 Add quotes for clarity 2024-08-26 13:20:01 -04:00
Avraham Hollander
0b6ebf826e scx_lavd, scx_mitosis, scx_rusty: Add comma for grammatical consistency
with the same change in the other schedulers
2024-08-26 13:06:58 -04:00
Avraham Hollander
07039f1f07 scx_layered: Documentation cleanup 2024-08-26 13:03:52 -04:00
Changwoo Min
1c1cc1030a
Merge pull request #558 from multics69/lavd-revise
scx_lavd: move time slice calculation to ops.enqueue() and ops.select_cpu() and more
2024-08-26 17:55:39 +09:00
Changwoo Min
0f97ca3066 scx_lavd: drop time slice calculation in ops.select_cpu()
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-26 17:55:00 +09:00
Changwoo Min
4e3c36ca3f scx_lavd: handle the missing cases in time slice calculation
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-26 11:43:29 +09:00
Changwoo Min
be7d06e280 scx_lavd: make the old BPF verifier happy :-(
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-26 11:43:29 +09:00
Changwoo Min
82f55b95b2 scx_lavd: add a fast path in pick_idle_cpu() when SMT is not activated
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-26 11:43:29 +09:00
Changwoo Min
38779dbe8b scx_lavd: improve pick_idle_cpu()
Now it checks an active cpumask within a previous core's compute domain
before checking the full active CPUs.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-26 11:43:29 +09:00
Changwoo Min
d1d9e97d08 scx_lavd: reduce LAVD_CPDOM_MAX_DIST to 4
The BPF verifier in older kernels gives up analyzing the nested loop
in consume_task(). We make the loop less complex by reducing
LAVD_CPDOM_MAX_DIST from 6 to 4 in order to make the verifier happy.
Note that the theoretical maximum distance is 6 (numa > llc > core type)
but there is no such hardware today, hence reducing it to 4 should be
okay for the next few years, by when hopefully the verifier becomes smarter.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-26 11:43:29 +09:00
Changwoo Min
950710990f scx_lavd: move time slice calculation to ops.enqueue() and ops.select_cpu()
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-26 11:43:29 +09:00
Changwoo Min
954b684a70 scx_lavd: update nr_queued_task every system stat update interval
Updating nr_queued_task on every runqueue operation is expensive and
unnecessary. Instead, update it once every system stat update interval
and use a moving average, which is accurate enough.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-26 11:43:29 +09:00
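
The interval-based moving average described in the commit above can be sketched as follows. This is a minimal illustration, not scx_lavd's actual code: `moving_avg`, `SysStat`, and the 3/4 to 1/4 weighting are assumptions chosen for the example.

```rust
/// Hypothetical helper (not taken from scx_lavd): exponential moving
/// average that keeps 3/4 of the old value and adds 1/4 of the new sample.
fn moving_avg(old: u64, new: u64) -> u64 {
    (old - (old >> 2)) + (new >> 2)
}

/// Hypothetical holder for system-wide statistics.
struct SysStat {
    avg_nr_queued: u64,
}

impl SysStat {
    /// Called once per stat update interval, not on every runqueue operation.
    fn update(&mut self, sampled_nr_queued: u64) {
        self.avg_nr_queued = moving_avg(self.avg_nr_queued, sampled_nr_queued);
    }
}

fn main() {
    let mut stat = SysStat { avg_nr_queued: 0 };
    // One sample per interval; the spike of 100 is smoothed by the average.
    for sample in [8, 8, 8, 8, 100, 8, 8] {
        stat.update(sample);
        println!("sample={sample} avg={}", stat.avg_nr_queued);
    }
}
```

The trade-off is that the average lags behind sudden changes in queue length, which the commit message judges to be accurate enough.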
Changwoo Min
4f906f1f49 scx_lavd: update README since it supports multi-CCX/NUMA
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-26 11:43:29 +09:00
Changwoo Min
9551657b42 scx_lavd: prefer big cores in the performance mode
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-26 11:43:29 +09:00
Changwoo Min
d4bb35e651 scx_lavd: use itertools::iproduct!() for a nested loop
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-26 11:43:29 +09:00
Changwoo Min
9368c6881d scx_lavd: replace get_task_cpu_id() with scx_bpf_task_cpu()
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-26 11:43:29 +09:00
Daniel Hodges
bf67b83561
Merge pull request #549 from hodgesds/stats-path-fix
scx_utils: Add retryable errors
2024-08-25 15:22:46 -04:00
Daniel Hodges
c31f2b63cb
scx_utils: Add retryable errors
Add a set of retryable errors for scx_utils when connecting to the stats
server.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-25 14:54:36 -04:00
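
A retry loop over a set of retryable errors might look like the sketch below. It is illustrative only: `is_retryable`, `connect_with_retry`, and the chosen error kinds are assumptions, not the scx_utils implementation.

```rust
use std::io::{Error, ErrorKind};
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical classification: errors that may disappear once the
/// stats server has finished starting up.
fn is_retryable(err: &Error) -> bool {
    matches!(
        err.kind(),
        ErrorKind::NotFound | ErrorKind::ConnectionRefused | ErrorKind::ConnectionReset
    )
}

/// Calls `connect` up to `max_attempts` times, sleeping `delay` between
/// attempts, and gives up immediately on a non-retryable error.
fn connect_with_retry<T>(
    max_attempts: u32,
    delay: Duration,
    mut connect: impl FnMut() -> Result<T, Error>,
) -> Result<T, Error> {
    let mut attempt = 1;
    loop {
        match connect() {
            Ok(conn) => return Ok(conn),
            Err(err) if attempt < max_attempts && is_retryable(&err) => {
                attempt += 1;
                sleep(delay);
            }
            Err(err) => return Err(err),
        }
    }
}

fn main() {
    let mut calls = 0;
    // Simulated connection that is refused twice before succeeding.
    let result = connect_with_retry(5, Duration::from_millis(10), || {
        calls += 1;
        if calls < 3 {
            Err(Error::from(ErrorKind::ConnectionRefused))
        } else {
            Ok("connected")
        }
    });
    println!("{result:?} after {calls} calls");
}
```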
Andrea Righi
a469f0f1ce
Merge pull request #561 from sched-ext/bpfland-fix-energy-profile-refresh
scx_bpfland: prevent reading energy profile if not available
2024-08-25 18:31:34 +02:00
Tejun Heo
ca13e13ad6
Merge pull request #559 from sched-ext/htejun/cargo-workspace
build: Use workspace to group rust sub-projects
2024-08-25 06:26:18 -10:00
Andrea Righi
f8acd069f0 scx_bpfland: prevent reading energy profile if not available
Avoid periodically reading the current performance profile from
/sys/devices/system/cpu/cpufreq/policy0/energy_performance_preference if
it's not available (e.g., with older CPUs or kernels without cpufreq).

This fixes issue #560.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-25 16:53:35 +02:00
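
The idea of skipping the periodic read when the sysfs file is absent can be sketched as below. This is a simplified illustration under assumed names (`EPP_PATH`, `read_energy_profile`), not the scx_bpfland code.

```rust
use std::fs;
use std::path::Path;

/// Path quoted in the commit message above.
const EPP_PATH: &str =
    "/sys/devices/system/cpu/cpufreq/policy0/energy_performance_preference";

/// Hypothetical helper: returns None when the file is missing or
/// unreadable, otherwise the trimmed profile name.
fn read_energy_profile(path: &Path) -> Option<String> {
    fs::read_to_string(path).ok().map(|s| s.trim().to_string())
}

fn main() {
    let path = Path::new(EPP_PATH);
    // Probe once at startup so the main loop never retries a missing file.
    if !path.exists() {
        println!("energy profile not available; skipping periodic refresh");
        return;
    }
    if let Some(profile) = read_energy_profile(path) {
        println!("current energy profile: {profile}");
    }
}
```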
Andrea Righi
8853d9a9f2
Merge pull request #548 from sched-ext/rustland-core-refactoring
scx_rustland_core: user-space framework refactoring
2024-08-25 16:39:28 +02:00
Tejun Heo
2f356bf09e
Merge pull request #557 from sched-ext/htejun/scx_stats-more
scx_stats: Make StatsServerData::describe_meta() output more readable
2024-08-25 01:07:39 -10:00
Tejun Heo
a3ebd86f55
Merge pull request #556 from sched-ext/htejun/scx_bpfland-stats
scx_bpfland: Use scx_stats
2024-08-25 01:07:11 -10:00
Tejun Heo
4ff4ce7779
Merge pull request #555 from sched-ext/htejun/scx_stats
scx_stats: Shorten exported names and add prelude module
2024-08-25 01:06:45 -10:00
Tejun Heo
43950c65bd build: Use workspace to group rust sub-projects
The meson build script was building each rust sub-project under rust/ and
scheds/rust/ separately. This means that each rust project is built
independently, which leads to a couple of problems - 1. There are a lot of
shared dependencies but they have to be built over and over again for each
project. 2. Concurrency management becomes sad - we either have to unleash
multiple cargo builds at the same time, possibly thrashing the system, or
build one by one.

We've been trying to solve this from the meson side in vain. Thankfully, in
issue #546, @vimproved suggested using a cargo workspace, which makes the
sub-projects share the same target directory and be built together by the same
cargo instance while still allowing each project to behave independently for
development and publishing purposes.

Make the following changes:

- Create two cargo workspaces - one under rust/, the other under
  scheds/rust/. Each contains all rust projects underneath it.

- Don't let meson descend into rust/. These are libraries used by the rust
  schedulers. No need to build them from meson. Cargo will build them as
  needed.

- Change the rust_scheds build target to invoke `cargo build` in
  scheds/rust/ and let cargo do its thing.

- Remove per-scheduler meson.build files and instead generate custom_targets
  in scheds/rust/meson.build which invoke `cargo build -p $SCHED`.

- This changes the rust binary directory. Update README and
  meson-scripts/install_rust_user_scheds accordingly.

- Remove per-scheduler Cargo.lock as scheds/rust/Cargo.lock is shared by all
  schedulers now.

- Unify .gitignore handling.

The following are build times on a Ryzen 3975W:

Before:
  ________________________________________________________
  Executed in  165.93 secs    fish           external
     usr time   40.55 mins    2.71 millis   40.55 mins
     sys time    3.34 mins   36.40 millis    3.34 mins

After:
  ________________________________________________________
  Executed in   36.04 secs    fish           external
     usr time  336.42 secs    0.00 millis  336.42 secs
     sys time   36.65 secs   43.95 millis   36.61 secs

Wallclock time is reduced 5x and CPU time 7x.
2024-08-25 00:47:58 -10:00
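
As a quick check of the speedup figures in the commit above, the ratios can be recomputed from the reported timings. The sketch below only redoes the arithmetic; `speedup` is a hypothetical helper, and CPU time is taken here as the sum of usr and sys time.

```rust
/// Hypothetical helper: how many times smaller `after` is than `before`.
fn speedup(before_secs: f64, after_secs: f64) -> f64 {
    before_secs / after_secs
}

fn main() {
    // Wallclock ("Executed in") times from the commit message, in seconds.
    let wall_before = 165.93;
    let wall_after = 36.04;

    // CPU time = usr + sys. The "before" figures are reported in minutes.
    let cpu_before = (40.55 + 3.34) * 60.0;
    let cpu_after = 336.42 + 36.65;

    println!("wallclock speedup: {:.2}x", speedup(wall_before, wall_after));
    println!("CPU time reduction: {:.2}x", speedup(cpu_before, cpu_after));
}
```

The results come out at roughly 4.6x for wallclock time and roughly 7.1x for CPU time, so the "5x" in the message is a rounded-up figure.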
Andrea Righi
41dfa2481b scx_rustland_core: update README.md
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-25 12:39:08 +02:00
Andrea Righi
894f9582d0 scx_rustland_core: hide shutdown boilerplate in BpfScheduler
Refactor the code to hide the shutdown handling inside BpfScheduler and
simply use the exited() method to check when the scheduler is stopped.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-25 12:17:04 +02:00
Tejun Heo
c93191a213 scx_stats: Make StatsServerData::describe_meta() output more readable
Add delimiters to make the output easier on the eyes.
2024-08-24 23:15:52 -10:00
Tejun Heo
152a8471cc scx_bpfland: When reporting stats, use interval deltas
Three of the reported stats are cumulative. While they obviously can be
processed into delta values, that holds for the other direction too and the
cumulative values are difficult to make intuitive sense of. Report interval
delta values instead.

Note that a stats client can reliably build back cumulative values even
under heavy system contention - the delta values reported between two
consecutive reads are guaranteed to be correct regardless of the duration of
the interval.
2024-08-24 23:14:57 -10:00
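
The relationship between cumulative counters and interval deltas can be sketched as follows. `DeltaReporter` is a hypothetical type for illustration and is not part of scx_stats or scx_bpfland; the point is that summing the deltas of consecutive reads recovers the cumulative value no matter how long each interval was.

```rust
/// Hypothetical reporter that turns a cumulative counter into the delta
/// since the previous read.
struct DeltaReporter {
    last: u64,
}

impl DeltaReporter {
    fn report(&mut self, cumulative: u64) -> u64 {
        // wrapping_sub keeps the delta correct even if the counter wraps.
        let delta = cumulative.wrapping_sub(self.last);
        self.last = cumulative;
        delta
    }
}

fn main() {
    let mut reporter = DeltaReporter { last: 0 };
    let mut rebuilt: u64 = 0;

    // Cumulative counter values observed at irregular read intervals.
    for cumulative in [10u64, 25, 25, 90] {
        let delta = reporter.report(cumulative);
        rebuilt += delta; // a client can rebuild the cumulative value
        println!("cumulative={cumulative} delta={delta} rebuilt={rebuilt}");
    }
    assert_eq!(rebuilt, 90);
}
```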
Tejun Heo
bd68e230b9 scx_bpfland: Convert to scx_stats
Use scx_stats instead of prometheus for stats reporting. This has a few
advantages:

- Stats metadata can be defined more succinctly.

- Natural support for nesting statistics which will be useful in making
  scheduler components composable.

- Support for multiple programmable readers where each reader can use their
  own reading interval.

- Built-in stats help message generation.

- Openmetrics integration is still available through
  scx_stats/scripts/scxstats_to_openmetrics.py.
2024-08-24 23:14:55 -10:00
Tejun Heo
625381280c scx_stats: Shorten exported names and add prelude module
Let's make it a bit easier to use:

- Shorten exported names by changing the prefix from ScxStats to Stats. This
  should be distinctive enough and more in line with how most libraries name
  their exports.

- Importing the right set of traits can be tricky. Introduce prelude module
  so that importing is a bit less painful.
2024-08-24 22:04:25 -10:00
Andrea Righi
a2e97fecbb scx_rustland_core: merge verbose and debug in the same option
There is no reason to have two separate options for "verbose" and
"debug" mode. Just merge the two and always use "debug". If enabled,
increase verbosity to stdout and enable reporting BPF scheduling events
in debugfs (e.g., /sys/kernel/debug/tracing/trace_pipe).

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-25 09:45:20 +02:00
Andrea Righi
cb16a11342 scx_rustland_core: get rid of the global scheduler's slice_us
Since scx_rustland_core enables setting a time slice on a per-task basis
during task dispatch, there's no need to maintain a global time slice in
the BPF component. Instead, a global time slice can simply be managed in
user-space, achieving the same outcome.

Therefore, drop the global slice_us property from BpfScheduler to
simplify the API.

NOTE: if a time slice is not specified for a task, SCX_SLICE_DFL will be
used by default.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-25 09:45:18 +02:00
Andrea Righi
e404bee5e7 scx_rustland / scx_rlfifo: small code format fixes
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-25 09:44:52 +02:00
Andrea Righi
1cd11ba916 scx_rlfifo: improve documentation and code readability
Add more comments to make the source code more understandable, so that
it can be easily used as a template for implementing more complex
scheduling policies.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-25 09:44:28 +02:00
Changwoo Min
ec4c2cd8d1
Merge pull request #554 from sched-ext/htejun/scx_stats
scx_lavd, scx_stats: Remove unnecessary messages
2024-08-25 16:04:00 +09:00
Tejun Heo
52aed6fecf scx_stats: Don't warn on missing "top" stats
Some schedulers may not want to expose "top" stats. No need to warn on it.
Change the message to debug.
2024-08-24 18:51:05 -10:00
Tejun Heo
35a4326aee scx_lavd: Drop unnecessary stat field explanation on startup
The scheduling instances no longer print out sched samples. No reason to
print field explanation on startup.
2024-08-24 18:48:54 -10:00
Changwoo Min
847832f5f2
Merge pull request #553 from sched-ext/htejun/scx_lavd-stats
scx_lavd: Switch introspection to use scx_stats
2024-08-25 11:57:49 +09:00
Changwoo Min
02ad793c78
Merge branch 'main' into htejun/scx_lavd-stats 2024-08-25 11:57:41 +09:00
Changwoo Min
8b1874c27f
Merge pull request #552 from CachyOS/lavd-mutli-cxx2
scx_lavd: Drop message about unsupported multi-CCX support
2024-08-25 11:48:12 +09:00
Tejun Heo
fdfb7f60f4 Merge branch 'main' into htejun/scx_lavd-stats 2024-08-24 15:53:53 -10:00
Tejun Heo
55e5b8b43f scx_lavd: Switch to scx_stats
Scheduling sample reporting is switched to use scx_stats. This makes the
scheduler run without making too much noise while still allowing monitoring
on demand. It can also make introspection more dynamic - e.g. it shouldn't
be difficult to add other monitoring commands which take scheduling samples
based on different criteria or add other types of statistics.

--nr_sched-samples is replaced with --monitor-nr-samples.
2024-08-24 15:53:02 -10:00
Tejun Heo
1bba713a29
Merge pull request #542 from sched-ext/htejun/scx_stats
scx_stats, scx_rusty, scx_layered: Implement `--help-stats`
2024-08-24 15:38:36 -10:00
Tejun Heo
5ae459c9fc
Merge pull request #550 from CachyOS/fix-clang-rc
get_clang_ver: Fix regex for LLVM RC Versions
2024-08-24 15:37:50 -10:00
Peter Jung
906d054770
scx_lavd: Drop message about unsupported multi-CCX support
Signed-off-by: Peter Jung <admin@ptr1337.dev>
2024-08-25 01:10:38 +02:00
Peter Jung
0faa0efe74
get_clang_ver: Fix regex for LLVM RC Versions
Signed-off-by: Peter Jung <admin@ptr1337.dev>
2024-08-25 00:49:00 +02:00
Andrea Righi
0aa23481de scx_rustland_core: drop update_tasks() and introduce notify_complete()
The update_tasks() API is somewhat confusing, so replace it with a
clearer API, notify_complete().

This new API will return control to the BPF component and inform it
about the number of tasks still pending in the user-space scheduler.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-25 00:45:23 +02:00
Daniel Hodges
e81faef103
Merge pull request #544 from hodgesds/layered-tgid
scx_layered: Add layer match for tgid
2024-08-24 16:58:19 -04:00