Commit Graph

1171 Commits

Author SHA1 Message Date
Andrea Righi
74175f5a49 scx_bpfland: properly integrate with meson build
Properly honor the meson build `serialize` option.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-28 21:37:00 +02:00
Andrea Righi
f98c35fd07
Merge pull request #388 from sched-ext/bpfland
scheds: introduce scx_bpfland
2024-06-28 21:27:43 +02:00
Andrea Righi
183b1b2cfc
Merge pull request #399 from sched-ext/meson-serialize
meson: introduce serialize build option
2024-06-28 20:13:16 +02:00
Andrea Righi
b7977f1ce3
Merge pull request #402 from sched-ext/meson-check-git-commit
meson: check if commit exists in remote git repos
2024-06-28 19:03:02 +02:00
Andrea Righi
273728fd2b meson: check if commit exists in remote git repos
When fetching external git repositories (libbpf and bpftool) we don't
check if the target commit exists.

This can leads to issues such as #400, because we may silently use HEAD,
instead of the specified commit.

Prevent this by returning an error when the target SHA1 cannot be found.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-28 15:16:56 +02:00
Andrea Righi
38a05d49f8
Merge pull request #401 from CachyOS/feat/cargo-rel-with-plain
meson: run cargo build in release mode when using plain buildtype
2024-06-28 14:49:12 +02:00
Jose Fernandez
f64cdd1a51
sched_utils: Add log recorder format customization
This change adds the ability to customize the log recorder format for
each metric type. There is a default format that is used if no custom
`MetricFormatter` is provided. This is the same format that was used
before this change.

The `MetricFormatter` should be implemented by the user to customize the
format of the log recorder. The `LogRecorderBuilder` now takes a
`MetricFormatter` as an optional parameter.

Following changes will allow additional customization of the log
recorder format, such as how many metrics are logged per line.

Signed-off-by: Jose Fernandez <josef@netflix.com>
2024-06-28 06:21:03 -06:00
Vladislav Nepogodin
22f13e2284
meson: run cargo build in release mode when using plain buildtype 2024-06-28 16:10:16 +04:00
Andrea Righi
657fb6a4aa
Merge pull request #400 from sched-ext/fix-bpftool
meson: restore previous libbpf version and update bpftool
2024-06-28 13:21:46 +02:00
Andrea Righi
39a06c86f1 meson: restore previous libbpf version and update bpftool
The upstrem bpftool git repo (https://github.com/libbpf/bpftool.git) is
periodically force pushed and the specific commit that we needed is not
available anymore.

Instead of failing we are actually fetching the latest bpftool (HEAD)
that introduced some breakage initially fixed by commit e59c48a6
("Update libbpf commit hash").

However, updating libbpf seems to introduce a run-time problem and all
the schedulers are failing to start:

 libbpf: failed to find skeleton map ''
 libbpf: failed to populate skeleton maps for 'bpf_bpf': -3

So, revert libbpf to the previous version and update the commit for
bpftool to use a version that still allows to generate a compatible BPF
skel.

Fixes: e59c48a6 ("Update libbpf commit hash")
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-28 12:43:37 +02:00
Andrea Righi
cf4883fbf8 meson: introduce serialize build option
With commit 5d20f89a ("scheds-rust: build rust schedulers in sequence"),
schedulers are now built serially one after the other to prevent meson
and cargo from forking NxN parallel tasks.

However, this change has made building a single scheduler much more
cumbersome, due to the chain of dependencies.

For example, building scx_rusty using the specific meson target would
still result in all schedulers being built, because they all depend on
each other.

To address this issue, introduce the new meson build option
`serialize=true|false` (default is false).

This option allows to disable the schedulers' build chain, restoring the
old behavior.

With this option enabled, it is now possible to build just a single
scheduler, parallelizing the cargo build properly, without triggering
the build of the others. Example:

  $ meson setup build -Dbuildtype=release -Dserialize=false
  $ meson compile -C build scx_rusty

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-28 10:17:37 +02:00
Tejun Heo
769e6c4e55
Merge pull request #396 from otteryc/main
Update libbpf commit hash
2024-06-27 21:09:00 -10:00
Yu-Cheng Chen
e59c48a6da Update libbpf commit hash
libbpf has added a member `link` in the struct `bpf_map_skeleton`, which causes
a build failure in SCX. This commit updates the libbpf commit hash to the latest
version to avoid this error.

Please refer to commit be998aa in libbpf repo.

Signed-off-by: Yu-Cheng Chen <otteryc210@gmail.com>
2024-06-28 08:42:14 +08:00
Changwoo Min
b18ebfea91
Merge pull request #395 from multics69/lavd-opt-v5
scx_lavd: optimization for communicaiton-intensive workloads and heavily-overloaded cases
2024-06-28 09:28:32 +09:00
Changwoo Min
24a238846e scx_lavd: optimizing deadline related tunables
The competition window was 7.5 msec, half of the targeted latency.
However, it is too wide for some workloads, so unrelated tasks may
compete with each other. Hence, it is tightened to about 1 msec with
LAVD_LAT_WEIGHT_SHIFT to avoid unnecessary competition.

Also, when a system is overloaded, now the time space is stretched more
aggressively (i.e., lat_prio^2) when a task's latency priority is low
(high value).

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-06-28 09:00:45 +09:00
Andrea Righi
02bf883737
Merge pull request #394 from sched-ext/scx-utils-debug-info-btf
scx_utils: clarify error about missing CONFIG_DEBUG_INFO_BTF
2024-06-27 20:50:42 +02:00
Andrea Righi
188b3d3bfc ci: enable stress tests for scx_bpfland
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-27 17:28:42 +02:00
Andrea Righi
7606b95150 scx_bpfland: introduce maximum time slice lag
Introduce a tunable to set a limit of the minimum vruntime that is used
when a task is dispatched, as:

 vtime_min = vtime_now - slice_lag_ns

Increasing the time slice lag can make interactive tasks even more
responsive at the cost of starving regular and newly created tasks.

Default time slice lag is 0.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-27 17:28:42 +02:00
Andrea Righi
5a44329d45 scheds: introduce scx_bpfland
Overview
========

This scheduler is derived from scx_rustland, but it is fully implemented
in BFP with minimal user-space Rust part to process command line
options, collect metrics and logs out scheduling statistics.

Unlike scx_rustland, all scheduling decisions are made by the BPF
component.

Motivation
==========

The primary goal of this scheduler is to act as a performance baseline
for comparison with scx_rustland, allowing for a better assessment of
the overhead caused by kernel/user-space interactions.

It can also be used to deploy prototypes initially tested in the
scx_rustland scheduler. In fact, this scheduler is expected to
outperform scx_rustland, due to the elimitation of the kernel/user-space
overhead.

Scheduling policy
=================

scx_bpfland is a vruntime-based sched_ext scheduler that prioritizes
interactive workloads. Its scheduling policy closely mirrors
scx_rustland, but it has been re-implemented in BPF with some small
adjustments.

Tasks are categorized as either interactive or regular based on their
average rate of voluntary context switches per second: tasks that exceed
a specific voluntary context switch threshold are classified as
interactive.

Interactive tasks are prioritized in a higher-priority DSQ, while
regular tasks are placed in a lower-priority DSQ. Within each queue,
tasks are sorted based on their weighted runtime, using the built-in scx
vtime ordering capabilities (scx_bpf_dispatch_vtime()).

Moreover, each task gets a time slice budget. When a task is dispatched,
it receives a time slice equivalent to the remaining unused portion of
its previously allocated time slice (with a minimum threshold applied).

This gives latency-sensitive workloads more chances to exceed their time
slice when needed to perform short bursts of CPU activity without being
interrupted (i.e., real-time audio encoding / decoding workloads).

Results
=======

According to the initial test results, using the same benchmark "playing
a videogame while recompiling the kernel", this scheduler seems to
provide a +5% improvement in the frames-per-second (fps) compared to
scx_rustland, with video games such as Cyberpunk 2077, Counter-Strike 2
and Baldur's Gate 3.

Initial test results indicate that this scheduler offers around a +5%
improvement in frames-per-second (fps) compared to scx_rustland when
using the benchmark "playing a video game while recompiling the kernel".

This improvement was observed in games such as Cyberpunk 2077,
Counter-Strike 2, and Baldur's Gate 3.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-27 17:28:42 +02:00
Changwoo Min
f86d564d89 scx_lavd: fast path for ops.dispatch() when fully loaded
When fully loaded so all CPUs are using, skip checking the cpumask.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-06-27 18:00:39 +09:00
Andrea Righi
a7965abdbc scx_utils: clarify error about missing CONFIG_DEBUG_INFO_BTF
If CONFIG_DEBUG_INFO_BTF is not enabled in the kernel, the C schedulers
report the following error via libbpf, clearly indicating the missing
kernel config:

 libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was CONFIG_DEBUG_INFO_BTF enabled?

In contrast, the Rust schedulers report a less clear error:

 thread 'main' panicked at /home/arighi/src/scx/rust/scx_utils/src/compat.rs:23:9:
 btf__load_vmlinux_btf() returned NULL
 note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Make sure to report a similar error, so that users have a better clue
about the missing kernel config. After this change the error looks like
the following:

 thread 'main' panicked at /home/arighi/src/scx/rust/scx_utils/src/compat.rs:23:9:
 btf__load_vmlinux_btf() returned NULL, was CONFIG_DEBUG_INFO_BTF enabled?

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-27 09:15:43 +02:00
David Vernet
8537c1b474
Merge pull request #393 from sched-ext/revert-382-rusty_refactor
Revert "scx_rusty: Refactor ridx assignment in populate_tasks_by_load"
2024-06-26 16:36:11 -05:00
David Vernet
fe3ce64a9b
Revert "scx_rusty: Refactor ridx assignment in populate_tasks_by_load" 2024-06-26 17:35:22 -04:00
Changwoo Min
41d60aef04
Merge pull request #391 from multics69/lavd-tuning-v4
scx_lavd: tweaks to avoid fork starvation
2024-06-27 00:21:48 +09:00
Andrea Righi
d26a76f238
Merge pull request #390 from sirlucjan/scx-update2
Revert "Add After=graphical.target into service"
2024-06-26 12:11:49 +02:00
Piotr Gorski
1659152a62
Revert "Add After=graphical.target into service"
This reverts commit f7e575808b.

Signed-off-by: Piotr Gorski <lucjan.lucjanov@gmail.com>
2024-06-26 12:08:21 +02:00
Changwoo Min
abc6e31fef scx_lavd: for a forked task, inherit its parent's statistics
The old approach was too conservative in running a new task, so when a
fork-heavy workload competes with a CPU-bound workload, the fork-heavy
one is starved. The new approach solves the starvation problem by
inheriting parent's statistics. It seems a good (at least better than
old) guess how a new task will behave.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-06-26 19:00:10 +09:00
Changwoo Min
ac9c49f5b5 scx_lavd: loosen the deadline when overloaded
When the system is highly loaded with compute-intensive tasks, the old
setting chokes latensive-intensive tasks, so loosen the dealine when the
system is overloaded (> 100% utilization).

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-06-26 15:06:31 +09:00
Changwoo Min
b32734168b scx_lavd: print build ID when lavd is loaded
When the lavd is loaded, it prints out its build id. It helps to easily
identify what version it is when testing.

```
01:56:54 [INFO] scx_lavd scheduler is initialized (build ID: 0.8.1-g98a5fa8595430414115c504857cea1a458393838-dirty x86_64-unknown-linux-gnu)
```

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-06-26 10:57:19 +09:00
Dan Schatzberg
d349f86d04 mitosis: Update synchronization
The synchronization for mitosis is a bit ad-hoc, working around lack of
atomics in BPF. This commit updates the logic to use READ/WRITE_ONCE and
compiler barriers to get the behaviors we want.

Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
2024-06-25 12:44:16 -07:00
David Vernet
98a5fa8595
Merge pull request #371 from sched-ext/build_id
Add build-id to build process
2024-06-25 11:45:32 -05:00
David Vernet
d42bae4fcf
rusty: Print build ID when rusty is loaded
When someone is testing schedulers, we often have to ask what version
the scheduler is running as. Now that we can access the build ID from
rust schedulers, let's update scx_rusty to print the build ID when rusty
first starts running.

This results in output such as the following:

```
[void@maniforge scx]$ rusty
19:04:26 [INFO] Running scx_rusty (build ID: 0.8.1-g2043d2537f37c8d75753bb65eb75bca965067564 x86_64-unknown-linux-gnu/debug)
19:04:26 [INFO] NUMA[00] mask= 0b11111111111111111111111111111111
19:04:26 [INFO]   DOM[00] mask= 0b00000000111111110000000011111111
19:04:26 [INFO]   DOM[01] mask= 0b11111111000000001111111100000000
19:04:26 [INFO] Rusty scheduler started!
```

Signed-off-by: David Vernet <void@manifault.com>
2024-06-25 11:44:46 -05:00
David Vernet
2aa8bbc32d
utils: Export build ID values from rust scx_utils
We want schedulers to be able to print, log, etc the build ID of the
repository. To do this, we can use the vergen Cargo crate to generate
environment variables that contain values that we can export from scx_utils.

This patch update scx_utils accordingly to use vergen to generate build
ID output that can be printed from schedulers. A subsequent patch will
update scx_rusty to print this build ID value.

Signed-off-by: David Vernet <void@manifault.com>
2024-06-25 11:44:19 -05:00
David Vernet
9d9ece11aa
Merge pull request #384 from jfernandez/log-recorder
scx_utils: Add log_recorder module for metrics-rs
2024-06-25 11:43:37 -05:00
David Vernet
e60c5c024b
Merge pull request #387 from sirlucjan/scx-update
Add After=graphical.target into service
2024-06-25 11:10:38 -05:00
Andrea Righi
39240a27ce
Merge pull request #380 from sched-ext/rustland-core-smooth-perf
scx_rustland_core: smooth performance
2024-06-25 14:52:58 +02:00
Andrea Righi
5db0908530 scx_rustland_core: make sure to use a valid CPU during direct dispatch
We may end up selecting an invalid CPU (according to the task's cpumask)
when dispatching the task via dispatch_direct_cpu().

When this happens simply return an error and do not dispatch the task
and let the caller handle the error: in the context of select_cpu() we
can simply ignore the dispatch and return the target CPU; in the context
of FIFO mode dispatch we can fallback to SCX_DSQ_LOCAL if the target CPU
is not valid.

This fixes issue #353.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-25 14:11:46 +02:00
Andrea Righi
e4b13b2aa6 scx_rustland_core: reduce dispatch overhead
Kick CPUs in the dispatch path only when needed (typically when tasks
are bounced to other CPUs).

Moreover, avoid to consume all the tasks dispatched at once.

This seems to reduce the BPF overhead (according to bpftop), going from
~10% CPU usage down to ~6% CPU usage of rustland_dispatch() on an over
commissioned system, without introducing any measureable performance
regression.

Tested-by: SoulHarsh007 <harsh.peshwani@outlook.com>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-25 14:02:39 +02:00
Andrea Righi
631d5576dc scx_rustland_core: refactor CPU selection logic
Allow to dispatch tasks directly (bypassing the user-space scheduler)
only when the scheduler is operating in FIFO mode.

On an over-commissioned system, directly dispatching tasks can only
increase OS noise. These tasks can get a brief priority boost and an
extended time slice just because they found an idle CPU, which can lead
to erratic behavior.

This is particularly problematic when measuring performance stability,
such as evaluating the frames-per-second (fps) of a video game on an
overloaded system.

In such cases, it's better to bounce all tasks to the user-space
scheduler, that will ensure a better level of fairness and smoother
performance.

Moreover, get rid of the second chance dispatch logic introduced in
commit 4791d862 ("scx_rustland_core: second chance CPU migration"). This
seems to provide benefits only on certain architectures (Intel), but it
can introduces lags in others (AMD).

Tested-by: SoulHarsh007 <harsh.peshwani@outlook.com>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-25 14:02:39 +02:00
Andrea Righi
19217b5722 scx_rustland_core: clarify comment about resuming FIFO mode
Tested-by: SoulHarsh007 <harsh.peshwani@outlook.com>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-25 14:02:39 +02:00
Andrea Righi
081d4bdb86 scx_rustland_core: add debugging to dispatch_direct_cpu()
Report dispatch_direct_cpu() events in the trace, like any other
dispatch-related event.

Tested-by: SoulHarsh007 <harsh.peshwani@outlook.com>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-25 14:02:39 +02:00
Piotr Gorski
f7e575808b
Add After=graphical.target into service
Signed-off-by: Piotr Gorski <lucjan.lucjanov@gmail.com>
2024-06-25 11:05:37 +02:00
Changwoo Min
11f685f7ec
Merge pull request #386 from multics69/lavd-tuning-v3
scx_lavd: revising tunables to reduce micro-stutters
2024-06-25 17:09:30 +09:00
Changwoo Min
5d0db5c5fe scx_lavd: revising tunables to reduce micro-stutters
This is a second attempt to optimize tunables for a wider range of
games.

1) LAVD_BOOST_RANGE increased from 14 (35%) to 40 (100% of nice range).
   Now the latency priority (biased by nice value) will decide which
   task should run first . The nice value will decide the time slice.

2) The first change will give higher priority to latency-critical task
   compared to before. For compensation, the slice boost also increased
   (2x -> 3x).

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-06-25 16:13:32 +09:00
Jose Fernandez
e5984ed016
scx_utils: Add log_recorder module for metrics-rs
This change adds a new module to the scx_utils crate that provides a
log recorder for metrics-rs. The log recorder will log all metrics to
the console at a configurable interval in an easy to read format. Each
metric type will be displayed in a separate section. Indentation will
be used to show the hierarchy of the metrics. This results in a more
verbose output, but it is easier to read and understand.

scx_rusty was updated to use the log recorder and all explicit metric
logging was removed.

Counters will show the total count and the rate of change per second.
Counters with an additional label, like `type` in
`dispatched_tasks_total` in rusty, will show the count, rate, and
percentage of the total count.

Counters:
  dispatched_tasks_total: 65559 [1344.8/s]
    prev_idle: 44963 (68.6%) [966.5/s]
    wsync_prev_idle: 15696 (23.9%) [317.3/s]
    direct_dispatch: 2833 (4.3%) [35.3/s]
    dsq: 1804 (2.8%) [21.3/s]
    wsync: 262 (0.4%) [4.3/s]
    direct_greedy: 1 (0.0%) [0.0/s]
    pinned: 0 (0.0%) [0.0/s]
    greedy_idle: 0 (0.0%) [0.0/s]
    greedy_xnuma: 0 (0.0%) [0.0/s]
    direct_greedy_far: 0 (0.0%) [0.0/s]
    greedy_local: 0 (0.0%) [0.0/s]
  dl_clamped_total: 1290 [20.3/s]
  dl_preset_total: 514 [1.0/s]
  kick_greedy_total: 6 [0.3/s]
  lb_data_errors_total: 0 [0.0/s]
  load_balance_total: 0 [0.0/s]
  repatriate_total: 0 [0.0/s]
  task_errors_total: 0 [0.0/s]

Gauges will show the last set value:

Gauges:
  slice_length_us: 20000.00

Histograms will show the average, min, and max. The histogram will be
reset after each log interval to avoid memory leaks, since the data
structure that holds the samples is unbounded.

Histograms:
  cpu_busy_pct: avg=1.66 min=1.16 max=2.16
  load_avg node=0: avg=0.31 min=0.23 max=0.39
  load_avg node=0 dom=0: avg=0.31 min=0.23 max=0.39
  processing_duration_us: avg=297.50 min=296.00 max=299.00

Signed-off-by: Jose Fernandez <josef@netflix.com>
2024-06-24 18:45:02 -06:00
Changwoo Min
71425af06d
Merge pull request #385 from multics69/link-blog2
README: add a link to Changwoo's blog post on scx (part 2)
2024-06-25 09:12:01 +09:00
Changwoo Min
2e35bca24c README: add a link to Changwoo's blog post on scx (part 2)
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-06-25 09:10:24 +09:00
David Vernet
8059acb634
Merge pull request #381 from vax-r/rusty_dom_load_status_check
scx_rusty: Pull domain status check
2024-06-24 17:54:54 -05:00
David Vernet
55ee210d42
Merge pull request #382 from vax-r/rusty_refactor
scx_rusty: Refactor ridx assignment in populate_tasks_by_load
2024-06-24 17:47:55 -05:00
Changwoo Min
45b2c3d9fe
Merge pull request #383 from multics69/lavd-param-tuning
scx_lavd: revising tunables for less-preemptive games
2024-06-24 09:02:20 +09:00