Commit Graph

276 Commits

Tejun Heo
d3e8e52b1a
Merge pull request #44 from arighi/scx-rustland
scx_rustland: rename from scx_rustlite
2023-12-22 08:40:01 +09:00
Andrea Righi
f7f0e3236c scx_rustland: rename from scx_rustlite
Rename scx_rustlite to scx_rustland to better convey that it mirrors
scx_userland (written in C), but is implemented in Rust.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2023-12-22 00:20:14 +01:00
David Vernet
4cadb92003
Merge pull request #38 from arighi/scx-rustlite
scx_rustlite: simple vtime-based scheduler written in Rust
2023-12-21 13:31:52 -06:00
Andrea Righi
086c6dffc8 scx_rustlite: simple user-space scheduler written in Rust
This scheduler is made of a BPF component (dispatcher) that implements
the low level sched-ext functionalities and a user-space counterpart
(scheduler), written in Rust, that implements the actual scheduling
policy.

The main goal of this scheduler is to be easy to read and well
documented, so that newcomers (e.g., students, researchers, junior devs)
can use it as a template to quickly experiment with scheduling theory.

For this reason the design of this scheduler is mostly focused on
simplicity and code readability.

Moreover, the BPF dispatcher is completely agnostic to the particular
scheduling policy implemented by the user-space scheduler. For this
reason, developers who want to use this scheduler to experiment with
scheduling policies should be able to simply modify the Rust component,
without having to deal with any internal kernel / BPF details.

Future improvements:

 - Transfer the responsibility of determining the CPU for executing a
   particular task to the user-space scheduler.

   Right now this logic is still fully implemented in the BPF part and
   the user-space scheduler can only decide the order of execution of
   the tasks, which significantly restricts the scheduling policies that
   can be implemented in user space.

 - Experiment with sending tasks from the user-space scheduler to the
   BPF dispatcher in batches, instead of draining the task queue
   completely and sending all the tasks at once every time.

   A batch size should help reduce both the overhead and the number of
   wakeups of the user-space scheduler.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2023-12-21 18:53:30 +01:00
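The batching idea from the future-improvements list above can be sketched in plain C. This is only an illustration of the queue-draining logic; names like `dispatch_batch`, `task_queue`, and `BATCH_SIZE` are made up and are not the scheduler's actual API:

```c
#include <stddef.h>

/* Hypothetical task queue kept in vruntime order; a plain array stands
 * in for the real user-space queue. */
#define QUEUE_CAP  64
#define BATCH_SIZE 8

static int task_queue[QUEUE_CAP];  /* pids, lowest vruntime first */
static size_t queue_len;

/* Hand at most `batch` tasks to the dispatcher per wakeup, instead of
 * draining the whole queue every time. */
static size_t dispatch_batch(int *out, size_t batch)
{
    size_t n = queue_len < batch ? queue_len : batch;
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = task_queue[i];

    /* Shift the tasks that remain queued to the front. */
    for (i = n; i < queue_len; i++)
        task_queue[i - n] = task_queue[i];
    queue_len -= n;

    return n;
}
```

With a bounded batch, a burst of runnable tasks is spread over several dispatcher invocations rather than handed over all at once.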
Tejun Heo
cfb41a77fc
Merge pull request #43 from sched-ext/ubuntu_2204_ci
ci: Run CI job on Ubuntu 22.04
2023-12-21 06:55:45 +09:00
David Vernet
1bf04d0972
ci: Run CI job on Ubuntu 22.04
Andrea pointed out that we can and should be using Ubuntu 22.04.
Unfortunately it still doesn't ship some of the deps we need, like
clang-17, but it does at least ship virtme-ng, so using it will let us
actually test running the schedulers in a virtme-ng VM once virtme-ng
supports being run in docker.

Also, update the job to run on pushes, and not just when a PR is opened.

Suggested-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: David Vernet <void@manifault.com>
2023-12-20 15:29:49 -06:00
Tejun Heo
79b0c3ea89
Merge pull request #41 from multics69/link-blogs
Update README for additional resources (blog posts and articles)
2023-12-19 16:57:32 +09:00
Changwoo Min
23cecf2532 Update README for additional resources (blog posts and articles) 2023-12-19 12:28:44 +09:00
David Vernet
eb7b3c99f0
Merge pull request #40 from sched-ext/ci
scx: Add CI action that builds schedulers for PRs
2023-12-18 21:17:47 -06:00
David Vernet
4523b10e45
scx: Add CI action that builds schedulers for PRs
When Ubuntu ships with sched_ext, we can also maybe test loading the
schedulers (not sure if the runners can run as root though). For now, we
should at least have a CI job that lets us verify that the schedulers
can _build_. To that end, this patch adds a basic CI action that builds
the schedulers.

As is, this is a bit brittle in that we're having to manually download
and install a few dependencies. I don't see a better way for now without
hosting our own runners with our own containers, but that's a bigger
investment. For now, hopefully this will get us _some_ coverage.

Signed-off-by: David Vernet <void@manifault.com>
2023-12-18 21:12:50 -06:00
Tejun Heo
3049d60883
Merge pull request #39 from sched-ext/nest_fixes
Fix some things in Nest
2023-12-18 13:15:41 -10:00
David Vernet
318c06fa9c
nest: Skip out of idle cpu selection on exec() path
The core sched code calls select_task_rq() in a few places: the task
wakeup path (typical path), the fork() path, and the exec() path. For
nest scheduling, we don't want to select a core from the nest on the
exec() path. If we were previously able to find an idle core, we would
have found it on the fork() path, so we don't gain much by checking on
the exec() path. In fact, it's actually harmful, because we could
incorrectly blow up the primary nest by bumping the same task between
multiple cores for no reason. Let's just opt out of select_task_rq()
calls on the exec() path.

Suggested-by: Julia Lawall <julia.lawall@inria.fr>
Signed-off-by: David Vernet <void@manifault.com>
2023-12-18 13:51:15 -06:00
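The decision described above boils down to an early return keyed on which path invoked core selection. A minimal sketch in plain C, where `WAKE_EXEC` and `find_idle_nest_cpu()` are hypothetical stand-ins for the real kernel flag and the nest's idle-core search:

```c
/* Hypothetical wake-path flag; the kernel distinguishes wakeup, fork()
 * and exec() calls into select_task_rq() with similar flags. */
#define WAKE_EXEC 0x4

/* Stand-in for the nest's normal idle-core search. */
static int find_idle_nest_cpu(int prev_cpu)
{
    return prev_cpu + 1;  /* pretend the next core is idle */
}

/* Skip nest core selection entirely on the exec() path: any idle core
 * would already have been found at fork(), and re-picking here can
 * needlessly bounce the task across primary-nest cores. */
static int nest_select_cpu(int prev_cpu, int wake_flags)
{
    if (wake_flags & WAKE_EXEC)
        return prev_cpu;
    return find_idle_nest_cpu(prev_cpu);
}
```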
David Vernet
ab0e36f9ce
scx_nest: Apply r_impatient if no task is found in primary nest
Julia pointed out that our current implementation of r_impatient is
incorrect. r_impatient is meant to be a mechanism for more aggressively
growing the primary nest if a task repeatedly isn't able to find a core.
Right now, we trigger r_impatient if we're not able to find an attached
or previous core in the primary nest, but we _should_ be triggering it
only if we're unable to find _any_ core in the primary nest. Fixing the
implementation to do this drastically decreases how aggressively we grow
the primary nest when r_impatient is in effect.

Reported-by: Julia Lawall <julia.lawall@inria.fr>
Signed-off-by: David Vernet <void@manifault.com>
2023-12-18 11:05:36 -06:00
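The corrected trigger condition can be sketched with CPU bitmasks in plain C. This is a simplified model under assumed names (`should_grow_nest`, `impatient_misses`, a fixed `R_IMPATIENT` of 2), not scx_nest's actual code:

```c
#include <stdbool.h>
#include <stdint.h>

/* r_impatient: grow the primary nest only after a task repeatedly fails
 * to find *any* idle core in it, not merely its attached or previous
 * core. */
struct task_ctx {
    int impatient_misses;
};

#define R_IMPATIENT 2

static bool should_grow_nest(struct task_ctx *tctx,
                             uint64_t primary_mask, uint64_t idle_mask)
{
    /* Is any core in the primary nest idle at all? */
    if (primary_mask & idle_mask) {
        tctx->impatient_misses = 0;  /* found room, reset the counter */
        return false;
    }
    return ++tctx->impatient_misses >= R_IMPATIENT;
}
```

Checking the whole primary mask, rather than just the attached/previous core, is what makes nest growth far less aggressive.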
Tejun Heo
239d5d1d2c
Merge pull request #37 from jordalgo/folder-restructure
Restructure scheds folder names
2023-12-17 15:15:54 -10:00
Jordan Rome
e9a9d32ab6 Restructure scheds folder names
- combine c and kernel-examples as it's confusing to have both
- rename 'rust-user' and 'c-user' to just 'rust' and 'c', which is simpler
- update and fix sync-to-kernel.sh
2023-12-17 13:14:31 -08:00
Tejun Heo
52381a9764
Merge pull request #36 from danielocfb/topic/libbpf-rs-update
rust: Update libbpf-rs & libbpf-cargo to 0.22
2023-12-14 12:41:20 -10:00
Daniel Müller
fed1dae9da rust: Update libbpf-rs & libbpf-cargo to 0.22
This is a follow-on to #32, which got reverted. I wrongly assumed that
scx_rusty resides in the sched_ext tree and consumes a published version
of scx_utils.
With this change we update the other in-tree dependencies as well. I
built scx_layered & scx_rusty. I bumped scx-utils to 0.4, because
libbpf-cargo seems to be part of the public API surface, and libbpf-cargo
0.21 and 0.22 are not compatible with each other.

Signed-off-by: Daniel Müller <deso@posteo.net>
2023-12-14 14:33:58 -08:00
Tejun Heo
185ad61ad8
Merge pull request #34 from jordalgo/readme-meson-build
Update meson install in readme
2023-12-14 10:34:43 -10:00
Tejun Heo
300bd192b5
Merge pull request #35 from sched-ext/htejun
Revert "scx_utils: Update libbpf-cargo to 0.22"
2023-12-14 10:33:56 -10:00
Tejun Heo
995520e415 Revert "scx_utils: Update libbpf-cargo to 0.22"
This reverts commit 6d8ba3e3d7.

Piotr Gorski reports build breakage after the commit:

  https://paste.cachyos.org/p/aa14709.txt
2023-12-14 10:32:25 -10:00
Jordan Rome
8b02b41f51 Update meson install in readme 2023-12-14 12:26:51 -08:00
Tejun Heo
4f5ca907ac
Merge pull request #32 from danielocfb/topic/libbpf-cargo-0.22
scx_utils: Update libbpf-cargo to 0.22
2023-12-14 09:28:47 -10:00
Daniel Müller
6d8ba3e3d7 scx_utils: Update libbpf-cargo to 0.22
It's the latest and greatest. Given the repository setup, this is likely
a breaking change from a semver perspective.
2023-12-14 11:18:27 -08:00
David Vernet
b8f70fa09a
Merge pull request #31 from jordalgo/minor-rusty-refactor
minor refactor of scx_rusty
2023-12-14 09:35:18 -06:00
Jordan Rome
ba35b97bb7 minor refactor of scx_rusty 2023-12-14 07:33:53 -08:00
David Vernet
f7c152a6b1
Merge pull request #30 from arighi/scx-userland-improve-robustness
make scx_userland more robust
2023-12-13 15:42:17 -06:00
Andrea Righi
dc81311d79 scx_userland: align MAX_ENQUEUED_TASKS to dispatch batch
With commit 48bba8e ("scx_userland: survive to dispatch failures")
scx_userland can better tolerate dispatch failures, so we can reduce
MAX_ENQUEUED_TASKS a bit and align it with the batch size used in
bpf_repeat() when tasks are actually dispatched in the BPF counterpart.

This allows reducing the memory footprint of the scheduler and makes it
more consistent between enqueue and dispatch events.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2023-12-13 22:25:04 +01:00
Andrea Righi
48bba8e4f6 scx_userland: survive to dispatch failures
If the scheduler fails to dispatch a task we immediately give up,
exiting with an error like the following:

 Failed to dispatch task 251 in 1
 EXIT: BPF scheduler unregistered

This scenario can be simulated by dramatically decreasing the value of
MAX_ENQUEUED_TASKS.

We can make the scheduler a little more robust simply by re-adding the
task that cannot be dispatched to vruntime_head and by stopping the
dispatch of additional tasks in the same batch.

This can give enough room, under such a "dispatch overload" condition,
to catch up and resume normal execution without crashing.

Moreover, introduce nr_vruntime_failed to report failed dispatch events
in the scheduler's statistics.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2023-12-13 22:19:36 +01:00
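The recovery strategy above (keep the failed task at the head, stop the batch, count the failure) can be sketched in plain C. The queue layout and `try_dispatch()` are stand-ins for the real user-space/BPF boundary; here the dispatch simply fails once a fixed budget runs out:

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_ENQUEUED 64

static int vruntime_queue[MAX_ENQUEUED];  /* head (lowest vruntime) at 0 */
static size_t nr_queued;
static unsigned long nr_vruntime_failed;  /* exported in the stats */

/* Stand-in for the BPF-side dispatch, which can fail when the dispatch
 * buffer is full; this mock fails once a fixed budget runs out. */
static int dispatch_budget = 2;

static bool try_dispatch(int pid)
{
    (void)pid;
    if (dispatch_budget == 0)
        return false;
    dispatch_budget--;
    return true;
}

/* Drain the queue, but survive a dispatch failure: keep the task at the
 * head and stop the current batch instead of exiting with an error. */
static void dispatch_all(void)
{
    while (nr_queued > 0) {
        if (!try_dispatch(vruntime_queue[0])) {
            nr_vruntime_failed++;
            return;  /* the task will be retried on the next round */
        }
        for (size_t i = 1; i < nr_queued; i++)
            vruntime_queue[i - 1] = vruntime_queue[i];
        nr_queued--;
    }
}
```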
David Vernet
a68885f92f
Merge pull request #29 from arighi/scx-userland-pid-max
scx_userland: allocate tasks array based on kernel.pid_max
2023-12-13 10:57:03 -06:00
Andrea Righi
1e9e6778bc scx_userland: allocate tasks array based on kernel.pid_max
Currently the array of enqueued tasks is statically allocated to a fixed
size of USERLAND_MAX_TASKS to avoid potential deadlocks that could be
introduced by performing dynamic allocations in the enqueue path.

However, this also adds a limit on the maximum pid that the scheduler
can handle, since the pid is used as the index to access the array.

In fact, it is quite easy to trigger the following failure on an average
desktop system (making this scheduler pretty much unusable in such a
scenario):

 $ sudo scx_userland
 ...
 Failed to enqueue task 33258: No such file or directory
 EXIT: BPF scheduler unregistered

Prevent this by using sysctl's kernel.pid_max as the size of the tasks
array (and still allocate it all at once during initialization).

The downside of this change is that scx_userland may require additional
memory to start, and on small systems it could even trigger OOMs. For
this reason, add an explicit message to the command help suggesting to
reduce kernel.pid_max in case of OOM conditions.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2023-12-13 17:33:10 +01:00
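The allocation scheme described above can be sketched in plain C: read the sysctl once, then size a pid-indexed array in a single up-front allocation. The helper names and the fallback value are assumptions of this sketch, not the patch's actual code:

```c
#include <stdio.h>
#include <stdlib.h>

/* Read kernel.pid_max from procfs, falling back to the 64-bit kernel
 * default ceiling when /proc is unavailable (e.g., a non-Linux build
 * host). */
static long read_pid_max(void)
{
    long pid_max = 4194304;  /* PID_MAX_LIMIT on 64-bit kernels */
    FILE *f = fopen("/proc/sys/kernel/pid_max", "r");

    if (f) {
        if (fscanf(f, "%ld", &pid_max) != 1)
            pid_max = 4194304;
        fclose(f);
    }
    return pid_max;
}

struct enqueued_task { unsigned long vruntime; };

/* Allocate the whole pid-indexed array once at init, so the enqueue
 * path never allocates and any valid pid is a valid index. */
static struct enqueued_task *alloc_task_array(long pid_max)
{
    return calloc((size_t)pid_max, sizeof(struct enqueued_task));
}
```

The one-shot `calloc()` is what makes the memory footprint scale with kernel.pid_max, hence the suggestion to lower the sysctl on memory-constrained systems.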
Tejun Heo
ca05e28dde
Merge pull request #28 from sched-ext/htejun
Bump versions and add LICENSE symlinks for scx_layered and scx_rusty
2023-12-12 11:22:13 -10:00
Tejun Heo
8a07bcc31b Bump versions and add LICENSE symlinks for scx_layered and scx_rusty 2023-12-12 11:21:08 -10:00
Tejun Heo
35fdfdffb0
Merge pull request #27 from davide125/license
rust: clarify license and include text
2023-12-12 11:14:50 -10:00
Davide Cavalca
21e468a491 rust: clarify license and include text 2023-12-12 13:02:13 -08:00
Tejun Heo
fbb0164454
Merge pull request #26 from kkdwivedi/central-fix-nr-slots
scx_central: Break dispatch_to_cpu loop when running out of buffer slots
2023-12-11 21:57:30 -10:00
Kumar Kartikeya Dwivedi
c4c994c9ce
scx_central: Break dispatch_to_cpu loop when running out of buffer slots
When many tasks popped from the central queue cannot be dispatched to
the local DSQ of the target CPU, we keep bouncing them to the fallback
DSQ and continue the dispatch_to_cpu loop until we find one that can be
dispatched to the local DSQ of the target CPU.

In a contrived case, it might be that all tasks pin themselves to
CPUs != the target CPU and, due to their affinity, cannot be dispatched
to that CPU's local DSQ. If all of them fill up the central queue, we
will keep looping in the dispatch_to_cpu loop and eventually run out of
slots in the dispatch buffer. The nr_mismatched counter will quickly
rise and sched-ext will notice the error and unload the BPF scheduler.

To remedy this, ensure that we break the dispatch_to_cpu loop when we
can no longer perform a dispatch operation. The outer loop in
central_dispatch for the central CPU should ensure the loop breaks when
we run out of these slots and schedule a self-IPI to the central core,
and allow sched-ext to consume the dispatch buffer before restarting the
dispatch loop again.

A basic way to reproduce this scenario is to run:

 taskset -c 0 perf bench sched messaging

The error in the kernel will be:

 sched_ext: BPF scheduler "central" errored, disabling
 sched_ext: runtime error (dispatch buffer overflow)
 bpf_prog_6a473147db3cec67_dispatch_to_cpu+0xc2/0x19a
 bpf_prog_c9e51ba75372a829_central_dispatch+0x103/0x1a5

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
2023-12-12 07:50:46 +00:00
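The slot-exhaustion handling can be modeled in plain C. Everything here is a stand-in: `DISPATCH_SLOTS` plays the kernel's dispatch-buffer size, and the affinity check is a contrived pinning rule so that tasks keep bouncing to the fallback DSQ:

```c
#include <stdbool.h>
#include <stddef.h>

/* Per-invocation dispatch-buffer budget; an illustrative constant. */
#define DISPATCH_SLOTS 8
#define FALLBACK_DSQ   (-1)

static size_t nr_dispatched;

static void dispatch(int pid, int dsq)
{
    (void)pid; (void)dsq;
    nr_dispatched++;  /* each dispatch consumes one buffer slot */
}

/* Contrived affinity: pretend even pids may only run on even CPUs, so
 * pinned tasks keep bouncing to the fallback DSQ. */
static bool can_run_on(int pid, int cpu)
{
    return (pid % 2) == (cpu % 2);
}

/* Pop tasks from the central queue, but break out when the buffer
 * slots run out; the caller then lets sched-ext consume the buffer
 * (e.g., via a self-IPI to the central core) before looping again. */
static bool central_dispatch(const int *queue, size_t n, int target_cpu)
{
    size_t slots = DISPATCH_SLOTS;

    for (size_t i = 0; i < n; i++) {
        if (slots == 0)
            return true;  /* work left over: reschedule ourselves */
        slots--;
        dispatch(queue[i],
                 can_run_on(queue[i], target_cpu) ? target_cpu
                                                  : FALLBACK_DSQ);
    }
    return false;
}
```

Breaking out instead of dispatching past the budget is what turns the fatal buffer overflow into a benign retry.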
Tejun Heo
737aa810b1
Merge pull request #25 from jordalgo/readme-ubuntu
Update README for ubuntu instructions
2023-12-11 16:05:25 -10:00
Jordan Rome
df48390d73 Update README for ubuntu instructions 2023-12-11 17:42:08 -08:00
Tejun Heo
872f1d1f1e
Merge pull request #24 from sched-ext/htejun
README: Add ubuntu instructions
2023-12-11 11:04:59 -10:00
Tejun Heo
7ebb102d5a README: David's review 2023-12-11 11:04:40 -10:00
Tejun Heo
feb9018cbe README: Add ubuntu instructions 2023-12-11 10:24:10 -10:00
Tejun Heo
8ea5850967
Merge pull request #23 from arighi/fix-s390-build
build: add Debian arch name mapping for s390
2023-12-10 03:08:56 -10:00
Andrea Righi
1742daa4b5 build: add Debian arch name mapping for s390
Add the proper mapping from Debian architecture "s390x" to the kernel
architecture name "s390".

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2023-12-10 12:42:15 +01:00
Tejun Heo
c9d0cc640a
Merge pull request #22 from arighi/enable-rust-build-option
build: introduce enable_rust build option
2023-12-09 15:59:19 -10:00
Tejun Heo
abbb6a0276
Merge pull request #20 from arighi/scx-rusty-fix
scx_rusty: fix "subtract with overflow" error
2023-12-09 15:58:11 -10:00
David Vernet
ab1d894fd2
Merge pull request #21 from arighi/misc-fixes 2023-12-09 09:33:09 -06:00
Andrea Righi
6343bcf360 build: introduce enable_rust build option
Introduce an option to enable/disable the build of all the Rust
sub-projects.

This can be useful to build scx on those systems where Rust is not
fully supported (e.g., armhf).

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2023-12-09 15:05:23 +01:00
Andrea Righi
0637b6a0b5 scx_nest: use proper format string for u64 types
This prevents some warnings when building scx_nest on 32-bit
architectures.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2023-12-09 14:49:50 +01:00
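The portable fix in these format-string patches is the PRIu64 macro from <inttypes.h>: "%lu" is only 32 bits wide on 32-bit targets and triggers -Wformat warnings there. A minimal sketch (the counter name is made up):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format a u64 counter portably: PRIu64 expands to the correct
 * conversion specifier ("llu" or "lu") for the target platform,
 * unlike a hard-coded "%lu". */
static int format_counter(char *buf, size_t len, uint64_t val)
{
    return snprintf(buf, len, "nr_dispatches=%" PRIu64, val);
}
```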
Andrea Righi
adc01140aa scx_qmap: use proper format string for u64 types
This prevents some warnings when building scx_qmap on 32-bit
architectures.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2023-12-09 14:49:44 +01:00
Andrea Righi
4df979ccb7 scx_pair: use proper format string for u64 types
This prevents some warnings when building scx_pair on 32-bit
architectures.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2023-12-09 14:49:38 +01:00