Commit Graph

61 Commits

Author SHA1 Message Date
Andrea Righi
2cd3929475 scx_rustland: mitigate sub-optimal performance with offline CPUs
Most of the schedulers assume that the amount of possible CPUs in the
system represents the actual number of CPUs available.

This is not always true: some CPUs may be offline or certain CPU models
(AMD CPUs for example) may include unavailable CPUs in this number.

This can lead to sub-optimal performance or even errors in the scheduler
(see for example [1][2]).

Ideally, we need to attack this issue in a more generic way, such as
having a proper API provided by a C library, that can be used by all
schedulers and the topology Rust module (scx_utils crate).

But for now, let's try to mitigate most of the common sub-optimal cases
separately inside each scheduler.

For rustland we can apply some mitigations both in select_cpu() (for the
BPF part) and in the user-space part:

 - the former is fixed in the sched-ext kernel by commit 94dc0c01b957
   ("scx: Use cpu_online_mask when resetting idle masks"). However,
   adding an extra check `cpu < num_possible_cpus` in select_cpu(),
   allows to properly support AMD CPUs, even with kernels that don't
   have the cpu_online_mask fix yet (this doesn't always guarantee the
   validity of cpu, but it should be enough to mitigate the majority of
   the potential sub-optimal cases, without introducing any significant
   overhead)

 - the latter can be fixed relying on topology.span(), instead of
   topology.nr_cpus(), to count the amount of available CPUs in the
   system.

[1] https://github.com/sched-ext/sched_ext/issues/69
[2] https://github.com/sched-ext/scx/issues/147

Link: 94dc0c01b9
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-03-14 10:19:31 +01:00
David Vernet
91cb5ce8ab
Merge pull request #178 from sched-ext/multi_numa_rusty
rusty: Implement NUMA-aware load balancing
2024-03-12 15:50:27 -05:00
Jordan Rome
ffc7b7dc4a Fetch and build bpftool by default
This pairs with the new default behavior to fetch and build libbpf
and is mostly being used so we can use the latest bpftool and libbpf.
2024-03-11 10:00:01 -07:00
David Vernet
1c3168d2a4
topology: Don't assume unique core IDs
The current topology.rs crate assumes that all cores have unique core
IDs in a system. This need not be the case, such as in certain Intel
Xeon processors which reuse core IDs in different NUMA nodes. Let's
update the crate to assume unique core IDs only per socket.

Signed-off-by: David Vernet <void@manifault.com>
2024-03-08 15:13:46 -06:00
David Vernet
12e0586fe9
cpumask: Update cpumask fmt function
The cpumask print formatter doesn't look great in its current form, which uses
the BitVec formatter under the hood:

[INFO] NUMA[00 32:<[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]>]
[INFO]         DOM[00] 32:<[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]>
[INFO]         DOM[01] 32:<[0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]>

Let's just iterate over the mask and manually format the string using the
binary formatter over the slice of u64's, which renders like this:

[INFO] NUMA[00] 0b11111111111111111111111111111111]
[INFO]         DOM[00] 0b00000000111111110000000011111111
[INFO]         DOM[01] 0b11111111000000001111111100000000

Signed-off-by: David Vernet <void@manifault.com>
2024-03-08 15:11:17 -06:00
Andrea Righi
5cf113f058 scx_rustland_core: provide DispatchedTask API methods
Provide distinct methods to set the target CPU and the per-task time
slice to dispatched tasks.

Moreover, also provide a constructor to create a DispatchedTask from a
QueuedTask (this allows to automatically bounce a task from the
scheduler to the BPF dispatcher without having to take care of setting
the individual task's attributes).

This also allows to make most of the attributes of DispatchedTask
private, especially it allows to hide cpumask_cnt, that should be only
used internally between the BPF and the user-space component.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-03-03 15:49:37 +01:00
Andrea Righi
e10f8a2d8e scx_rustland_core: introduce per-task time slice
Provide a way to set a different time slice per-task, by adding a new
attribute slice_ns to the DispatchedTask struct.

This attribute determines the time slice assigned to the task, if it is
set to 0 then the global time slice (either the default one or the
effective one, if set) will be used.

At the same time, remove the payload attribute, that is basically unused
(scx_rustland uses it to send the task's vruntime to the BPF dispatcher
for debugging purposes, but it's not very useful anymore at this point).

In the future we may introduce a proper interface to attach a custom
payload to each task with a proper interface.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-03-03 15:06:56 +01:00
Jordan Rome
499924ead8 Add libbpf as a submodule
This is to potentinally reduce issues with folks
using different versions of libbpf at runtime.

This also:
- makes static linking of libbpf the default
- adds steps in `meson setup` to fetch libbpf and make it
2024-03-01 12:39:35 -08:00
Andrea Righi
0d1c6555a4 scx_rustland_core: generate source files in-tree
There is no need to generate source code in a temporary directory with
RustLandBuilder(), we can simply generate code in-tree and exclude the
generated source files from .gitignore.

Having the generated source files in-tree can help to debug potential
build issues (and it also allows to drop the the tempfile crate
dependency).

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-02-28 17:49:44 +01:00
Andrea Righi
06d8170f9f scx_utils: introduce Builder()
Introduce a Builder() class in scx_utils that can be used by other scx
crates (such as scx_rustland_core) to prevent code duplication.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-02-28 17:49:44 +01:00
Andrea Righi
2ac1a5924f scx_rustland_core: introduce RustLandBuilder()
Introduce a wrapper to scx_utils::BpfBuilder that can be used to build
the BPF component provided by scx_rustland_core.

The source of the BPF components (main.bpf.c) is included in the crate
as an array of bytes, the content is then unpacked in a temporary file
to perform the build.

The RustLandBuilder() helper is also used to generate bpf.rs (that
implements the low-level user-space Rust connector to the BPF
commponent).

Schedulers based on scx_rustland_core can simply use RustLandBuilder(),
to build the backend provided by scx_rustland_core.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-02-28 17:49:44 +01:00
Andrea Righi
e23426e299 scx_rustland_core: introduce method bpf.update_tasks()
Introduce a helper function to update the counter of queued and
scheduled tasks (used to notify the BPF component if the user-space
scheduler has still some pending work to do).

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-02-28 17:49:44 +01:00
Andrea Righi
00e25530bc scx_rlfifo: simple user-space FIFO scheduler written in Rust
Implement a FIFO scheduler as an example usage of scx_rustland_core.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-02-28 17:49:44 +01:00
Andrea Righi
871a6c10f9 scx_rustland_core: include scx_rustland backend
Move the BPF component of scx_rustland to scx_rustland_core and make it
available to other user-space schedulers.

NOTE: main.bpf.c and bpf.rs are not pre-compiled in the
scx_rustland_core crate, they need to be included in the user-space
scheduler's source code in order to be compiled/linked properly.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-02-28 17:49:44 +01:00
Andrea Righi
416d6a940f rust: introduce scx_rustland_core crate
Introduce a separate crate (scx_rustland_core) that can be used to
implement sched-ext schedulers in Rust that run in user-space.

This commit only provides the basic layout for the new crate and the
abstraction to the custom allocator.

In general, any scheduler that has a user-space component needs to use
the custom allocator to prevent potential deadlock conditions, caused by
page faults (a kthread needs to run to resolve the page fault, but the
scheduler is blocked waiting for the user-space page fault to be
resolved => deadlock).

However, we don't want to necessarily enforce this constraint to all the
existing Rust schedulers, some of them may do all user-space allocations
in safe paths, hence the separate scx_rustland_core crate.

Merging this code in scx_utils would force all the Rust schedulers to
use the custom allocator.

In a future commit the scx_rustland backend will be moved to
scx_rustland_core, making it a totally generic BPF scheduler framework
that can be used to implement user-space schedulers in Rust.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-02-28 17:49:44 +01:00
David Vernet
b1dc37889f
infeasible: Add a new infeasible crate for load balancing
We want to avoid every scheduler implementation from having to implement
the solution to the infeasible weights problem, but we also want to
enable sufficient flexibility where not every program has to have the
same partition of scheduling domains, etc. To enable this, a new
infeasible crate is added which encapsulates all of the logic for being
given duty cycle and weight, and performing the necessary math to adjust
for infeasibility.

Signed-off-by: David Vernet <void@manifault.com>
2024-02-26 10:51:52 -06:00
David Vernet
f85cde7038
topology: Add maps to cores and cpus from root Topology object
For convenience, let's provide callers with a way to easily look up
cores and CPUs from the root topology object.

Signed-off-by: David Vernet <void@manifault.com>
2024-02-23 13:09:02 -06:00
David Vernet
43624a87ce
rusty: Use new topology crate
Now that we have this new Topology crate, let's update Rusty to use it
instead of using the old one.

Signed-off-by: David Vernet <void@manifault.com>
2024-02-23 10:39:55 -06:00
David Vernet
c5a3b83bbd
topology: Add new topology crate
The topology.rs crate is insufficiently generic, and reflects
implementation details of scx_rusty more than it provides generic use
cases for modeling a host's topology. This adds a new topology2.rs crate
that will replace topology.rs. We have this as an intermediate commit so
that we don't bundle updating scx_rusty with adding this crate.

Signed-off-by: David Vernet <void@manifault.com>
2024-02-23 10:39:00 -06:00
David Vernet
a2f531e429
topology: Refactor topology.rs to use cpumask crate
Now that we have cpumask.rs, we can remove some logic from topology.rs
and have it create and use Cpumasks.

Signed-off-by: David Vernet <void@manifault.com>
2024-02-22 17:25:44 -06:00
David Vernet
bd15eb8e41
cpumask: Add new Cpumask crate
Let's add a Cpumask trait that schedulers can use to avoid all having to
deal directly with BitVec and the like.

Signed-off-by: David Vernet <void@manifault.com>
2024-02-22 17:24:26 -06:00
David Vernet
49065de8df
scx: Demote panic! to warn! in topology crate
We currently panic! if we're building a Topology that detects more than
two siblings on a physical core. This can and will likely happen on
multi-socket machines. Given that we're planning to add support for
detecting NUMA nodes soon, let's just demote the panic! to a warn!.

Signed-off-by: David Vernet <void@manifault.com>
2024-02-20 21:01:17 -06:00
Jordan Rome
7c32acece0 Add libbpf logging to the rust schedulers
This is to get better logs when failing to load, attach, etc.
2024-02-20 15:17:10 -08:00
David Vernet
ef8aa9ea31
add documentation
Signed-off-by: David Vernet <void@manifault.com>
2024-02-20 14:57:09 -06:00
David Vernet
8aba090d4f
rust: Add topology module to utils crate
scx_rusty has logic in the scheduler to inspect the host to
automatically build scheduling domains across every L3 cache. This would
be generically useful for many different types of schedulers, so let's
add it to the scx_utils crate so it can be used by others.

Signed-off-by: David Vernet <void@manifault.com>
2024-02-20 14:57:09 -06:00
Andrea Righi
f5a21198ad scx_utils: use c_char to prevent build failures
Use c_char to convert C strings, that is more portable across different
architectures.

This prevents a build failure on arm64 and ppc64el.

Fixes: d57a23f ("rust/scx_utils: Add user_exit_info support")
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-02-11 21:42:52 +01:00
Tejun Heo
d57a23f481 rust/scx_utils: Add user_exit_info support
So that rust scheds can use the same unified exit handling as C scheds.

Signed-off-by: Tejun Heo <tj@kernel.org>
2024-01-31 10:54:15 -10:00
Tejun Heo
988b7d13c1 Bump versions
scx_exit_info change doesn't require code to be updated but breaks binary
compatbility. Bump versions and cut a new release.
2024-01-25 09:01:23 -10:00
Tejun Heo
942b0269b8 Bump versions
After updates to reflect the updated init and direct dispatch API, the
schedulers aren't compatible with older kernels. Bump versions and publish
releases.
2024-01-08 18:49:54 -10:00
Jordan Rome
6caf6c5c99 Add new archs for bpf_builder
This is to fix fedora build failures for these archs:
s390x and ppc64le

Error:
```
---- bpf_builder::tests::test_bpf_builder_new stdout ----
thread 'bpf_builder::tests::test_bpf_builder_new' panicked at src/bpf_builder.rs:592:9:
Failed to create BpfBuilder (Err(CPU arch "s390x" not found in ARCH_MAP))
```

https://koji.fedoraproject.org/koji/taskinfo?taskID=111114326
2024-01-03 10:50:33 -08:00
Tejun Heo
98773131df Bump versions to publish scx_utils fedora compat change 2023-12-29 06:58:45 +09:00
Tejun Heo
c47a4b6716 scx_utils: Explain what's going on with bindgen version and suppress deprecation warning
This is a followup to https://github.com/sched-ext/scx/pull/50. See the
comment in BpfBuilder::bindgen_bpf_intf() for details.
2023-12-29 06:56:07 +09:00
Jordan Rome
c8a721b033 Downgrade bindgen to 0.68
This is so we can package scx_utils into fedora without having
to upgrade rust-bindgen
(https://bodhi.fedoraproject.org/updates/FEDORA-2023-18e7f124e1).

To make this happen we need to stop using the `CargoCallbacks::new`
constructor which was added in 0.69. Old way seems legit according
to the docs:
https://rust-lang.github.io/rust-bindgen/non-system-libraries.html
2023-12-27 12:19:28 -08:00
Jordan Rome
e9a9d32ab6 Restructure scheds folder names
- combine c and kernel-examples as it's confusing to have both
- rename 'rust-user' and 'c-user' to just 'rust' and 'c', which is simpler
- update and fix sync-to-kernel.sh
2023-12-17 13:14:31 -08:00
Daniel Müller
fed1dae9da rust: Update libbpf-rs & libbpf-cargo to 0.22
This is a follow on to #32, which got reverted. I wrongly assumed that
scx_rusty resides in the sched_ext tree and consumes published version
of scx_utils.
With this change we update the other in-tree dependencies. I built
scx_layered & scx_rusty. I bumped scx-utils to 0.4, because the
libbpf-cargo seems to be part of the public API surface and libbpf-cargo
0.21 and 0.22 are not compatible with each other.

Signed-off-by: Daniel Müller <deso@posteo.net>
2023-12-14 14:33:58 -08:00
Tejun Heo
995520e415 Revert "scx_utils: Update libbpf-cargo to 0.22"
This reverts commit 6d8ba3e3d7.

Piotr Gorski reports build breakage after the commit:

  https://paste.cachyos.org/p/aa14709.txt
2023-12-14 10:32:25 -10:00
Daniel Müller
6d8ba3e3d7 scx_utils: Update libbpf-cargo to 0.22
It's the latest and greatest. Given the repository setup, this is likely
a breaking change from a semver perspective.
2023-12-14 11:18:27 -08:00
Tejun Heo
8a07bcc31b Bump versions and add LICENSE symlinks for scx_layered and scx_rusty 2023-12-12 11:21:08 -10:00
Davide Cavalca
21e468a491 rust: clarify license and include text 2023-12-12 13:02:13 -08:00
Tejun Heo
3acdef8f2b scx_utils: Bump version to 0.3.2
- Build fix for ubuntu clang.
2023-12-07 13:08:21 -10:00
Andrea Righi
7ceccf6516 scx_utils::BpfBuilder: properly detect clang version in Ubuntu
Apply the same logic of commit 00cd15a ("build: properly detect clang
version in Ubuntu") in scx_utils as well.

This allows to build scx_utils properly in Ubuntu.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2023-12-07 23:26:21 +01:00
Tejun Heo
fa532a87f5 scx_utils: Bump version & publish
So that it doesn't cause the missing lifetime parameter warnings everywhere.
2023-12-05 14:58:51 -10:00
Tejun Heo
5422584baa scx_utils::BpfBuilder: Add 'static to BPF_H_TAR definition
Newer rustc is unhappy about lack of explicit lifetime parameter.
2023-12-05 14:15:06 -10:00
Tejun Heo
fcab460386 scx_utils: Bump version and publish
include directory structure has changed (a breaking change) and the doc had
a misleading error. Let's cut a new release.
2023-12-03 12:51:16 -10:00
Tejun Heo
d0ed7913b4 scheds: Rearrange include files to match kernel/tools/sched_ext/include
Build scripts are updated accordingly.
2023-12-03 12:47:23 -10:00
Tejun Heo
44b811831a doc: README.md and OVERVIEW.md added and other minor updates 2023-12-03 11:48:55 -10:00
Tejun Heo
12a90335b3 scx_utils: Bump version to 0.2.1 2023-12-02 23:56:51 -10:00
Tejun Heo
b38f7574ac scx_utils: Documentation and other minor updates 2023-12-02 23:56:36 -10:00
Tejun Heo
41a4f6407e build: Add dummy gnu/stubs.h to fix build on systems without 32bit glibc-dev
See comment in sched/include/bpf-compat/gnu/stubs.h for details.
2023-12-02 13:25:02 -10:00
Tejun Heo
2af20dead9 scx_utils::BpfBuilder: Add '-target bpf' to clang_args when calling bindgen
bpf_cflags doesn't contain '-target bpf' because SkeletonBuilder does so
internally. However, bindgen doesn't and we end up specifying options which
are specific to bpf target without actually specifying the target, this
leads to warnings like the followings:

  [scx_layered 0.0.1] warning: unknown warning option '-Wno-compare-distinct-pointer-types''; did you mean '-Wno-compare-distinct-pointer-types'? [-Wunknown-warning-option]
  [scx_layered 0.0.1] clang diag: warning: argument unused during compilation: '-mcpu=v3' [-Wunused-command-line-argument]
  [scx_layered 0.0.1] clang diag: warning: unknown warning option '-Wno-compare-distinct-pointer-types''; did you mean '-Wno-compare-distinct-pointer-types'? [-Wunknown-warning-option]

Fix it by adding '-target bpf' when invoking bindgen.
2023-12-02 07:23:27 -10:00