Most of the schedulers assume that the amount of possible CPUs in the
system represents the actual number of CPUs available.
This is not always true: some CPUs may be offline or certain CPU models
(AMD CPUs for example) may include unavailable CPUs in this number.
This can lead to sub-optimal performance or even errors in the scheduler
(see for example [1][2]).
Ideally, we need to attack this issue in a more generic way, such as
having a proper API provided by a C library, that can be used by all
schedulers and the topology Rust module (scx_utils crate).
But for now, let's try to mitigate most of the common sub-optimal cases
separately inside each scheduler.
For rustland we can apply some mitigations both in select_cpu() (for the
BPF part) and in the user-space part:
- the former is fixed in the sched-ext kernel by commit 94dc0c01b957
("scx: Use cpu_online_mask when resetting idle masks"). However,
adding an extra check `cpu < num_possible_cpus` in select_cpu(),
allows to properly support AMD CPUs, even with kernels that don't
have the cpu_online_mask fix yet (this doesn't always guarantee the
validity of cpu, but it should be enough to mitigate the majority of
the potential sub-optimal cases, without introducing any significant
overhead)
- the latter can be fixed relying on topology.span(), instead of
topology.nr_cpus(), to count the amount of available CPUs in the
system.
[1] https://github.com/sched-ext/sched_ext/issues/69
[2] https://github.com/sched-ext/scx/issues/147
Link: 94dc0c01b9
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
The current topology.rs crate assumes that all cores have unique core
IDs in a system. This need not be the case, such as in certain Intel
Xeon processors which reuse core IDs in different NUMA nodes. Let's
update the crate to assume unique core IDs only per socket.
Signed-off-by: David Vernet <void@manifault.com>
Provide distinct methods to set the target CPU and the per-task time
slice to dispatched tasks.
Moreover, also provide a constructor to create a DispatchedTask from a
QueuedTask (this allows to automatically bounce a task from the
scheduler to the BPF dispatcher without having to take care of setting
the individual task's attributes).
This also allows to make most of the attributes of DispatchedTask
private, especially it allows to hide cpumask_cnt, that should be only
used internally between the BPF and the user-space component.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Provide a way to set a different time slice per-task, by adding a new
attribute slice_ns to the DispatchedTask struct.
This attribute determines the time slice assigned to the task, if it is
set to 0 then the global time slice (either the default one or the
effective one, if set) will be used.
At the same time, remove the payload attribute, that is basically unused
(scx_rustland uses it to send the task's vruntime to the BPF dispatcher
for debugging purposes, but it's not very useful anymore at this point).
In the future we may introduce a proper interface to attach a custom
payload to each task with a proper interface.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
This is to potentinally reduce issues with folks
using different versions of libbpf at runtime.
This also:
- makes static linking of libbpf the default
- adds steps in `meson setup` to fetch libbpf and make it
There is no need to generate source code in a temporary directory with
RustLandBuilder(), we can simply generate code in-tree and exclude the
generated source files from .gitignore.
Having the generated source files in-tree can help to debug potential
build issues (and it also allows to drop the the tempfile crate
dependency).
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Introduce a Builder() class in scx_utils that can be used by other scx
crates (such as scx_rustland_core) to prevent code duplication.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Introduce a wrapper to scx_utils::BpfBuilder that can be used to build
the BPF component provided by scx_rustland_core.
The source of the BPF components (main.bpf.c) is included in the crate
as an array of bytes, the content is then unpacked in a temporary file
to perform the build.
The RustLandBuilder() helper is also used to generate bpf.rs (that
implements the low-level user-space Rust connector to the BPF
commponent).
Schedulers based on scx_rustland_core can simply use RustLandBuilder(),
to build the backend provided by scx_rustland_core.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Introduce a helper function to update the counter of queued and
scheduled tasks (used to notify the BPF component if the user-space
scheduler has still some pending work to do).
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Move the BPF component of scx_rustland to scx_rustland_core and make it
available to other user-space schedulers.
NOTE: main.bpf.c and bpf.rs are not pre-compiled in the
scx_rustland_core crate, they need to be included in the user-space
scheduler's source code in order to be compiled/linked properly.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Introduce a separate crate (scx_rustland_core) that can be used to
implement sched-ext schedulers in Rust that run in user-space.
This commit only provides the basic layout for the new crate and the
abstraction to the custom allocator.
In general, any scheduler that has a user-space component needs to use
the custom allocator to prevent potential deadlock conditions, caused by
page faults (a kthread needs to run to resolve the page fault, but the
scheduler is blocked waiting for the user-space page fault to be
resolved => deadlock).
However, we don't want to necessarily enforce this constraint to all the
existing Rust schedulers, some of them may do all user-space allocations
in safe paths, hence the separate scx_rustland_core crate.
Merging this code in scx_utils would force all the Rust schedulers to
use the custom allocator.
In a future commit the scx_rustland backend will be moved to
scx_rustland_core, making it a totally generic BPF scheduler framework
that can be used to implement user-space schedulers in Rust.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
We want to avoid every scheduler implementation from having to implement
the solution to the infeasible weights problem, but we also want to
enable sufficient flexibility where not every program has to have the
same partition of scheduling domains, etc. To enable this, a new
infeasible crate is added which encapsulates all of the logic for being
given duty cycle and weight, and performing the necessary math to adjust
for infeasibility.
Signed-off-by: David Vernet <void@manifault.com>
For convenience, let's provide callers with a way to easily look up
cores and CPUs from the root topology object.
Signed-off-by: David Vernet <void@manifault.com>
The topology.rs crate is insufficiently generic, and reflects
implementation details of scx_rusty more than it provides generic use
cases for modeling a host's topology. This adds a new topology2.rs crate
that will replace topology.rs. We have this as an intermediate commit so
that we don't bundle updating scx_rusty with adding this crate.
Signed-off-by: David Vernet <void@manifault.com>
Now that we have cpumask.rs, we can remove some logic from topology.rs
and have it create and use Cpumasks.
Signed-off-by: David Vernet <void@manifault.com>
Let's add a Cpumask trait that schedulers can use to avoid all having to
deal directly with BitVec and the like.
Signed-off-by: David Vernet <void@manifault.com>
We currently panic! if we're building a Topology that detects more than
two siblings on a physical core. This can and will likely happen on
multi-socket machines. Given that we're planning to add support for
detecting NUMA nodes soon, let's just demote the panic! to a warn!.
Signed-off-by: David Vernet <void@manifault.com>
scx_rusty has logic in the scheduler to inspect the host to
automatically build scheduling domains across every L3 cache. This would
be generically useful for many different types of schedulers, so let's
add it to the scx_utils crate so it can be used by others.
Signed-off-by: David Vernet <void@manifault.com>
Use c_char to convert C strings, that is more portable across different
architectures.
This prevents a build failure on arm64 and ppc64el.
Fixes: d57a23f ("rust/scx_utils: Add user_exit_info support")
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
After updates to reflect the updated init and direct dispatch API, the
schedulers aren't compatible with older kernels. Bump versions and publish
releases.
This is to fix fedora build failures for these archs:
s390x and ppc64le
Error:
```
---- bpf_builder::tests::test_bpf_builder_new stdout ----
thread 'bpf_builder::tests::test_bpf_builder_new' panicked at src/bpf_builder.rs:592:9:
Failed to create BpfBuilder (Err(CPU arch "s390x" not found in ARCH_MAP))
```
https://koji.fedoraproject.org/koji/taskinfo?taskID=111114326
- combine c and kernel-examples as it's confusing to have both
- rename 'rust-user' and 'c-user' to just 'rust' and 'c', which is simpler
- update and fix sync-to-kernel.sh
This is a follow on to #32, which got reverted. I wrongly assumed that
scx_rusty resides in the sched_ext tree and consumes published version
of scx_utils.
With this change we update the other in-tree dependencies. I built
scx_layered & scx_rusty. I bumped scx-utils to 0.4, because the
libbpf-cargo seems to be part of the public API surface and libbpf-cargo
0.21 and 0.22 are not compatible with each other.
Signed-off-by: Daniel Müller <deso@posteo.net>
Apply the same logic of commit 00cd15a ("build: properly detect clang
version in Ubuntu") in scx_utils as well.
This allows to build scx_utils properly in Ubuntu.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
bpf_cflags doesn't contain '-target bpf' because SkeletonBuilder does so
internally. However, bindgen doesn't and we end up specifying options which
are specific to bpf target without actually specifying the target, this
leads to warnings like the followings:
[scx_layered 0.0.1] warning: unknown warning option '-Wno-compare-distinct-pointer-types''; did you mean '-Wno-compare-distinct-pointer-types'? [-Wunknown-warning-option]
[scx_layered 0.0.1] clang diag: warning: argument unused during compilation: '-mcpu=v3' [-Wunused-command-line-argument]
[scx_layered 0.0.1] clang diag: warning: unknown warning option '-Wno-compare-distinct-pointer-types''; did you mean '-Wno-compare-distinct-pointer-types'? [-Wunknown-warning-option]
Fix it by adding '-target bpf' when invoking bindgen.