The C SCX_OPS_ATTACH() and Rust scx_ops_attach() macros were not calling
.attach() and were only attaching the struct_ops. This meant that all
non-struct_ops BPF programs contained in the skels were never attached,
which breaks e.g. scx_layered.
Let's fix it by adding an .attach() invocation to the attach macros.
In order to make it easy for schedulers to use the hotplug_seq feature that's
available in recent kernels, we'll need to provide a macro wrapper so that we
can support the feature with backwards compatibility. This adds scx_ops_open!()
to abstract that. Any scheduler that uses scx_ops_open!() will be exited
if a hotplug event happens between opening the skeleton and loading it.
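Usage ends up looking roughly like this (a sketch; the skeleton and ops
names are placeholders, and the exact macro signature may differ across
scx_utils versions):
```
let skel_builder = BpfSkelBuilder::default();
let mut skel = scx_ops_open!(skel_builder, example_ops)?;
// If a hotplug event fires between open and load, loading fails and the
// scheduler exits rather than running against a stale hotplug_seq.
let mut skel = skel.load()?;
```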
Signed-off-by: David Vernet <void@manifault.com>
Now that the kernel exports the SCX_ECODE_ACT_RESTART exit code, we can
remove the custom hotplug logic from scx_rusty, and instead rely on the
built-in logic from the kernel. There's still a corner case that we're not
honoring: when a hotplug event happens on the init path. A future change will
address this as well.
Signed-off-by: David Vernet <void@manifault.com>
Schedulers and the kernel can include an exit code when exiting a scheduler.
There are some built-in codes that can be specified: SCX_ECODE_RSN_HOTPLUG,
and SCX_ECODE_ACT_RESTART. Some schedulers may want to check the exit
code against these values, so let's export them from user_exit_info.rs.
We use lazy_static so that we can read the enum values from the
currently-running kernel.
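In sketch form (read_enum here is a stand-in for whatever helper
resolves an enumerator from the running kernel's BTF):
```
use lazy_static::lazy_static;

lazy_static! {
    // Resolved lazily from the running kernel, so the values stay
    // correct even if the kernel's enum definitions change.
    pub static ref SCX_ECODE_RSN_HOTPLUG: u64 =
        read_enum("scx_exit_code", "SCX_ECODE_RSN_HOTPLUG").unwrap_or(0);
    pub static ref SCX_ECODE_ACT_RESTART: u64 =
        read_enum("scx_exit_code", "SCX_ECODE_ACT_RESTART").unwrap_or(0);
}
```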
Signed-off-by: David Vernet <void@manifault.com>
Some users are running with NUMA disabled, which makes sense given that it's
useless in a lot of contexts. Let's make the Topology crate assume a default
node with ID 0 in such cases.
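In sketch form (function name illustrative):
```
use std::path::Path;

fn numa_node_ids() -> Vec<usize> {
    // With NUMA compiled out there's no node tree in sysfs; behave as
    // if a single default node 0 spans the whole system.
    if !Path::new("/sys/devices/system/node").exists() {
        return vec![0];
    }
    // ... otherwise enumerate /sys/devices/system/node/node* as before ...
    todo!()
}
```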
Signed-off-by: David Vernet <void@manifault.com>
Add a method to TopologyMap to get the number of online CPUs.
Considering that most schedulers don't handle CPU hotplugging, it can
be useful to expose this metric in addition to the number of available
CPUs in the system.
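For example (a usage sketch; assuming the new method is named
nr_cpus_online()):
```
let topo_map = TopologyMap::new(Topology::new()?)?;
println!("{} CPUs online", topo_map.nr_cpus_online());
```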
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
If another scheduler is already running, the Rust schedulers based on
scx_utils are reporting an error like the following, that can be a bit
difficult to understand:
```
Error: Failed to attach struct ops

Caused by:
    bpf call "libbpf_rs::map::Map::attach_struct_ops::{{closure}}" returned NULL
```
Change the scx_ops_attach macro to check if another sched_ext scheduler
is running and in that case report a more explicit error.
With this applied:
```
$ sudo scx_rustland
Error: another sched_ext scheduler is already running
```
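The detection could be as simple as something like this sketch (the
actual macro may probe differently; newer kernels expose the scheduler
state in sysfs):
```
use std::fs;

// Returns true if sched_ext reports anything other than "disabled".
fn scx_scheduler_running() -> bool {
    fs::read_to_string("/sys/kernel/sched_ext/state")
        .map(|s| s.trim() != "disabled")
        .unwrap_or(false)
}
```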
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
It would be useful to be able to check whether a kfunc exists from Rust
user space. For example, now that we have support for adjusting cpufreq
in scx_layered, we'll want to be able to test whether or not the
scx_bpf_cpuperf_set() (and friends) kfuncs are present for backwards
compat purposes. Let's add a kfunc_exists() function to compat.rs for
this purpose.
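A hypothetical call site (only kfunc_exists() itself is named by this
change; the return type and surrounding code are illustrative):
```
// Gate cpufreq tuning on the presence of the kfunc for backward compat.
let has_cpuperf = compat::kfunc_exists("scx_bpf_cpuperf_set")?;
if !has_cpuperf {
    info!("cpufreq control not supported by this kernel, disabling");
}
```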
Signed-off-by: David Vernet <void@manifault.com>
Introduce a TopologyMap object, represented as an array of arrays, where
each inner array corresponds to a core containing its associated CPU
IDs.
This object can be used as a cache to facilitate efficient iteration
over the entire host's topology.
Example usage:
```
let topo = Topology::new()?;
let topo_map = TopologyMap::new(topo)?;

for (core_id, core) in topo_map.iter().enumerate() {
    for cpu in core {
        println!("core={} cpu={}", core_id, cpu);
    }
}
```
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
We're currently cloning cpumasks returned by calls to {Core, Cache,
Node, Topology}::span(). If a caller needs to clone it, they can. Let's
not penalize the callers that just want to query the underlying cpumask.
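The change is essentially this (field name assumed):
```
// Return a borrow of the underlying mask instead of a clone.
pub fn span(&self) -> &Cpumask {
    &self.span
}
```
Callers that do need ownership can still call .clone() on the returned
reference.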
Signed-off-by: David Vernet <void@manifault.com>
Newer kernels also support exiting gracefully with an exit code. Let's
update the UserExitInfo struct to also read and export this value.
Signed-off-by: David Vernet <void@manifault.com>
As described in https://github.com/sched-ext/scx/issues/195, apparently
some chips don't export information about their cache topology. There's
not much we can do if we don't have that information, so let's just
assume a unified cache per node if that happens.
Andrea suggested this patch -- I'm applying exactly what he proposed,
with a slightly modified comment.
Suggested-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: David Vernet <void@manifault.com>
As described in https://bugzilla.kernel.org/show_bug.cgi?id=218109,
https://github.com/sched-ext/scx/issues/147 and
https://github.com/sched-ext/sched_ext/issues/69, AMD chips can
sometimes report fully disabled CPUs as offline, which causes us to
count them when looking at /sys/devices/system/cpu/possible.
Additionally, systems can have holes in their active CPU maps. For
example, a system with CPUs 0, 1, 2, 3 possible, may have only 0 and 2
active. To address this, we need to do a few things:
1. Update topology.rs to be clear that it's returning the number of
_possible_ CPUs in the system. Also update Topology to only record
online CPUs when creating its span and iterating over sysfs when
creating domains. It was previously trying to record when a CPU was
online, but this was actually broken as the topology directory isn't
present in sysfs when the CPU is offline.
2. Schedulers should not be relying on nr_possible_cpus for anything
other than interacting with per-CPU data (e.g. for stats extraction),
or e.g. verifying maximum sizes of statically sized arrays in BPF. It
should _not_ be used for e.g. performing load calculations, etc. With
that said, we'll also need to update schedulers to not rely on the
nr_possible_cpus figure being exported by the topology crate. We do
that for rusty in this patch, but don't fix any of the others other
than updating how they call topology.rs.
3. Account for the fact that LLC IDs may be non-contiguous. For example,
if there is a single core in an LLC and we assign LLC IDs to domains,
then the domain IDs won't be contiguous. This doesn't fit our current
model, which is used by e.g. infeasible_weights.rs. We'll update some
of the code in rusty to accommodate this, but we'll need to do more.
4. Update schedulers to properly reset themselves in the event of a
hotplug event. We'll take care of that in a follow-on change.
Signed-off-by: David Vernet <void@manifault.com>
We implement functions or(), and(), and xor() for cpumasks, but we
should also implement the bitwise ops for those operations in case
people prefer that syntax.
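For example, here's roughly what one of the three looks like (a sketch
assuming the existing and() helper takes a reference and returns a new
mask; BitOr and BitXor follow the same shape):
```
use std::ops::BitAnd;

impl BitAnd for &Cpumask {
    type Output = Cpumask;

    // Delegate to the existing and() helper so `&a & &b` just works.
    fn bitand(self, rhs: &Cpumask) -> Cpumask {
        self.and(rhs)
    }
}
```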
Signed-off-by: David Vernet <void@manifault.com>
Offline CPUs don't have a /sys/devices/system/cpu/cpuN/topology
directory, so let's just skip them if they're not online. Schedulers are
expected to detect hotplug and restart gracefully.
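In sketch form (helper name illustrative):
```
use std::path::Path;

// Offline CPUs have no topology directory, so skip them rather than
// failing to read files that aren't there.
fn cpu_is_online(cpu: usize) -> bool {
    Path::new(&format!("/sys/devices/system/cpu/cpu{}/topology", cpu)).exists()
}
```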
Signed-off-by: David Vernet <void@manifault.com>
We're iterating from min..max cpu in cpus_online(), but that's not
inclusive of the max CPU. Let's include the max CPU as well so we don't
mistakenly treat the last CPU as offline.
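The off-by-one in miniature:
```
let (min, max) = (0usize, 7usize);
assert_eq!((min..max).count(), 7);   // exclusive: CPU 7 looks offline
assert_eq!((min..=max).count(), 8);  // inclusive: all eight CPUs seen
```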
Signed-off-by: David Vernet <void@manifault.com>
We are failing to parse /sys/devices/system/cpu/online in systems with
just one CPU, for example:
$ vng -r --cpus 1 -- scx_rusty
Error: Failed to parse online cpus 0
Correctly handle strings containing only a single CPU during parsing.
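The tolerant parse looks roughly like this (function name illustrative):
```
// A token without a '-' is a single CPU, e.g. "0" on a one-CPU system;
// "0-3" is a range.
fn parse_cpu_token(tok: &str) -> Option<(usize, usize)> {
    match tok.trim().split_once('-') {
        Some((lo, hi)) => Some((lo.parse().ok()?, hi.parse().ok()?)),
        None => {
            let cpu = tok.trim().parse().ok()?;
            Some((cpu, cpu))
        }
    }
}
```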
Fixes: c5a3b83b ("topology: Add new topology crate")
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
The current topology.rs crate assumes that all cores have unique core
IDs in a system. This need not be the case, such as in certain Intel
Xeon processors which reuse core IDs in different NUMA nodes. Let's
update the crate to assume unique core IDs only per socket.
Signed-off-by: David Vernet <void@manifault.com>
This is to potentially reduce issues with folks
using different versions of libbpf at runtime.
This also:
- makes static linking of libbpf the default
- adds steps in `meson setup` to fetch libbpf and make it
Introduce a Builder() class in scx_utils that can be used by other scx
crates (such as scx_rustland_core) to prevent code duplication.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Introduce a separate crate (scx_rustland_core) that can be used to
implement sched-ext schedulers in Rust that run in user-space.
This commit only provides the basic layout for the new crate and the
abstraction to the custom allocator.
In general, any scheduler that has a user-space component needs to use
the custom allocator to prevent potential deadlock conditions, caused by
page faults (a kthread needs to run to resolve the page fault, but the
scheduler is blocked waiting for the user-space page fault to be
resolved => deadlock).
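To illustrate the principle (a deliberately tiny sketch, not the crate's
actual allocator): serve every allocation from a static arena, which a
real implementation would additionally lock into RAM with mlockall(2)
so the allocator itself can never fault:
```
use std::alloc::{GlobalAlloc, Layout};
use std::sync::atomic::{AtomicUsize, Ordering};

const ARENA_SIZE: usize = 64 << 20;

#[repr(align(4096))]
struct Arena([u8; ARENA_SIZE]);
static mut ARENA: Arena = Arena([0; ARENA_SIZE]);
static NEXT: AtomicUsize = AtomicUsize::new(0);

struct ArenaAlloc;

unsafe impl GlobalAlloc for ArenaAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let mut cur = NEXT.load(Ordering::Relaxed);
        loop {
            // Bump-allocate at the requested alignment.
            let start = (cur + layout.align() - 1) & !(layout.align() - 1);
            let end = start + layout.size();
            if end > ARENA_SIZE {
                return std::ptr::null_mut(); // arena exhausted
            }
            match NEXT.compare_exchange(cur, end, Ordering::Relaxed, Ordering::Relaxed) {
                Ok(_) => return std::ptr::addr_of_mut!(ARENA.0).cast::<u8>().add(start),
                Err(now) => cur = now,
            }
        }
    }

    // A bump allocator never frees; the real allocator does better.
    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {}
}

#[global_allocator]
static ALLOC: ArenaAlloc = ArenaAlloc;
```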
However, we don't necessarily want to enforce this constraint on all the
existing Rust schedulers, since some of them may do all user-space
allocations in safe paths; hence the separate scx_rustland_core crate.
Merging this code in scx_utils would force all the Rust schedulers to
use the custom allocator.
In a future commit the scx_rustland backend will be moved to
scx_rustland_core, making it a totally generic BPF scheduler framework
that can be used to implement user-space schedulers in Rust.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
We want to avoid every scheduler implementation from having to implement
the solution to the infeasible weights problem, but we also want to
enable sufficient flexibility where not every program has to have the
same partition of scheduling domains, etc. To enable this, a new
infeasible crate is added which encapsulates all of the logic for being
given duty cycle and weight, and performing the necessary math to adjust
for infeasibility.
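In toy form, the adjustment is a water-filling computation (a sketch of
the general idea, not the crate's actual API): any task whose
proportional share works out to more than one CPU is capped there, and
the surplus is redistributed until every share is feasible:
```
// Compute per-task CPU shares from weights, capping any task at 1.0
// CPU and redistributing the surplus among the remaining tasks.
fn feasible_shares(weights: &[f64], nr_cpus: f64) -> Vec<f64> {
    let mut shares = vec![0.0; weights.len()];
    let mut remaining: Vec<usize> = (0..weights.len()).collect();
    let mut cpus_left = nr_cpus;

    loop {
        let w_sum: f64 = remaining.iter().map(|&i| weights[i]).sum();
        let capped: Vec<usize> = remaining
            .iter()
            .copied()
            .filter(|&i| weights[i] / w_sum * cpus_left >= 1.0)
            .collect();
        if capped.is_empty() {
            for &i in &remaining {
                shares[i] = weights[i] / w_sum * cpus_left;
            }
            return shares;
        }
        for &i in &capped {
            shares[i] = 1.0; // infeasible weight: cap at one full CPU
        }
        cpus_left -= capped.len() as f64;
        remaining.retain(|i| !capped.contains(i));
    }
}
```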
Signed-off-by: David Vernet <void@manifault.com>
For convenience, let's provide callers with a way to easily look up
cores and CPUs from the root topology object.
Signed-off-by: David Vernet <void@manifault.com>
The topology.rs crate is insufficiently generic, and reflects
implementation details of scx_rusty more than it provides generic use
cases for modeling a host's topology. This adds a new topology2.rs crate
that will replace topology.rs. We have this as an intermediate commit so
that we don't bundle updating scx_rusty with adding this crate.
Signed-off-by: David Vernet <void@manifault.com>
Now that we have cpumask.rs, we can remove some logic from topology.rs
and have it create and use Cpumasks.
Signed-off-by: David Vernet <void@manifault.com>
Let's add a Cpumask trait that schedulers can use to avoid all having to
deal directly with BitVec and the like.
Signed-off-by: David Vernet <void@manifault.com>
We currently panic! if we're building a Topology that detects more than
two siblings on a physical core. This can and will likely happen on
multi-socket machines. Given that we're planning to add support for
detecting NUMA nodes soon, let's just demote the panic! to a warn!.
Signed-off-by: David Vernet <void@manifault.com>
scx_rusty has logic in the scheduler to inspect the host to
automatically build scheduling domains across every L3 cache. This would
be generically useful for many different types of schedulers, so let's
add it to the scx_utils crate so it can be used by others.
Signed-off-by: David Vernet <void@manifault.com>
Use c_char to convert C strings, that is more portable across different
architectures.
This prevents a build failure on arm64 and ppc64el.
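For example (a sketch of the portable pattern; the function name is
illustrative):
```
use std::ffi::CStr;
use std::os::raw::c_char;

// c_char is i8 on x86_64 but u8 on arm64 and ppc64el, so converting
// through it avoids hard-coding either type.
unsafe fn exit_msg(ptr: *const c_char) -> String {
    CStr::from_ptr(ptr).to_string_lossy().into_owned()
}
```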
Fixes: d57a23f ("rust/scx_utils: Add user_exit_info support")
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
After updates to reflect the updated init and direct dispatch API, the
schedulers aren't compatible with older kernels. Bump versions and publish
releases.
This is to fix Fedora build failures for these archs: s390x and
ppc64le.
Error:
```
---- bpf_builder::tests::test_bpf_builder_new stdout ----
thread 'bpf_builder::tests::test_bpf_builder_new' panicked at src/bpf_builder.rs:592:9:
Failed to create BpfBuilder (Err(CPU arch "s390x" not found in ARCH_MAP))
```
https://koji.fedoraproject.org/koji/taskinfo?taskID=111114326
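The fix presumably extends the arch translation table along these lines
(the exact entries here are assumptions):
```
// Map Rust target arch names to the kernel arch names BpfBuilder needs.
const ARCH_MAP: &[(&str, &str)] = &[
    ("x86_64", "x86"),
    ("aarch64", "arm64"),
    ("s390x", "s390"),
    ("powerpc64le", "powerpc"),
];
```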