Add a cpus() method to various subfields of the topology struct to easily get
the map of CPUs for nodes/LLCs.
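A hedged usage sketch (nodes() and the exact return type of cpus() are
assumptions based on the description above, not necessarily the exact
scx_utils API):

fn dump_node_cpus(topo: &scx_utils::Topology) {
    // cpus() is assumed to return a map keyed by CPU ID.
    for (node_id, node) in topo.nodes().iter().enumerate() {
        for (cpu_id, _cpu) in node.cpus() {
            println!("node={} cpu={}", node_id, cpu_id);
        }
    }
}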
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Add the LLC id to the Cpu struct, which makes it easier to
reference the LLC when doing operations on a Cpu.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
sched_ext is about to be merged upstream. There are some
compatibility-breaking changes, and we're making the current sched_ext/for-6.11
commit 1edab907b57d ("sched_ext/scx_qmap: Pick idle CPU for direct dispatch on
!wakeup enqueues") the baseline.
Tag everything except scx_mitosis as 1.0.0. As scx_mitosis is still in early
development and is temporarily disabled, only the patchlevel is bumped.
This change adds the ability to customize the log recorder format for
each metric type. There is a default format that is used if no custom
`MetricFormatter` is provided. This is the same format that was used
before this change.
The `MetricFormatter` should be implemented by the user to customize the
format of the log recorder. The `LogRecorderBuilder` now takes a
`MetricFormatter` as an optional parameter.
Follow-up changes will allow additional customization of the log
recorder format, such as how many metrics are logged per line.
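A purely hypothetical sketch of a custom formatter (the trait shape below is
invented for illustration; the real MetricFormatter in scx_utils may differ):

// Hypothetical trait shape, for illustration only.
trait MetricFormatter {
    fn format_counter(&self, name: &str, value: u64, rate: f64) -> String;
}

struct CompactFormatter;

impl MetricFormatter for CompactFormatter {
    fn format_counter(&self, name: &str, value: u64, rate: f64) -> String {
        format!("{name}={value} [{rate:.1}/s]")
    }
}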
Signed-off-by: Jose Fernandez <josef@netflix.com>
If CONFIG_DEBUG_INFO_BTF is not enabled in the kernel, the C schedulers
report the following error via libbpf, clearly indicating the missing
kernel config:
libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was CONFIG_DEBUG_INFO_BTF enabled?
In contrast, the Rust schedulers report a less clear error:
thread 'main' panicked at /home/arighi/src/scx/rust/scx_utils/src/compat.rs:23:9:
btf__load_vmlinux_btf() returned NULL
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Make sure to report a similar error, so that users have a better clue
about the missing kernel config. After this change the error looks like
the following:
thread 'main' panicked at /home/arighi/src/scx/rust/scx_utils/src/compat.rs:23:9:
btf__load_vmlinux_btf() returned NULL, was CONFIG_DEBUG_INFO_BTF enabled?
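A hedged sketch of the intended behavior (the real compat.rs code may differ
in its error handling details):

// Panic with a hint about the likely missing kernel config when the
// vmlinux BTF cannot be loaded.
fn load_vmlinux_btf() -> *mut libbpf_sys::btf {
    let btf = unsafe { libbpf_sys::btf__load_vmlinux_btf() };
    if btf.is_null() {
        panic!("btf__load_vmlinux_btf() returned NULL, was CONFIG_DEBUG_INFO_BTF enabled?");
    }
    btf
}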
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
We want schedulers to be able to print, log, etc. the build ID of the
repository. To do this, we can use the vergen Cargo crate to generate
environment variables containing values that we can export from scx_utils.
This patch updates scx_utils to use vergen to generate build
ID output that can be printed from schedulers. A subsequent patch will
update scx_rusty to print this build ID value.
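A minimal build.rs sketch, assuming vergen 8.x's EmitBuilder API with its git
feature enabled (the exact configuration scx_utils uses may differ):

// Emits VERGEN_* environment variables (e.g. VERGEN_GIT_SHA) at build time.
fn main() -> anyhow::Result<()> {
    vergen::EmitBuilder::builder().all_build().all_git().emit()?;
    Ok(())
}

A scheduler can then print the value with something like
println!("build: {}", env!("VERGEN_GIT_SHA")).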
Signed-off-by: David Vernet <void@manifault.com>
This change adds a new module to the scx_utils crate that provides a
log recorder for metrics-rs. The log recorder will log all metrics to
the console at a configurable interval in an easy to read format. Each
metric type will be displayed in a separate section. Indentation will
be used to show the hierarchy of the metrics. This results in a more
verbose output, but it is easier to read and understand.
scx_rusty was updated to use the log recorder and all explicit metric
logging was removed.
Counters will show the total count and the rate of change per second.
Counters with an additional label, like `type` in
`dispatched_tasks_total` in rusty, will show the count, rate, and
percentage of the total count.
Counters:
  dispatched_tasks_total: 65559 [1344.8/s]
    prev_idle: 44963 (68.6%) [966.5/s]
    wsync_prev_idle: 15696 (23.9%) [317.3/s]
    direct_dispatch: 2833 (4.3%) [35.3/s]
    dsq: 1804 (2.8%) [21.3/s]
    wsync: 262 (0.4%) [4.3/s]
    direct_greedy: 1 (0.0%) [0.0/s]
    pinned: 0 (0.0%) [0.0/s]
    greedy_idle: 0 (0.0%) [0.0/s]
    greedy_xnuma: 0 (0.0%) [0.0/s]
    direct_greedy_far: 0 (0.0%) [0.0/s]
    greedy_local: 0 (0.0%) [0.0/s]
  dl_clamped_total: 1290 [20.3/s]
  dl_preset_total: 514 [1.0/s]
  kick_greedy_total: 6 [0.3/s]
  lb_data_errors_total: 0 [0.0/s]
  load_balance_total: 0 [0.0/s]
  repatriate_total: 0 [0.0/s]
  task_errors_total: 0 [0.0/s]
Gauges will show the last set value:
Gauges:
  slice_length_us: 20000.00
Histograms will show the average, min, and max. The histogram will be
reset after each log interval to avoid memory leaks, since the data
structure that holds the samples is unbounded.
Histograms:
  cpu_busy_pct: avg=1.66 min=1.16 max=2.16
  load_avg node=0: avg=0.31 min=0.23 max=0.39
  load_avg node=0 dom=0: avg=0.31 min=0.23 max=0.39
  processing_duration_us: avg=297.50 min=296.00 max=299.00
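A hypothetical wiring sketch for a scheduler's init path (the builder method
names below are illustrative, not the exact scx_utils API):

use std::time::Duration;

fn install_metrics() -> anyhow::Result<()> {
    // Install the log recorder; metrics emitted via metrics-rs macros are
    // then printed at the configured interval in the format shown above.
    scx_utils::LogRecorderBuilder::new()
        .with_reporting_interval(Duration::from_secs(3))
        .install()?;
    Ok(())
}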
Signed-off-by: Jose Fernandez <josef@netflix.com>
In some cases, a host may have an odd topology where there are gaps in
CPU IDs (including between possible CPUs). A common pattern in
schedulers is to perform allocations for every possible CPU ID, such as
creating a per-CPU DSQ. In order to avoid confusing schedulers, let's
track the maximum CPU ID on a system so that we can return the number of
CPU IDs on the system, inclusive of any gaps.
We also update scx_rustland in this change to accommodate the fact that
we no longer export nr_cpus_possible() from TopologyMap.
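A hedged sketch of the intended usage (the nr_cpu_ids() accessor name is an
assumption based on the description above):

fn setup_per_cpu_dsqs() -> anyhow::Result<()> {
    let topo = scx_utils::Topology::new()?;
    // Cover every CPU ID slot, including gaps, when sizing per-CPU resources.
    for cpu in 0..topo.nr_cpu_ids() {
        // e.g. create a per-CPU DSQ keyed by `cpu`, even if this ID is a gap
        let _ = cpu;
    }
    Ok(())
}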
Signed-off-by: David Vernet <void@manifault.com>
The dependency on the buddy-alloc crate [1] seems to cause some trouble
with packaging, mostly because the crate's selftests fail
when it's compiled in release mode.
For example:
$ cargo test --release -- --nocapture
thread 'tests::fast_alloc::test_basic_malloc' panicked at src/tests/fast_alloc.rs:25:13:
assertion `left == right` failed
  left: 0
 right: 42
Some of these failures with BuddyAlloc can be fixed by using a memory
arena buffer aligned to page size.
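For illustration, a page-aligned arena can be declared as follows (the 4 KiB
alignment and the size are illustrative assumptions; a real allocator would
manage a mutable region, e.g. via UnsafeCell or mmap):

const ARENA_SIZE: usize = 64 * 1024;

// Page-aligned backing buffer so the allocator's bookkeeping never starts at
// an unaligned base address.
#[repr(C, align(4096))]
struct Arena([u8; ARENA_SIZE]);

static ARENA: Arena = Arena([0; ARENA_SIZE]);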
However, some test failures with FastAlloc persist that cannot be
resolved merely by aligning the pre-allocated memory arena to the page
size, as mentioned in [2].
The concern is that this may lead to actual memory bugs. Therefore, it
seems safer to refactor the custom allocator code to simply use
BuddyAlloc, dropping FastAlloc completely.
To achieve this, the entire BuddyAlloc code has been directly included
in scx_rustland_core, referencing the original project and its MIT
licensing information (with the entire code still distributed under the
GPLv2 license).
Then the code has been slightly modified to remove FastAlloc and the
external dependency on the buddy-alloc crate has been dropped.
From a performance perspective this change doesn't seem to introduce any
measurable regression.
[1] https://github.com/jjyr/buddy-alloc
[2] https://github.com/jjyr/buddy-alloc/issues/16
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Take advantage of BTreeMap's Entry API together with or_insert() to do
the conditional insertion, inserting only when the entry doesn't exist.
Doing so reduces the amount of code, improves readability,
and allows in-place manipulation.
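A minimal before/after sketch of the pattern (keys and values here are
illustrative):

use std::collections::BTreeMap;

fn main() {
    let (core_id, cpu_id) = (0usize, 3usize);
    let mut cores: BTreeMap<usize, Vec<usize>> = BTreeMap::new();

    // Before: explicit existence check followed by a separate mutable lookup.
    if !cores.contains_key(&core_id) {
        cores.insert(core_id, Vec::new());
    }
    cores.get_mut(&core_id).unwrap().push(cpu_id);

    // After: the Entry API inserts only when the key is absent and returns a
    // mutable reference for in-place manipulation.
    cores.entry(core_id).or_insert(Vec::new()).push(cpu_id);
}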
Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
In preparation for upstreaming, let's set the min version requirement to the
released v6.9 kernels. Drop support for missing sched_ext_ops.dump*(). The
open helper macros now check the existence of the fields and abort if
missing.
In preparation for upstreaming, let's set the min version requirement to the
released v6.9 kernels. Drop support for missing sched_ext_ops.tick(). The
open helper macros now check the existence of the field and abort if
missing.
In preparation for upstreaming, let's set the min version requirement to the
released v6.9 kernels. Drop support for missing sched_ext_ops.exit_dump_len.
The open helper macros now check the existence of the field and abort if
missing.
In preparation for upstreaming, let's set the min version requirement to the
released v6.9 kernels. Drop support for missing sched_ext_ops.hotplug_seq.
The open helper macros now check the existence of the field and abort if
missing.
In preparation for upstreaming, let's set the min version requirement to the
released v6.9 kernels. Drop __COMPAT_scx_bpf_cpuperf_*(). The open helper
macros now check the existence of scx_bpf_cpuperf_cap() and abort if not.
In preparation for upstreaming, let's set the min version requirement to the
released v6.9 kernels. Drop __COMPAT_HAS_CPUMASKS(). The open helper macros
now check the existence of scx_bpf_nr_cpu_ids() and abort if not.
In preparation for upstreaming, let's set the min version requirement to the
released v6.9 kernels. Drop __COMPAT_scx_bpf_dump(). The open helper macros
now check the existence of scx_bpf_dump_bstr() and abort if not.
While at it, reorder the min requirement checks so that newly added ones are
up top to make testing easier.
In preparation for upstreaming, let's set the min version requirement to the
released v6.9 kernels. Drop __COMPAT_scx_bpf_exit(). The open helper macros
now check the existence of scx_bpf_exit_bstr() and abort if not.
In preparation for upstreaming, let's set the min version requirement to the
released v6.9 kernels. Drop __COMPAT_SCX_KICK_IDLE. The open helper macros
now check the existence of SCX_KICK_IDLE and abort if not.
In preparation for upstreaming, let's set the min version requirement to the
released v6.9 kernels. Drop __COMPAT_scx_bpf_switch_call(). The open helper
macros now check the existence of SCX_OPS_SWITCH_PARTIAL and abort if not.
scx_ops_open!() and scx_ops_attach!() could return from the calling function
after an error, which can be surprising. Fortunately, as all the current
callers either unwrap or return on error, the surprising behavior
is currently not very noticeable.
Fix it by breaking out of the macro block on errors.
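A hedged illustration of the fix (not the actual macro bodies): a `return`
inside a macro expands into the caller and returns from it, whereas breaking
out of a labeled block keeps the error local to the macro expansion.

// Illustrative only; the real scx_ops_open!()/scx_ops_attach!() differ and
// the step names here are hypothetical.
macro_rules! attach_steps {
    ($skel:expr) => {
        'block: {
            if let Err(e) = $skel.attach_struct_ops() {
                break 'block Err(e);
            }
            if let Err(e) = $skel.attach_progs() {
                break 'block Err(e);
            }
            Ok(())
        }
    };
}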
This change adds the CPU frequency transition latency, read from the
`cpuinfo_transition_latency` file in sysfs. The value of this field is
described in the [cpufreq
docs](https://www.kernel.org/doc/Documentation/cpu-freq/user-guide.txt).
On supported systems it returns the CPU frequency transition latency in
nanoseconds. The goal of this change is to let schedulers use this data
in the future to make better frequency scaling decisions.
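For reference, the value can be read per-CPU from the standard cpufreq sysfs
path (the parsing details here are illustrative):

use std::fs;

// Returns the transition latency in nanoseconds for `cpu`, or None if the
// file is missing or unparsable (e.g. cpufreq not supported).
fn cpuinfo_transition_latency_ns(cpu: usize) -> Option<u64> {
    let path = format!(
        "/sys/devices/system/cpu/cpu{}/cpufreq/cpuinfo_transition_latency",
        cpu
    );
    fs::read_to_string(path).ok()?.trim().parse().ok()
}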
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
- scx_utils: Replace kfunc_exists() with ksym_exists() which doesn't care
about the type of the symbol.
- scx_layered: Fix load failure on kernels >= v6.10-rc due to
scheduler_tick() -> sched_tick rename. Attach the tick fentry function to
either scheduler_tick() or sched_tick().
Make restart handling with user_exit_info simpler and use the
load and report macros consistently across the Rust schedulers. This makes
all schedulers automatically handle auto-restarts from CPU hotplug events.
Note that this is necessary even for scx_lavd, which has CPU hotplug
operations, as hotplug events which take place between skel open and
scheduler init can still trigger a restart.
C SCX_OPS_ATTACH() and rust scx_ops_attach() macros were not calling
.attach() and were only attaching the struct_ops. This meant that all
non-struct_ops BPF programs contained in the skels were never attached, which
breaks e.g. scx_layered.
Let's fix it by adding an .attach() invocation to the attach macros.
In order to make it easy for schedulers to use the hotplug_seq feature that's
available in recent kernels, we'll need to provide a macro wrapper so that we
can support the feature with backwards compatibility. This adds scx_ops_open!()
to abstract that. Any scheduler that uses scx_ops_open!() will be exited if a
hotplug event happens between opening the skeleton and loading it.
Signed-off-by: David Vernet <void@manifault.com>
Now that the kernel exports the SCX_ECODE_ACT_RESTART exit code, we can
remove the custom hotplug logic from scx_rusty, and instead rely on the
built-in logic from the kernel. There's still a corner case that we're not
honoring: when a hotplug event happens on the init path. A future change will
address this as well.
Signed-off-by: David Vernet <void@manifault.com>
Schedulers and the kernel can include an exit code when exiting a scheduler.
There are some built-in codes that can be specified: SCX_ECODE_RSN_HOTPLUG,
and SCX_ECODE_ACT_RESTART. Some schedulers may want to check the exit
code against these values, so let's export them from user_exit_info.rs.
We use lazy_static so that we can read the enum values for the
currently-running kernel.
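A self-contained sketch of the lazy_static pattern (the lookup function here
is hypothetical; the real code resolves the values from the running kernel):

use lazy_static::lazy_static;

// Hypothetical lookup standing in for the real kernel-side resolution.
fn read_exit_code_from_kernel(_name: &str) -> u64 {
    0
}

lazy_static! {
    // Resolved once, at first use, against the currently-running kernel.
    pub static ref SCX_ECODE_ACT_RESTART: u64 =
        read_exit_code_from_kernel("SCX_ECODE_ACT_RESTART");
}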
Signed-off-by: David Vernet <void@manifault.com>
Some users are running with NUMA disabled, which makes sense given that it's
useless in a lot of contexts. Let's make the Topology crate assume a default
node with ID 0 in such cases.
Signed-off-by: David Vernet <void@manifault.com>
Add a method to TopologyMap to get the number of online CPUs.
Considering that most of the schedulers are not handling CPU hotplugging,
it can be useful to also expose this metric in addition to the number of
available CPUs in the system.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
If another scheduler is already running, the Rust schedulers based on
scx_utils report an error like the following, which can be a bit
difficult to understand:
Error: Failed to attach struct ops
Caused by:
    bpf call "libbpf_rs::map::Map::attach_struct_ops::{{closure}}" returned NULL
Change the scx_ops_attach macro to check if another sched_ext scheduler
is running and in that case report a more explicit error.
With this applied:
$ sudo scx_rustland
Error: another sched_ext scheduler is already running
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
It would be useful to be able to check whether a kfunc exists from rust
user space. For example, now that we have support for adjusting cpufreq
in scx_layered, we'll want to be able to test whether or not the
scx_bpf_cpuperf_set() (and friends) kfuncs are present for backwards
compat purposes. Let's add a kfunc_exists() function to compat.rs for
this purpose.
Signed-off-by: David Vernet <void@manifault.com>
Introduce a TopologyMap object, represented as an array of arrays, where
each inner array corresponds to a core containing its associated CPU
IDs.
This object can be used as a cache to facilitate efficient iteration
over the entire host's topology.
Example usage:
let topo = Topology::new()?;
let topo_map = TopologyMap::new(topo)?;
for (core_id, core) in topo_map.iter().enumerate() {
    for cpu in core {
        println!("core={} cpu={}", core_id, cpu);
    }
}
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
We're currently cloning cpumasks returned by calls to {Core, Cache,
Node, Topology}::span(). If a caller needs to clone it, they can. Let's
not penalize the callers that just want to query the underlying cpumask.
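A self-contained illustration of the change, using a stand-in Cpumask type:

#[derive(Clone)]
struct Cpumask(Vec<u64>);

struct Node {
    span: Cpumask,
}

impl Node {
    // Before: returned self.span.clone(), paying for a copy on every call.
    // After: borrow; callers that need ownership clone explicitly.
    fn span(&self) -> &Cpumask {
        &self.span
    }
}

fn needs_ownership(node: &Node) -> Cpumask {
    node.span().clone()
}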
Signed-off-by: David Vernet <void@manifault.com>
Newer kernels also support exiting gracefully with an exit code. Let's
update the UserExitInfo struct to also read and export this value.
Signed-off-by: David Vernet <void@manifault.com>