Commit Graph

275 Commits

Author SHA1 Message Date
Andrea Righi
53617042b3 scx_rustland_core: avoid critical failures due by missing task context
Prevent triggering a critical error when a local context for a task
can't be found.

Instead, handle the error gracefully (reporting a warning in debugfs) to
enhance the robustness of the schedulers based on scx_rustland_core.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-09-05 10:58:44 +02:00
Tejun Heo
708aaaafb9 meson: Update rust/meson.build
Targeted build is now available for libraries too.
2024-09-04 06:49:55 -10:00
Tejun Heo
4513dfbe4b
Merge pull request #565 from CachyOS/feature/scx-loader
scx_loader: Add scheduler loader via system DBUS interface
2024-09-04 06:34:59 -10:00
Andrea Righi
c3cab45f6a scx_rustland_core: bump up version to 2.0.1
Bump up scx_rustland_core version to include this critical fix that
allows to prevent scheduler stalls:

 94a3594 ("scx_rustland_core: always dispatch per-cpu kthreads directly")

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-09-04 08:00:25 +02:00
Andrea Righi
94a359434f scx_rustland_core: always dispatch per-cpu kthreads directly
Do not send per-CPU kthreads to the user-space scheduler, but always
dispatch them directly from BPF.

In specific environments, sending critical per-CPU kthreads to the
user-space scheduler can lead to potential stalls. This occurs because
the user-space scheduler might be blocked by an action that these
per-CPU kthreads need to perform, but they cannot complete their action
if the scheduler needs to schedule them, hence the deadlock.

To prevent this deadlock, always dispatch the per-CPU kthreads directly
from the BPF component, ensuring that the user-space scheduler does not
get blocked by these events.

Fixes: c0a2cfb ("scx_rustland_core: always schedule per-CPU kthreads to user-space")
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-09-04 07:56:58 +02:00
Andrea Righi
0aa71c832b scx_rustland_core: bump up major version to 2.0.0
The scx_rustland_core API has been redesigned recently, breaking the
compatibility with the past.

Considering that Rust crates should update their major version when the
previous API becomes incompatible [1], bump up the version to 2.0.0.

[1] https://semver.org/

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-31 23:23:26 +02:00
Vladislav Nepogodin
4d770e1f84
scx_loader: Add scheduler loader via system DBUS interface 2024-08-30 00:56:27 +04:00
Daniel Hodges
f0c9a3932d scx_utils: Add cores helper to node topology
Add a helper for getting the cores per node.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-29 05:21:56 -07:00
Daniel Hodges
12f8cb74b5 scx_utils: Add GPU topology
Add GPU awareness to the topology crate.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-28 06:35:35 -07:00
Andrea Righi
a66ce5ce56 scx_utils::cpumask: support special strings "all" and "none"
Allow to create a Cpumask from a string "all" (enable all the CPUs) or
"none" (disable all CPUs).

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-28 00:15:01 +02:00
Andrea Righi
872e653cd2 scx_utils: introduce Turbo core type to Topology
Integrate the logic used by scx_bpfland to detect turbo-boosted cores in
Topology.

Also change the logic to detect Big/Little cores in function of
base_frequency, instead of scaling_max_freq, otherwise turbo-boosted
cores in homogeneous systems may be incorrectly classified as Big.

Moreover, introduce the following new methods to Cpu to check for the
core type:
 - is_turbo(): return true if the CPU is Turbo, false otherwise
 - is_big(): return true if the CPU is either Turbo or Big
 - is_little(): return true if the CPU is Little

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-28 00:09:08 +02:00
Daniel Hodges
250eacdcab
Merge pull request #570 from hodgesds/topo-updates
scx_utils: Add Big/Little core logic to Topology
2024-08-27 10:30:37 -04:00
Daniel Hodges
1e9fb9ba97 scx_utils: Add Big/Little core logic to Topology
This borrows some of the logic in scx_lavd for figuring out if a core is
a Big/Little core. If this makes sense we can add helper methods
directly on the topology to return Big/Little cores so that each
scheduler doesn't have to reinvent the same logic.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-27 07:01:18 -07:00
Andrea Righi
0bdbb255bb scx_rustland_core: update README.md with a FIFO example
Include the FIFO example directly in the README.md, instead of linking
scx_rlfifo.

Including the example directly in the README can be more useful and
practical in those cases where internet access is not available or when
we need to distribute a more "standalone" documentation.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-26 17:42:51 +02:00
Andrea Righi
820fc5a8b4 scx_rustland_core: allow to propagate vtime to BPF
Introduce a vtime attribute to struct DispatchedTask that can be set by
the user-space scheduler and it'll be use by the BPF component to
dispatch the task via scx_bpf_dispatch_vtime().

In this way a user-space scheduler can decide to apply its own internal
task ordering or rely on the BPF vtime priority DSQs (or both).

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-26 15:56:05 +02:00
Andrea Righi
2ee07bb1fb scx_rustland_core: temporarily drop RL_PREEMPT_CPU
Temporarily drop the RL_PREEMPT_CPU flag, we need a better way to
implement preemption in scx_rustland_core and it's not very effective at
the moment, so simply it drop it for now (it'll be re-added later in the
future in a proper way).

This change does not affect any scheduler, since RL_PREEMPT_CPU is
currently unused.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-26 15:54:32 +02:00
Daniel Hodges
bf67b83561
Merge pull request #549 from hodgesds/stats-path-fix
scx_utils: Add retryable errors
2024-08-25 15:22:46 -04:00
Daniel Hodges
c31f2b63cb
scx_utils: Add retryable errors
Add a set of retryable errors for scx_utils when connecting to the stats
server.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-25 14:54:36 -04:00
Tejun Heo
ca13e13ad6
Merge pull request #559 from sched-ext/htejun/cargo-workspace
build: Use workspace to group rust sub-projects
2024-08-25 06:26:18 -10:00
Andrea Righi
8853d9a9f2
Merge pull request #548 from sched-ext/rustland-core-refactoring
scx_rustland_core: user-space framework refactoring
2024-08-25 16:39:28 +02:00
Tejun Heo
43950c65bd build: Use workspace to group rust sub-projects
meson build script was building each rust sub-project under rust/ and
scheds/rust/ separately. This means that each rust project is built
independently which leads to a couple problems - 1. There are a lot of
shared dependencies but they have to be built over and over again for each
proejct. 2. Concurrency management becomes sad - we either have to unleash
multiple cargo builds at the same time possibly thrashing the system or
build one by one.

We've been trying to solve this from meson side in vain. Thankfully, in
issue #546, @vimproved suggested using cargo workspace which makes the
sub-projects share the same target directory and built together by the same
cargo instance while still allowing each project to behave independently for
development and publishing purposes.

Make the following changes:

- Create two cargo workspaces - one under rust/, the other under
  scheds/rust/. Each contains all rust projects underneath it.

- Don't let meson descend into rust/. These are libraries used by the rust
  schedulers. No need to build them from meson. Cargo will build them as
  needed.

- Change the rust_scheds build target to invoke `cargo build` in
  scheds/rust/ and let cargo do its thing.

- Remove per-scheduler meson.build files and instead generate custom_targets
  in scheds/rust/meson.build which invokes `cargo build -p $SCHED`.

- This changes rust binary directory. Update README and
  meson-scripts/install_rust_user_scheds accordingly.

- Remove per-scheduler Cargo.lock as scheds/rust/Cargo.lock is shared by all
  schedulers now.

- Unify .gitignore handling.

The followings are build times on Ryzen 3975W:

Before:
  ________________________________________________________
  Executed in  165.93 secs    fish           external
     usr time   40.55 mins    2.71 millis   40.55 mins
     sys time    3.34 mins   36.40 millis    3.34 mins

After:
  ________________________________________________________
  Executed in   36.04 secs    fish           external
     usr time  336.42 secs    0.00 millis  336.42 secs
     sys time   36.65 secs   43.95 millis   36.61 secs

Wallclock time is reduced 5x and CPU time 7x.
2024-08-25 00:47:58 -10:00
Andrea Righi
41dfa2481b scx_rustland_core: update README.md
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-25 12:39:08 +02:00
Andrea Righi
894f9582d0 scx_rustland_core: hide shutdown boilerplate in BpfScheduler
Refactor the code to hide the shutdown handling inside BpfScheduler and
simply use the exited() method to check when the scheduler is stopped.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-25 12:17:04 +02:00
Tejun Heo
c93191a213 scx_stats: Make StatsServerData::describe_meta() output more readable
Add deliminators to make the output easier on the eyes.
2024-08-24 23:15:52 -10:00
Tejun Heo
625381280c scx_stats: Shorten exported names and add prelude module
Let's make it a bit easier to use:

- Shorten exported names by changing the prefix from ScxStats to Stats. This
  should be distinctive enough and more inline with how most libraries name
  their exports.

- Importing the right set of traits can be tricky. Introduce prelude module
  so that importing is a bit less painful.
2024-08-24 22:04:25 -10:00
Andrea Righi
a2e97fecbb scx_rustland_core: merge verbose and debug in the same option
There is no reason to have two separate options for "verbose" and
"debug" mode. Just merge the two and always use "debug". If enabled,
increase verbosity to stdout and enable reporting BPF scheduling events
in debugfs (e.g., /sys/kernel/debug/tracing/trace_pipe).

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-25 09:45:20 +02:00
Andrea Righi
cb16a11342 scx_rustland_core: get rid of the global scheduler's slice_us
Since scx_rustland_core enables setting a time slice on a per-task basis
during task dispatch, there's no need to maintain a global time slice in
the BPF component. Instead, a global time slice can simply be managed in
user-space, achieving the same outcome.

Therefore, drop the global slice_us property from BpfScheduler to
simplify the API.

NOTE: if a time slice is not specified for a task, SCX_SLICE_DFL will be
used by default.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-25 09:45:18 +02:00
Tejun Heo
52aed6fecf scx_stats: Don't warn on missing "top" stats
Some schedulers may not want to expose "top" stats. No need to warn on it.
Change the message to debug.
2024-08-24 18:51:05 -10:00
Tejun Heo
1bba713a29
Merge pull request #542 from sched-ext/htejun/scx_stats
scx_stats, scx_rusty, scx_layered: Implement `--help-stats`
2024-08-24 15:38:36 -10:00
Andrea Righi
0aa23481de scx_rustland_core: drop update_tasks() and introduce notify_complete()
The update_tasks() API is somewhat confusing, so replace it with a
clearer API, notify_complete().

This new API will return control to the BPF component and inform it
about the number of tasks still pending in the user-space scheduler.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-25 00:45:23 +02:00
Andrea Righi
cef8ff8757 scx_rustland_core: get rid of the low_power API
The low-power API is a bit of a hack implemented purely in the BPF
layer, this should be better re-implemented with some concepts of
topology awareness.

Therefore, get rid of this API for now.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-24 21:29:10 +02:00
Andrea Righi
568e292a24 scx_rustland_core: get rid of the exiting task API
The current API used to notify the user-space scheduler when a task
exits is really confusing (setting a negative value in
queued_task_ctx.cpu), and it's also possible to detect task exiting
events from user-space (or check in procfs, even if it's slower).

In any case, a better API should be provided for this, so drop the
current one for now.

NOTE: this will cause additional memory usage for scx_rustland, but it
can be fixed/addressed later in a separate commit (i.e., providing a
periodic garbage collector for the unused task entries).

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-24 21:29:10 +02:00
Andrea Righi
eec395f16a scx_rustland_core: better time slice control
Instead of determining the task time slice in ops.enqueue(), refresh the
time slice immediately before the task is started on its assigned CPU in
ops.running().

This ensures to apply the exact time slice specified by the user-space
scheduler and the sched_ext core will never implicitly dispatch tasks
using SCX_SLICE_DFL.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-24 21:29:10 +02:00
Andrea Righi
c0a2cfb481 scx_rustland_core: always schedule per-CPU kthreads to user-space
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-24 21:29:10 +02:00
Andrea Righi
5d544ea264 scx_rustland_core: move CPU idle selection logic in user-space
Allow user-space scheduler to pick an idle CPU via
self.bpf.select_cpu(pid, prev_task, flags), mimicking the BPF's
select_cpu() iterface.

Also remove the full_user option and always rely on the idle selection
logic from user-space.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-24 21:28:13 +02:00
Tejun Heo
c7d8cbd000 scx_stats: ScxStatsServerData::visit_meta() should visit each meta only once
Otherwise, we can end up re-verifying and re-printing the same structs.
2024-08-23 12:47:28 -10:00
Tejun Heo
dce4117b99 scx_stats: Renames for clarification 2024-08-23 12:43:58 -10:00
Tejun Heo
25e437753c scx_layered, scx_rusty: Implement --help-stats
which shows all the defined stats. While at it, make some cosmetic updates.
2024-08-23 12:39:47 -10:00
Tejun Heo
405bcc63fe scx_stats: Make ScxStatsServerData a public carrier of data needed for stats server
And move related ops into it. This is a bit more natural and will also allow
doing other operaitons (e.g. describing stats) without launching the server.
2024-08-23 12:23:57 -10:00
Tejun Heo
e878dac619 scx_stats: Implement ScxStatsServer.describe_meta()
This can be used to generate help message for statistics.
2024-08-23 11:56:27 -10:00
Andrea Righi
e72676ede3
Merge pull request #540 from sched-ext/bpfland-cpufreq-awareness
scx_bpfland: cpu frequency and energy awareness
2024-08-23 21:17:34 +02:00
Tejun Heo
9e3b4e6db0 scx_stats: A bit of cleanups and renames 2024-08-23 09:09:02 -10:00
Tejun Heo
8c8912ccea Merge branch 'main' into htejun/scx_rusty 2024-08-23 07:50:23 -10:00
Andrea Righi
bb7248ce61 scx_utils::cpumask: introduce is_empty() and is_full()
Introduce new methods to CpuMask to check if no bits are set or if all
bits are set.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-23 19:48:53 +02:00
Tejun Heo
44a0f1b124 scx_utils: Factor out monitor_stats() from scx_rusty and scx_layered 2024-08-23 06:46:19 -10:00
Tejun Heo
70ba4cb9ef scx_stats: Fix multiple labels handling in scxstats_to_openmetrics.py
OM labels() was called with an array which is then incorrectly interpreted
as a single label. Unpack it to list of arguments. While at it, make error
reporting a bit more robust.
2024-08-23 04:56:30 -10:00
Tejun Heo
4678476ca7 scx_stats_derive: Each _AssScxStastMeta assertion should have an unique ID 2024-08-22 13:53:27 -10:00
Tejun Heo
13fa48a871 scx_rusty: Separate out stats generation and formatting
to prepare for scx_stats conversion.
2024-08-22 10:03:10 -10:00
Tejun Heo
4d1f0639d8 Version: v1.0.3 2024-08-21 06:42:11 -10:00
Tejun Heo
6a2faf2e17
Merge pull request #530 from sched-ext/htejun/misc
scx_utils::topology: Use lazy_static instead of LazyLock
2024-08-21 06:04:14 -10:00