Commit Graph

182 Commits

Author SHA1 Message Date
Tejun Heo
63c4a0191f
Merge branch 'main' into topic/inlined-skeleton-members 2024-08-08 14:23:37 -10:00
Tejun Heo
cd6a4d72c7 Bump versions for 1.0.2 release 2024-08-08 14:10:16 -10:00
Tejun Heo
7c3ffe96e1 Unify crate dependency versions
Different sub-projects are using different versions for the same crates.
Synchronize them to the latest.
2024-08-08 13:26:47 -10:00
Andrea Righi
9d808ae206
Merge pull request #468 from sched-ext/rustland-refactoring
scx_rustland refactoring
2024-08-07 11:38:21 +02:00
Andrea Righi
51cfb69199 scx_rustland_core: re-introduce partial mode
Re-add the partial mode option that was dropped during the refactoring.

The partial option allows to apply the scheduler only to the tasks which
have their scheduling policy set to SCHED_EXT via sched_setscheduler().

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-07 08:41:06 +02:00
Andrea Righi
e1f2b3822e scx_rustland_core: drop CPU ownership API
The API for determining which PID is running on a specific CPU is racy
and is unnecessary since this information can be obtained from user
space.

Additionally, it's not reliable for identifying idle CPUs.  Therefore,
it's better to remove this API and, in the future, provide a cpumask
alternative that can export the idle state of the CPUs to user space.

As a consequence also change scx_rustland to dispatch one task a time,
instead of dispatching tasks in batches of idle cores (that are usually
not accurate due to the racy nature of the CPU ownership interaface).

Dispatching one task at a time even makes the scheduler more performant,
due to the vruntime scheduling being applied to more tasks sitting in
the scheduler's queue.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-07 08:41:06 +02:00
Andrea Righi
9a0e7755df scx_rustland_core: export counter of online CPUs
Introduce a helper to get the amount of online CPUs tracked by the BPF
part.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-07 08:10:53 +02:00
Andrea Righi
d9c9f78e3e scx_rustland: re-align vruntime and time slice evaluation to scx_bpfland
Drop the slice boost logic and apply a vruntime and task time slice
evaluation approach similar to scx_bpfland (but implement this in the
user-space component instead of the BPF part).

Additionally, introduce a slice_us_min parameter to define the minimum
time slice that can be assigned to a task, also similar to scx_bpfland.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-07 08:10:53 +02:00
Andrea Righi
e1e6e31208 scx_rustland_core: update copyright info
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-07 08:10:53 +02:00
Andrea Righi
b87541a26e scx_rustland_core: refactor idle CPU selection logic
Use the same idle selection logic used in scx_bpfland also in
scx_rustland_core.

Also drop fifo_mode and always use the BPF idle selection logic by
default as long as the system is not saturated, unless full_user is
specified.

This approach allows user-space schedulers aiming for maximum
performance to leverage the BPF idle selection logic (bypassing
user-space), while those seeking full control can enable full_user to
bypass the BPF CPU idle selection logic and choose the target CPU for
each task from user-space.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-07 08:10:53 +02:00
Daniel Hodges
55ac9d1527
Merge pull request #474 from hodgesds/bpf-builder-fixes
scx_utils: Fix build args
2024-08-06 22:42:44 -04:00
Daniel Hodges
459ed82e6c scx_utils: Fix build args
Fix args to bpf_builder, existing warnings display for
"-Wno-compare-distinct-pointer-types'".

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-06 19:10:34 -07:00
Andrea Righi
d8985306f4 scx_rustland: user-space interactive task classifier
We don't need to send the number of voluntary context switches (nvcsw)
from BPF to user-space, as this information is already accessible in
user-space via procfs. Sending this data would only create unnecessary
overhead for schedulers that don't require it, and those that do can
easily retrieve it through procfs.

Therefore, drop this metric from scx_rustland_core and change
scx_rustland implementing an interactive task classifier fully in the
user-space part of the scheduler.

Also drop some options that are not provide any significant benefit
(also in preparation of a bigger refactoring to define a better API for
the user-space framework).

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-06 17:56:58 +02:00
Andrea Righi
a8d14fc0c4
Merge pull request #471 from sched-ext/rustland-core-musl
scx_rustland_core: add support for musl
2024-08-06 17:56:19 +02:00
Kawanaao
c3109ebeed
scx_rustland_core alloc: Replaced RefCell with Mutex
Necessary for some multi-threaded cases
2024-08-06 13:10:21 +00:00
Andrea Righi
4c7fb5cdbd scx_rustland_core: add support for musl
It seems that musl glibc is not POSIX.1-2001, POSIX.1-2008 compliant and
using sched_setscheduler() just returns -ENOSYS:
https://git.musl-libc.org/cgit/musl/commit/src/sched/sched_setscheduler.c?id=1e21e78bf7a5c24c217446d8760be7b7188711c2

Switch to pthread_setschedparam() to properly support building
scx_rustland_core with musl.

This fixes #469.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-06 07:32:38 +02:00
Andrea Righi
d4005dd186 scx_rustland_core: fix missing import (timespec)
Explicitly import timespec to fix the following potential build error:

error[E0422]: cannot find struct, variant or union type `timespec` in this scope
 --> src/bpf.rs:365:35
    |
365 | sched_ss_repl_period: timespec {
    |                       ^^^^^^^^ not found in this scope
    |
help: consider importing this struct
    |
6 + use libc::timespec;
    |

This fixes issue #469.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-05 23:15:48 +02:00
Daniel Hodges
6da3554547 scx_utils: Add CPUs method on topology
Add a cpus method various subfields in the topology struct to easily get
the map of CPUs for nodes/LLCs.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-07-23 10:46:38 -07:00
Daniel Müller
565aec3662 rust: Update libbpf-rs & libbpf-cargo to 0.24
Update libbpf-rs & libbpf-cargo to 0.24. Among other things, generated
skeletons now contain directly accessible map and program objects, no
longer necessitating the use of accessor methods. As a result, the risk
for mutability conflicts is reduced greatly.

Signed-off-by: Daniel Müller <deso@posteo.net>
2024-07-16 11:48:52 -07:00
Daniel Hodges
97fb0fa0c8 scx_utils: Add LLC id to CPU
Add the LLC id to the Cpu struct, which will make it easier for
referencing the LLC when doing operations on a Cpu.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-07-16 07:46:55 -07:00
Tejun Heo
51334b5c4d Bump versions for 1.0.1 release 2024-07-15 13:21:52 -10:00
Jose Fernandez
a63bf77d68
scx_utils: Display the tag value for nested counters in log_recorder
The log_recorder was incorrectly displaying the tag name instead of the
value for nested counters.

Signed-off-by: Jose Fernandez <josef@netflix.com>
2024-07-15 15:20:44 -06:00
Tejun Heo
761ec142ce Bump most versions to 1.0.0
sched_ext is about to be merged upstream. There are some compatibility
breaking changes and we're making the current sched_ext/for-6.11
1edab907b57d ("sched_ext/scx_qmap: Pick idle CPU for direct dispatch on
!wakeup enqueues") the baseline.

Tag everything except scx_mitosis as 1.0.0. As scx_mitosis is still in early
development and is currently temporarily disabled, only the patchlevel is
bumped.
2024-07-12 11:34:14 -10:00
Tejun Heo
7f8e4edb53
Merge pull request #397 from jfernandez/log-recorder-customize
sched_utils: Add log recorder format customization
2024-07-05 07:37:39 -10:00
I Hsin Cheng
1595da78dc scx_rustland_core: Remove unused variable
Remove unused variable "tctx" in rustland_select_cpu.

Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
2024-07-05 01:04:49 +08:00
Jose Fernandez
f64cdd1a51
sched_utils: Add log recorder format customization
This change adds the ability to customize the log recorder format for
each metric type. There is a default format that is used if no custom
`MetricFormatter` is provided. This is the same format that was used
before this change.

The `MetricFormatter` should be implemented by the user to customize the
format of the log recorder. The `LogRecorderBuilder` now takes a
`MetricFormatter` as an optional parameter.

Following changes will allow additional customization of the log
recorder format, such as how many metrics are logged per line.

Signed-off-by: Jose Fernandez <josef@netflix.com>
2024-06-28 06:21:03 -06:00
Andrea Righi
a7965abdbc scx_utils: clarify error about missing CONFIG_DEBUG_INFO_BTF
If CONFIG_DEBUG_INFO_BTF is not enabled in the kernel, the C schedulers
report the following error via libbpf, clearly indicating the missing
kernel config:

 libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was CONFIG_DEBUG_INFO_BTF enabled?

In contrast, the Rust schedulers report a less clear error:

 thread 'main' panicked at /home/arighi/src/scx/rust/scx_utils/src/compat.rs:23:9:
 btf__load_vmlinux_btf() returned NULL
 note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Make sure to report a similar error, so that users have a better clue
about the missing kernel config. After this change the error looks like
the following:

 thread 'main' panicked at /home/arighi/src/scx/rust/scx_utils/src/compat.rs:23:9:
 btf__load_vmlinux_btf() returned NULL, was CONFIG_DEBUG_INFO_BTF enabled?

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-27 09:15:43 +02:00
David Vernet
2aa8bbc32d
utils: Export build ID values from rust scx_utils
We want schedulers to be able to print, log, etc the build ID of the
repository. To do this, we can use the vergen Cargo crate to generate
environment variables that contain values that we can export from scx_utils.

This patch update scx_utils accordingly to use vergen to generate build
ID output that can be printed from schedulers. A subsequent patch will
update scx_rusty to print this build ID value.

Signed-off-by: David Vernet <void@manifault.com>
2024-06-25 11:44:19 -05:00
David Vernet
9d9ece11aa
Merge pull request #384 from jfernandez/log-recorder
scx_utils: Add log_recorder module for metrics-rs
2024-06-25 11:43:37 -05:00
Andrea Righi
5db0908530 scx_rustland_core: make sure to use a valid CPU during direct dispatch
We may end up selecting an invalid CPU (according to the task's cpumask)
when dispatching the task via dispatch_direct_cpu().

When this happens simply return an error and do not dispatch the task
and let the caller handle the error: in the context of select_cpu() we
can simply ignore the dispatch and return the target CPU; in the context
of FIFO mode dispatch we can fallback to SCX_DSQ_LOCAL if the target CPU
is not valid.

This fixes issue #353.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-25 14:11:46 +02:00
Andrea Righi
e4b13b2aa6 scx_rustland_core: reduce dispatch overhead
Kick CPUs in the dispatch path only when needed (typically when tasks
are bounced to other CPUs).

Moreover, avoid to consume all the tasks dispatched at once.

This seems to reduce the BPF overhead (according to bpftop), going from
~10% CPU usage down to ~6% CPU usage of rustland_dispatch() on an over
commissioned system, without introducing any measureable performance
regression.

Tested-by: SoulHarsh007 <harsh.peshwani@outlook.com>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-25 14:02:39 +02:00
Andrea Righi
631d5576dc scx_rustland_core: refactor CPU selection logic
Allow to dispatch tasks directly (bypassing the user-space scheduler)
only when the scheduler is operating in FIFO mode.

On an over-commissioned system, directly dispatching tasks can only
increase OS noise. These tasks can get a brief priority boost and an
extended time slice just because they found an idle CPU, which can lead
to erratic behavior.

This is particularly problematic when measuring performance stability,
such as evaluating the frames-per-second (fps) of a video game on an
overloaded system.

In such cases, it's better to bounce all tasks to the user-space
scheduler, that will ensure a better level of fairness and smoother
performance.

Moreover, get rid of the second chance dispatch logic introduced in
commit 4791d862 ("scx_rustland_core: second chance CPU migration"). This
seems to provide benefits only on certain architectures (Intel), but it
can introduces lags in others (AMD).

Tested-by: SoulHarsh007 <harsh.peshwani@outlook.com>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-25 14:02:39 +02:00
Andrea Righi
19217b5722 scx_rustland_core: clarify comment about resuming FIFO mode
Tested-by: SoulHarsh007 <harsh.peshwani@outlook.com>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-25 14:02:39 +02:00
Andrea Righi
081d4bdb86 scx_rustland_core: add debugging to dispatch_direct_cpu()
Report dispatch_direct_cpu() events in the trace, like any other
dispatch-related event.

Tested-by: SoulHarsh007 <harsh.peshwani@outlook.com>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-25 14:02:39 +02:00
Jose Fernandez
e5984ed016
scx_utils: Add log_recorder module for metrics-rs
This change adds a new module to the scx_utils crate that provides a
log recorder for metrics-rs. The log recorder will log all metrics to
the console at a configurable interval in an easy to read format. Each
metric type will be displayed in a separate section. Indentation will
be used to show the hierarchy of the metrics. This results in a more
verbose output, but it is easier to read and understand.

scx_rusty was updated to use the log recorder and all explicit metric
logging was removed.

Counters will show the total count and the rate of change per second.
Counters with an additional label, like `type` in
`dispatched_tasks_total` in rusty, will show the count, rate, and
percentage of the total count.

Counters:
  dispatched_tasks_total: 65559 [1344.8/s]
    prev_idle: 44963 (68.6%) [966.5/s]
    wsync_prev_idle: 15696 (23.9%) [317.3/s]
    direct_dispatch: 2833 (4.3%) [35.3/s]
    dsq: 1804 (2.8%) [21.3/s]
    wsync: 262 (0.4%) [4.3/s]
    direct_greedy: 1 (0.0%) [0.0/s]
    pinned: 0 (0.0%) [0.0/s]
    greedy_idle: 0 (0.0%) [0.0/s]
    greedy_xnuma: 0 (0.0%) [0.0/s]
    direct_greedy_far: 0 (0.0%) [0.0/s]
    greedy_local: 0 (0.0%) [0.0/s]
  dl_clamped_total: 1290 [20.3/s]
  dl_preset_total: 514 [1.0/s]
  kick_greedy_total: 6 [0.3/s]
  lb_data_errors_total: 0 [0.0/s]
  load_balance_total: 0 [0.0/s]
  repatriate_total: 0 [0.0/s]
  task_errors_total: 0 [0.0/s]

Gauges will show the last set value:

Gauges:
  slice_length_us: 20000.00

Histograms will show the average, min, and max. The histogram will be
reset after each log interval to avoid memory leaks, since the data
structure that holds the samples is unbounded.

Histograms:
  cpu_busy_pct: avg=1.66 min=1.16 max=2.16
  load_avg node=0: avg=0.31 min=0.23 max=0.39
  load_avg node=0 dom=0: avg=0.31 min=0.23 max=0.39
  processing_duration_us: avg=297.50 min=296.00 max=299.00

Signed-off-by: Jose Fernandez <josef@netflix.com>
2024-06-24 18:45:02 -06:00
David Vernet
bdbf4b9c05
topo: Return nr_cpu_ids from host Topology
In some cases, a host may have an odd topology where there are gaps in
CPU IDs (including between possible CPUs). A common pattern in
schedulers is to perform allocations for every possible CPU ID, such as
creating a per-cpu DSQ. In order to avoid confusing schedulers, let's
track the maximum CPU ID on a system so that we can return the number of
CPU IDs on the system which is inclusive of gaps.

We also update scx_rustland in this change to accommodate the fact that
we no longer export nr_cpus_possible() from TopologyMap.

Signed-off-by: David Vernet <void@manifault.com>
2024-06-21 12:57:13 -05:00
Andrea Righi
b04e82b5eb scx_rustland_core: include buddy-alloc and refactor allocator code
The dependency of the buddy-alloc crate [1] seems to cause some troubles
with packaging, mostly because the selftests for the crate are failing
when it's compiled in release mode.

For example:

 $ cargo test --release -- --nocapture
 thread 'tests::fast_alloc::test_basic_malloc' panicked at src/tests/fast_alloc.rs:25:13:
 assertion `left == right` failed
   left: 0
  right: 42

Some of these failures with BuddyAlloc can be fixed by using a memory
arena buffer aligned to page size.

However, some test failures with FastAlloc persist that cannot be
resolved merely by aligning the pre-allocated memory arena to the page
size, as mentioned in [2].

The concern is that this may potentially lead to actual memory bugs.

Therefore, it seems safer to refactor the custom allocator code to
simply use BuddyAlloc, dropping FastAlloc completely.

To achieve this, the entire BuddyAlloc code has been directly included
in scx_rustland_core, referencing the original project and its MIT
licensing information (with the entire code still distributed under the
GPLv2 license).

Then the code has been slightly modified to remove FastAlloc and the
external dependency on the buddy-alloc crate has been dropped.

From a performance perspective this change doesn't seem to introduce any
measurable regression.

[1] https://github.com/jjyr/buddy-alloc
[2] https://github.com/jjyr/buddy-alloc/issues/16

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-19 14:44:04 +02:00
I Hsin Cheng
1334a4df5d scx_utils: Utilize Entry API for BTreeMap insertion
Take advantages of BTreeMap's Entry API working with or_insert() to do
the conditional insertion. Insert only when the entry doesn't exist.
Doing so can reduce the amount of code and provide better readability
and perform in-place manipulation.

Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
2024-06-19 10:27:10 +08:00
I Hsin Cheng
f13697a755 scx_utils: Fix typos
Correct "Thie" to "This".

Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
2024-06-18 00:35:43 +08:00
Tejun Heo
d18bb4831a
Merge pull request #363 from sched-ext/htejun/compat-strip
Strip compat support
2024-06-16 20:04:50 -10:00
Tejun Heo
b6ebdc635a compat: Compact min requirement checks
Let's check only the latest one.
2024-06-16 06:53:58 -10:00
Tejun Heo
6319b25cf1 scx_utils: compat.rs: Follow-up clean-ups 2024-06-16 06:45:14 -10:00
Tejun Heo
aeb805a93e compat: Drop support for missing sched_ext_ops.dump*()
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop support for missing sched_ext_ops.dump*(). The
open helper macros now check the existence of the fields and abort if
missing.
2024-06-16 06:43:43 -10:00
Tejun Heo
4cca1e9acf compat: Drop support for missing sched_ext_ops.tick()
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop support for missing sched_ext_ops.tick(). The
open helper macros now check the existence of the field and abort if
missing.
2024-06-16 06:40:28 -10:00
Tejun Heo
970c04b43a compat: Drop support for missing sched_ext_ops.exit_dump_len
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop support for missing sched_ext_ops.exit_dump_len.
The open helper macros now check the existence of the field and abort if
missing.
2024-06-16 06:37:34 -10:00
Tejun Heo
046bdfd5e0 compat: Drop support for missing sched_ext_ops.hotplug_seq
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop support for missing sched_ext_ops.hotplug_seq.
The open helper macros now check the existence of the field and abort if
missing.
2024-06-16 06:34:59 -10:00
Tejun Heo
dde2942125 compat: Drop __COMPAT_scx_bpf_cpuperf_*()
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop __COMPAT_scx_bpf_cpuperf_*(). The open helper
macros now check the existence of scx_bpf_cpuperf_cap() and abort if not.
2024-06-16 06:16:53 -10:00
Tejun Heo
13e8388e1e compat: Drop __COMPAT_HAS_CPUMASKS
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop __COMPAT_HAS_CPUMASKS(). The open helper macros
now check the existence of scx_bpf_nr_cpu_ids() and abort if not.
2024-06-16 06:12:06 -10:00
Tejun Heo
66901e2b44 compat: Drop __COMPAT_scx_bpf_dump()
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop __COMPAT_scx_bpf_dump(). The open helper macros
now check the existence of scx_bpf_dump_bstr() and abort if not.

While at it, reorder the min requirement checks so that newly added ones are
up top to make testing easier.
2024-06-16 06:02:47 -10:00
Tejun Heo
0d8adf2260 compat: Drop __COMPAT_scx_bpf_exit()
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop __COMPAT_scx_bpf_exit(). The open helper macros
now check the existence of scx_bpf_exit_bstr() and abort if not.
2024-06-15 20:36:17 -10:00