Commit Graph

259 Commits

Author SHA1 Message Date
Tejun Heo
4d88c9aec7 scx_stats: Add channel arguments to open and close ops too 2024-08-19 06:56:14 -10:00
Tejun Heo
a77fe372d6 scx_stats: Make server shutdown when connection is dropped and add communication channel
This will make implementing connection sessions easier where each stats
client connection maintains a set of states.
2024-08-19 06:23:16 -10:00
Tejun Heo
689d380db1 scx_stats: Implement ScxStatsServer::add_stats_ops()
This allows stats reader to maintain persistent states per connection.
2024-08-16 13:24:17 -10:00
Tejun Heo
da2e014d15 scx_stats: Add more documentation on user attributes in README 2024-08-16 09:11:23 -10:00
Tejun Heo
3a688cfde7 scx_stats: Add support for no-value user attributes and a bunch of other changes
- Allow no-value user attributes which are automatically assigned "true"
  when specified.

- Make "top" attribute string "true" instead of bool true for consistency.
  Testing for existence is always enough for value-less attributes.

- Don't drop leading "_" from user attribute names when storing in dicts.
  Dropping makes things more confusing.

- Add "_om_skip" to scx_layered fields which don't jive well with OM.
  scxstats_to_openmetrics.py is updated accordingly and no longer generates
  warnings on those fields.

- Examples and README updated accordingly.
2024-08-16 07:52:02 -10:00
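The attribute rules described in commit 3a688cfde7 above (value-less attributes stored as the string "true", leading "_" kept in the key) can be illustrated with a small standalone sketch. It also accepts several comma-separated attributes in one block. `parse_stat_attrs` and the `desc` key are hypothetical names used for illustration only; this is not the scx_stats derive macro, which works on Rust token streams rather than raw strings.

```rust
use std::collections::BTreeMap;

/// Parse the text inside a hypothetical `#[stat(...)]` block into a map.
/// Illustrative stand-in only, not the actual scx_stats implementation.
fn parse_stat_attrs(input: &str) -> BTreeMap<String, String> {
    // Split on commas that are not inside double quotes.
    let mut parts = Vec::new();
    let mut in_quotes = false;
    let mut start = 0;
    for (i, c) in input.char_indices() {
        match c {
            '"' => in_quotes = !in_quotes,
            ',' if !in_quotes => {
                parts.push(&input[start..i]);
                start = i + 1;
            }
            _ => {}
        }
    }
    parts.push(&input[start..]);

    let mut attrs = BTreeMap::new();
    for part in parts.into_iter().map(str::trim).filter(|p| !p.is_empty()) {
        match part.split_once('=') {
            // `key = "value"`: strip surrounding quotes from the value.
            Some((key, value)) => {
                let value = value.trim().trim_matches('"');
                attrs.insert(key.trim().to_string(), value.to_string());
            }
            // Value-less attribute: store the string "true".
            // The key is kept verbatim, including any leading "_".
            None => {
                attrs.insert(part.to_string(), "true".to_string());
            }
        }
    }
    attrs
}

fn main() {
    let attrs = parse_stat_attrs(r#"desc = "busy, in percent", top, _om_skip"#);
    for (key, value) in &attrs {
        println!("{key} = {value}");
    }
}
```

Because every value-less attribute maps to the same string, a consumer only needs to test for the key's existence.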
Tejun Heo
870a262713 scx_stats: Add scripts/scxstats_to_openmetrics.py
This is a generic tool to pipe from scx_stats to OpenMetrics. This is a
bare-bones implementation and the current output may not match what scx_layered
was outputting before. Will be updated later.
2024-08-15 22:51:22 -10:00
Tejun Heo
ea453e51d3 scx_stats: Rename "all" attribute to "top" and clean up examples a bit 2024-08-15 21:24:55 -10:00
Tejun Heo
6a5d6f7c27 scx_stats: Replace field_prefix attribute with '_' prefixed user attributes 2024-08-15 19:09:59 -10:00
Tejun Heo
f7c5a598bc scx_stats: Store ScxStatsMeta in BTreeMap instead of Vec
This makes the metadata easier to use.
2024-08-15 18:32:53 -10:00
Tejun Heo
834ce62b95 scx_stats: Fields ScxStatsMeta should be a BTreeMap not vec
Also simplify trait bound assertion.
2024-08-15 18:21:19 -10:00
Tejun Heo
f345e83c05 scx_stats: Replace om_prefix field attribute with field_prefix struct attribute
And strongly distinguish between field and struct attributes while parsing.
2024-08-15 18:05:24 -10:00
Tejun Heo
a9922deaa2 scx_stats: Add "all" attribute and rename metadata type strings 2024-08-15 14:50:00 -10:00
Tejun Heo
ebc1a89c34 scx_stats: s/stat/stats/ stragglers 2024-08-15 14:00:00 -10:00
Tejun Heo
bafd67b568 scx_stats: Fix parsing for multiple stat attributes
The code was assuming single attribute per #[stat()] block. Update it so
that there can be multiple comma separated attributes in a single block.
2024-08-15 13:46:20 -10:00
Tejun Heo
db0ddbe249 scx_utils: Add support for om_prefix stat attribute
This will be used to build generic bridge to openmetrics.
2024-08-15 13:08:44 -10:00
Tejun Heo
ba2e0be899 scx_stats: Add .gitignore 2024-08-15 12:31:04 -10:00
Tejun Heo
0b9c8b5cbd scx_stats: Update versions to 0.2.0 to republish 2024-08-15 12:29:27 -10:00
Tejun Heo
d6ef2c2a1f scx_stats: Synchronize crate dependency versions 2024-08-15 12:28:29 -10:00
Tejun Heo
d037fcd223 scx_stats: Drop version from Cargo.toml::dev-dependencies
Otherwise, the cyclic dependency prevents publishing.
2024-08-15 12:26:17 -10:00
Tejun Heo
ec51c567fe scx_stats: Adding missing license tag in Cargo.toml's 2024-08-15 12:23:46 -10:00
Tejun Heo
47c774d011 scx_stats: Add README and cleanup examples 2024-08-15 12:17:30 -10:00
Tejun Heo
616285baa8 scx_stats: Add package metadata 2024-08-15 11:09:26 -10:00
Tejun Heo
17afc72479 scx_stats: Rename cleanups
- s/stat/stats/ on several stragglers.

- Rename traits so that they are more distinctive from struct and other
  names and follow the convention.
2024-08-15 06:24:56 -10:00
Tejun Heo
8aae9a5de2 scx_stats: s/scx_stat/scx_stats/
Use plural form which is more widespread and also used in scheduler
implementations. No functional changes.
2024-08-15 05:31:34 -10:00
Tejun Heo
7820ec9b46 scx_stat, scx_layered: cargo fmt 2024-08-14 11:47:37 -10:00
Tejun Heo
03495cd37b scx_stat: Initial commit
scx_stat is a statistics reporting library with the following goals:

- Minimal boilerplate code. Statistics are defined as a regular rust struct
  which can contain integers, floats, strings, structs, and array or
  BTreeMap of them. The struct and each field can be annotated using the
  "stat" attribute. The "Stat" derive macro generates the necessary
  metadata.

- On-demand reporting. ScxStatServer implements a UNIX domain socket server
  which communicates using a simple json protocol. Stat structs can be added
  to the server with closures to output them. There can be any number of
  readers. ScxStatServer can be queried for the metadata of all stat structs
  that it may report so that a generic tool can be written to pipe the
  reported statistics to other frameworks (e.g. openmetrics).

- The interface protocol is pretty simple and it isn't difficult to interact
  directly in json. ScxStatClient is provided to further simplify client
  implementation.

- See rust/scx_stat/examples/{server,client}.rs for usage.
2024-08-14 11:37:06 -10:00
Tejun Heo
45f7fd13b7 versions: Synchronize crate dependency versions 2024-08-08 14:45:46 -10:00
Tejun Heo
63c4a0191f
Merge branch 'main' into topic/inlined-skeleton-members 2024-08-08 14:23:37 -10:00
Tejun Heo
cd6a4d72c7 Bump versions for 1.0.2 release 2024-08-08 14:10:16 -10:00
Tejun Heo
7c3ffe96e1 Unify crate dependency versions
Different sub-projects are using different versions for the same crates.
Synchronize them to the latest.
2024-08-08 13:26:47 -10:00
Andrea Righi
9d808ae206
Merge pull request #468 from sched-ext/rustland-refactoring
scx_rustland refactoring
2024-08-07 11:38:21 +02:00
Andrea Righi
51cfb69199 scx_rustland_core: re-introduce partial mode
Re-add the partial mode option that was dropped during the refactoring.

The partial option allows applying the scheduler only to the tasks which
have their scheduling policy set to SCHED_EXT via sched_setscheduler().

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-07 08:41:06 +02:00
Andrea Righi
e1f2b3822e scx_rustland_core: drop CPU ownership API
The API for determining which PID is running on a specific CPU is racy
and is unnecessary since this information can be obtained from user
space.

Additionally, it's not reliable for identifying idle CPUs.  Therefore,
it's better to remove this API and, in the future, provide a cpumask
alternative that can export the idle state of the CPUs to user space.

As a consequence, also change scx_rustland to dispatch one task at a time,
instead of dispatching tasks in batches of idle cores (that are usually
not accurate due to the racy nature of the CPU ownership interface).

Dispatching one task at a time even makes the scheduler more performant,
due to the vruntime scheduling being applied to more tasks sitting in
the scheduler's queue.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-07 08:41:06 +02:00
Andrea Righi
9a0e7755df scx_rustland_core: export counter of online CPUs
Introduce a helper to get the number of online CPUs tracked by the BPF
part.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-07 08:10:53 +02:00
Andrea Righi
d9c9f78e3e scx_rustland: re-align vruntime and time slice evaluation to scx_bpfland
Drop the slice boost logic and apply a vruntime and task time slice
evaluation approach similar to scx_bpfland (but implement this in the
user-space component instead of the BPF part).

Additionally, introduce a slice_us_min parameter to define the minimum
time slice that can be assigned to a task, also similar to scx_bpfland.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-07 08:10:53 +02:00
Andrea Righi
e1e6e31208 scx_rustland_core: update copyright info
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-07 08:10:53 +02:00
Andrea Righi
b87541a26e scx_rustland_core: refactor idle CPU selection logic
Use the same idle selection logic used in scx_bpfland also in
scx_rustland_core.

Also drop fifo_mode and always use the BPF idle selection logic by
default as long as the system is not saturated, unless full_user is
specified.

This approach allows user-space schedulers aiming for maximum
performance to leverage the BPF idle selection logic (bypassing
user-space), while those seeking full control can enable full_user to
bypass the BPF CPU idle selection logic and choose the target CPU for
each task from user-space.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-07 08:10:53 +02:00
Daniel Hodges
55ac9d1527
Merge pull request #474 from hodgesds/bpf-builder-fixes
scx_utils: Fix build args
2024-08-06 22:42:44 -04:00
Daniel Hodges
459ed82e6c scx_utils: Fix build args
Fix the args passed to bpf_builder; warnings are currently displayed for
"-Wno-compare-distinct-pointer-types".

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-06 19:10:34 -07:00
Andrea Righi
d8985306f4 scx_rustland: user-space interactive task classifier
We don't need to send the number of voluntary context switches (nvcsw)
from BPF to user-space, as this information is already accessible in
user-space via procfs. Sending this data would only create unnecessary
overhead for schedulers that don't require it, and those that do can
easily retrieve it through procfs.

Therefore, drop this metric from scx_rustland_core and change
scx_rustland to implement an interactive task classifier fully in the
user-space part of the scheduler.

Also drop some options that do not provide any significant benefit
(also in preparation of a bigger refactoring to define a better API for
the user-space framework).

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-06 17:56:58 +02:00
Andrea Righi
a8d14fc0c4
Merge pull request #471 from sched-ext/rustland-core-musl
scx_rustland_core: add support for musl
2024-08-06 17:56:19 +02:00
Kawanaao
c3109ebeed
scx_rustland_core alloc: Replaced RefCell with Mutex
Necessary for some multi-threaded cases
2024-08-06 13:10:21 +00:00
Andrea Righi
4c7fb5cdbd scx_rustland_core: add support for musl
It seems that musl libc is not POSIX.1-2001, POSIX.1-2008 compliant and
using sched_setscheduler() just returns -ENOSYS:
https://git.musl-libc.org/cgit/musl/commit/src/sched/sched_setscheduler.c?id=1e21e78bf7a5c24c217446d8760be7b7188711c2

Switch to pthread_setschedparam() to properly support building
scx_rustland_core with musl.

This fixes #469.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-06 07:32:38 +02:00
Andrea Righi
d4005dd186 scx_rustland_core: fix missing import (timespec)
Explicitly import timespec to fix the following potential build error:

error[E0422]: cannot find struct, variant or union type `timespec` in this scope
 --> src/bpf.rs:365:35
    |
365 | sched_ss_repl_period: timespec {
    |                       ^^^^^^^^ not found in this scope
    |
help: consider importing this struct
    |
6 + use libc::timespec;
    |

This fixes issue #469.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-05 23:15:48 +02:00
Daniel Hodges
6da3554547 scx_utils: Add CPUs method on topology
Add a cpus method to various subfields in the topology struct to easily get
the map of CPUs for nodes/LLCs.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-07-23 10:46:38 -07:00
Daniel Müller
565aec3662 rust: Update libbpf-rs & libbpf-cargo to 0.24
Update libbpf-rs & libbpf-cargo to 0.24. Among other things, generated
skeletons now contain directly accessible map and program objects, no
longer necessitating the use of accessor methods. As a result, the risk
for mutability conflicts is reduced greatly.

Signed-off-by: Daniel Müller <deso@posteo.net>
2024-07-16 11:48:52 -07:00
Daniel Hodges
97fb0fa0c8 scx_utils: Add LLC id to CPU
Add the LLC id to the Cpu struct, which will make it easier for
referencing the LLC when doing operations on a Cpu.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-07-16 07:46:55 -07:00
Tejun Heo
51334b5c4d Bump versions for 1.0.1 release 2024-07-15 13:21:52 -10:00
Jose Fernandez
a63bf77d68
scx_utils: Display the tag value for nested counters in log_recorder
The log_recorder was incorrectly displaying the tag name instead of the
value for nested counters.

Signed-off-by: Jose Fernandez <josef@netflix.com>
2024-07-15 15:20:44 -06:00
Tejun Heo
761ec142ce Bump most versions to 1.0.0
sched_ext is about to be merged upstream. There are some compatibility
breaking changes and we're making the current sched_ext/for-6.11
1edab907b57d ("sched_ext/scx_qmap: Pick idle CPU for direct dispatch on
!wakeup enqueues") the baseline.

Tag everything except scx_mitosis as 1.0.0. As scx_mitosis is still in early
development and is currently temporarily disabled, only the patchlevel is
bumped.
2024-07-12 11:34:14 -10:00
Tejun Heo
7f8e4edb53
Merge pull request #397 from jfernandez/log-recorder-customize
sched_utils: Add log recorder format customization
2024-07-05 07:37:39 -10:00
I Hsin Cheng
1595da78dc scx_rustland_core: Remove unused variable
Remove unused variable "tctx" in rustland_select_cpu.

Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
2024-07-05 01:04:49 +08:00
Jose Fernandez
f64cdd1a51
sched_utils: Add log recorder format customization
This change adds the ability to customize the log recorder format for
each metric type. There is a default format that is used if no custom
`MetricFormatter` is provided. This is the same format that was used
before this change.

The `MetricFormatter` should be implemented by the user to customize the
format of the log recorder. The `LogRecorderBuilder` now takes a
`MetricFormatter` as an optional parameter.

Following changes will allow additional customization of the log
recorder format, such as how many metrics are logged per line.

Signed-off-by: Jose Fernandez <josef@netflix.com>
2024-06-28 06:21:03 -06:00
Andrea Righi
a7965abdbc scx_utils: clarify error about missing CONFIG_DEBUG_INFO_BTF
If CONFIG_DEBUG_INFO_BTF is not enabled in the kernel, the C schedulers
report the following error via libbpf, clearly indicating the missing
kernel config:

 libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was CONFIG_DEBUG_INFO_BTF enabled?

In contrast, the Rust schedulers report a less clear error:

 thread 'main' panicked at /home/arighi/src/scx/rust/scx_utils/src/compat.rs:23:9:
 btf__load_vmlinux_btf() returned NULL
 note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Make sure to report a similar error, so that users have a better clue
about the missing kernel config. After this change the error looks like
the following:

 thread 'main' panicked at /home/arighi/src/scx/rust/scx_utils/src/compat.rs:23:9:
 btf__load_vmlinux_btf() returned NULL, was CONFIG_DEBUG_INFO_BTF enabled?

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-27 09:15:43 +02:00
David Vernet
2aa8bbc32d
utils: Export build ID values from rust scx_utils
We want schedulers to be able to print, log, etc the build ID of the
repository. To do this, we can use the vergen Cargo crate to generate
environment variables that contain values that we can export from scx_utils.

This patch updates scx_utils accordingly to use vergen to generate build
ID output that can be printed from schedulers. A subsequent patch will
update scx_rusty to print this build ID value.

Signed-off-by: David Vernet <void@manifault.com>
2024-06-25 11:44:19 -05:00
David Vernet
9d9ece11aa
Merge pull request #384 from jfernandez/log-recorder
scx_utils: Add log_recorder module for metrics-rs
2024-06-25 11:43:37 -05:00
Andrea Righi
5db0908530 scx_rustland_core: make sure to use a valid CPU during direct dispatch
We may end up selecting an invalid CPU (according to the task's cpumask)
when dispatching the task via dispatch_direct_cpu().

When this happens simply return an error and do not dispatch the task
and let the caller handle the error: in the context of select_cpu() we
can simply ignore the dispatch and return the target CPU; in the context
of FIFO mode dispatch we can fall back to SCX_DSQ_LOCAL if the target CPU
is not valid.

This fixes issue #353.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-25 14:11:46 +02:00
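The error-and-fallback flow described in commit 5db0908530 above can be modeled in plain Rust. This is a simplified sketch under stated assumptions: the real check lives in the BPF component, and here the cpumask is modeled as a `u64` bitmask. `Target`, `try_direct_dispatch`, and `fifo_dispatch` are hypothetical names, not the scx_rustland_core API.

```rust
/// Where a task ends up in this toy model (hypothetical type).
#[derive(Debug, PartialEq)]
enum Target {
    Cpu(u32),
    LocalDsq, // stands in for SCX_DSQ_LOCAL
}

/// Refuse the dispatch if `cpu` is not set in the task's allowed mask.
/// The mask is modeled as a u64, so only CPUs 0..63 are representable.
fn try_direct_dispatch(allowed: u64, cpu: u32) -> Result<Target, String> {
    if cpu < 64 && (allowed & (1u64 << cpu)) != 0 {
        Ok(Target::Cpu(cpu))
    } else {
        Err(format!("CPU {cpu} is not in the task's cpumask"))
    }
}

/// FIFO-mode caller: fall back to the local DSQ when the CPU is invalid.
fn fifo_dispatch(allowed: u64, cpu: u32) -> Target {
    try_direct_dispatch(allowed, cpu).unwrap_or(Target::LocalDsq)
}

fn main() {
    let allowed = 0b0101; // task may run on CPUs 0 and 2
    println!("{:?}", fifo_dispatch(allowed, 2));
    println!("{:?}", fifo_dispatch(allowed, 1));
}
```

Returning an error instead of dispatching leaves the policy decision to each caller, which matches the split between select_cpu() and FIFO mode described in the commit message.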
Andrea Righi
e4b13b2aa6 scx_rustland_core: reduce dispatch overhead
Kick CPUs in the dispatch path only when needed (typically when tasks
are bounced to other CPUs).

Moreover, avoid consuming all the dispatched tasks at once.

This seems to reduce the BPF overhead (according to bpftop), going from
~10% CPU usage down to ~6% CPU usage of rustland_dispatch() on an
over-commissioned system, without introducing any measurable performance
regression.

Tested-by: SoulHarsh007 <harsh.peshwani@outlook.com>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-25 14:02:39 +02:00
Andrea Righi
631d5576dc scx_rustland_core: refactor CPU selection logic
Allow dispatching tasks directly (bypassing the user-space scheduler)
only when the scheduler is operating in FIFO mode.

On an over-commissioned system, directly dispatching tasks can only
increase OS noise. These tasks can get a brief priority boost and an
extended time slice just because they found an idle CPU, which can lead
to erratic behavior.

This is particularly problematic when measuring performance stability,
such as evaluating the frames-per-second (fps) of a video game on an
overloaded system.

In such cases, it's better to bounce all tasks to the user-space
scheduler, that will ensure a better level of fairness and smoother
performance.

Moreover, get rid of the second chance dispatch logic introduced in
commit 4791d862 ("scx_rustland_core: second chance CPU migration"). This
seems to provide benefits only on certain architectures (Intel), but it
can introduce lag on others (AMD).

Tested-by: SoulHarsh007 <harsh.peshwani@outlook.com>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-25 14:02:39 +02:00
Andrea Righi
19217b5722 scx_rustland_core: clarify comment about resuming FIFO mode
Tested-by: SoulHarsh007 <harsh.peshwani@outlook.com>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-25 14:02:39 +02:00
Andrea Righi
081d4bdb86 scx_rustland_core: add debugging to dispatch_direct_cpu()
Report dispatch_direct_cpu() events in the trace, like any other
dispatch-related event.

Tested-by: SoulHarsh007 <harsh.peshwani@outlook.com>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-25 14:02:39 +02:00
Jose Fernandez
e5984ed016
scx_utils: Add log_recorder module for metrics-rs
This change adds a new module to the scx_utils crate that provides a
log recorder for metrics-rs. The log recorder will log all metrics to
the console at a configurable interval in an easy to read format. Each
metric type will be displayed in a separate section. Indentation will
be used to show the hierarchy of the metrics. This results in a more
verbose output, but it is easier to read and understand.

scx_rusty was updated to use the log recorder and all explicit metric
logging was removed.

Counters will show the total count and the rate of change per second.
Counters with an additional label, like `type` in
`dispatched_tasks_total` in rusty, will show the count, rate, and
percentage of the total count.

Counters:
  dispatched_tasks_total: 65559 [1344.8/s]
    prev_idle: 44963 (68.6%) [966.5/s]
    wsync_prev_idle: 15696 (23.9%) [317.3/s]
    direct_dispatch: 2833 (4.3%) [35.3/s]
    dsq: 1804 (2.8%) [21.3/s]
    wsync: 262 (0.4%) [4.3/s]
    direct_greedy: 1 (0.0%) [0.0/s]
    pinned: 0 (0.0%) [0.0/s]
    greedy_idle: 0 (0.0%) [0.0/s]
    greedy_xnuma: 0 (0.0%) [0.0/s]
    direct_greedy_far: 0 (0.0%) [0.0/s]
    greedy_local: 0 (0.0%) [0.0/s]
  dl_clamped_total: 1290 [20.3/s]
  dl_preset_total: 514 [1.0/s]
  kick_greedy_total: 6 [0.3/s]
  lb_data_errors_total: 0 [0.0/s]
  load_balance_total: 0 [0.0/s]
  repatriate_total: 0 [0.0/s]
  task_errors_total: 0 [0.0/s]

Gauges will show the last set value:

Gauges:
  slice_length_us: 20000.00

Histograms will show the average, min, and max. The histogram will be
reset after each log interval to avoid memory leaks, since the data
structure that holds the samples is unbounded.

Histograms:
  cpu_busy_pct: avg=1.66 min=1.16 max=2.16
  load_avg node=0: avg=0.31 min=0.23 max=0.39
  load_avg node=0 dom=0: avg=0.31 min=0.23 max=0.39
  processing_duration_us: avg=297.50 min=296.00 max=299.00

Signed-off-by: Jose Fernandez <josef@netflix.com>
2024-06-24 18:45:02 -06:00
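The labeled counter lines shown in commit e5984ed016 above combine a count, a share of the parent total, and a per-second rate. A minimal sketch of that arithmetic follows. `format_counter` and its parameters are hypothetical; the actual log_recorder code may compute and format these values differently, and the delta and interval used in `main` are made-up inputs.

```rust
/// Format one labeled counter line as "name: count (pct%) [rate/s]".
/// `delta` is the increase observed during the last `interval_secs`.
fn format_counter(name: &str, count: u64, total: u64, delta: u64, interval_secs: f64) -> String {
    // Share of the parent counter's total, guarding against division by zero.
    let pct = if total > 0 {
        count as f64 * 100.0 / total as f64
    } else {
        0.0
    };
    // Rate of change over the logging interval.
    let rate = if interval_secs > 0.0 {
        delta as f64 / interval_secs
    } else {
        0.0
    };
    format!("{name}: {count} ({pct:.1}%) [{rate:.1}/s]")
}

fn main() {
    // Count and total taken from the sample output above; the delta and
    // interval are assumed values chosen for illustration.
    println!("{}", format_counter("prev_idle", 44963, 65559, 9665, 10.0));
}
```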
David Vernet
bdbf4b9c05
topo: Return nr_cpu_ids from host Topology
In some cases, a host may have an odd topology where there are gaps in
CPU IDs (including between possible CPUs). A common pattern in
schedulers is to perform allocations for every possible CPU ID, such as
creating a per-cpu DSQ. In order to avoid confusing schedulers, let's
track the maximum CPU ID on a system so that we can return the number of
CPU IDs on the system which is inclusive of gaps.

We also update scx_rustland in this change to accommodate the fact that
we no longer export nr_cpus_possible() from TopologyMap.

Signed-off-by: David Vernet <void@manifault.com>
2024-06-21 12:57:13 -05:00
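The idea in commit bdbf4b9c05 above, counting CPU IDs inclusive of gaps, reduces to "highest CPU ID plus one". A minimal sketch, assuming the CPU IDs are already available as a slice; `nr_cpu_ids` here is a free function written for illustration, not the Topology method itself.

```rust
/// Number of CPU IDs on the system, inclusive of gaps: max ID + 1.
/// Returns 0 for an empty list.
fn nr_cpu_ids(cpu_ids: &[usize]) -> usize {
    cpu_ids.iter().max().map_or(0, |max| max + 1)
}

fn main() {
    // CPUs 2 and 3 are missing, so 4 CPUs are present but 6 IDs are needed.
    let present = [0, 1, 4, 5];
    println!("present: {}, nr_cpu_ids: {}", present.len(), nr_cpu_ids(&present));
}
```

A scheduler that sizes a per-CPU array by the count of present CPUs would index out of bounds for CPU 5 in this example; sizing by `nr_cpu_ids` avoids that.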
Andrea Righi
b04e82b5eb scx_rustland_core: include buddy-alloc and refactor allocator code
The dependency of the buddy-alloc crate [1] seems to cause some troubles
with packaging, mostly because the selftests for the crate are failing
when it's compiled in release mode.

For example:

 $ cargo test --release -- --nocapture
 thread 'tests::fast_alloc::test_basic_malloc' panicked at src/tests/fast_alloc.rs:25:13:
 assertion `left == right` failed
   left: 0
  right: 42

Some of these failures with BuddyAlloc can be fixed by using a memory
arena buffer aligned to page size.

However, some test failures with FastAlloc persist that cannot be
resolved merely by aligning the pre-allocated memory arena to the page
size, as mentioned in [2].

The concern is that this may potentially lead to actual memory bugs.

Therefore, it seems safer to refactor the custom allocator code to
simply use BuddyAlloc, dropping FastAlloc completely.

To achieve this, the entire BuddyAlloc code has been directly included
in scx_rustland_core, referencing the original project and its MIT
licensing information (with the entire code still distributed under the
GPLv2 license).

Then the code has been slightly modified to remove FastAlloc and the
external dependency on the buddy-alloc crate has been dropped.

From a performance perspective this change doesn't seem to introduce any
measurable regression.

[1] https://github.com/jjyr/buddy-alloc
[2] https://github.com/jjyr/buddy-alloc/issues/16

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-19 14:44:04 +02:00
I Hsin Cheng
1334a4df5d scx_utils: Utilize Entry API for BTreeMap insertion
Take advantage of BTreeMap's Entry API working with or_insert() to do
the conditional insertion. Insert only when the entry doesn't exist.
Doing so can reduce the amount of code and provide better readability
and perform in-place manipulation.

Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
2024-06-19 10:27:10 +08:00
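The Entry API pattern mentioned in commit 1334a4df5d above can be sketched as follows. The grouping of CPUs by LLC is an assumed use case chosen for illustration, and `group_by_llc` is a hypothetical helper, not code from scx_utils.

```rust
use std::collections::BTreeMap;

/// Group (llc_id, cpu_id) pairs into a map from LLC to its CPUs.
fn group_by_llc(pairs: &[(usize, usize)]) -> BTreeMap<usize, Vec<usize>> {
    let mut llcs: BTreeMap<usize, Vec<usize>> = BTreeMap::new();
    for &(llc, cpu) in pairs {
        // or_insert() inserts the empty Vec only when the key is missing and
        // returns a mutable reference either way, so the push happens in place.
        llcs.entry(llc).or_insert(Vec::new()).push(cpu);
    }
    llcs
}

fn main() {
    let llcs = group_by_llc(&[(0, 0), (0, 1), (1, 2)]);
    println!("{llcs:?}");
}
```

This replaces the longer "check contains_key(), insert, then get_mut()" sequence with a single lookup.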
I Hsin Cheng
f13697a755 scx_utils: Fix typos
Correct "Thie" to "This".

Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
2024-06-18 00:35:43 +08:00
Tejun Heo
d18bb4831a
Merge pull request #363 from sched-ext/htejun/compat-strip
Strip compat support
2024-06-16 20:04:50 -10:00
Tejun Heo
b6ebdc635a compat: Compact min requirement checks
Let's check only the latest one.
2024-06-16 06:53:58 -10:00
Tejun Heo
6319b25cf1 scx_utils: compat.rs: Follow-up clean-ups 2024-06-16 06:45:14 -10:00
Tejun Heo
aeb805a93e compat: Drop support for missing sched_ext_ops.dump*()
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop support for missing sched_ext_ops.dump*(). The
open helper macros now check the existence of the fields and abort if
missing.
2024-06-16 06:43:43 -10:00
Tejun Heo
4cca1e9acf compat: Drop support for missing sched_ext_ops.tick()
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop support for missing sched_ext_ops.tick(). The
open helper macros now check the existence of the field and abort if
missing.
2024-06-16 06:40:28 -10:00
Tejun Heo
970c04b43a compat: Drop support for missing sched_ext_ops.exit_dump_len
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop support for missing sched_ext_ops.exit_dump_len.
The open helper macros now check the existence of the field and abort if
missing.
2024-06-16 06:37:34 -10:00
Tejun Heo
046bdfd5e0 compat: Drop support for missing sched_ext_ops.hotplug_seq
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop support for missing sched_ext_ops.hotplug_seq.
The open helper macros now check the existence of the field and abort if
missing.
2024-06-16 06:34:59 -10:00
Tejun Heo
dde2942125 compat: Drop __COMPAT_scx_bpf_cpuperf_*()
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop __COMPAT_scx_bpf_cpuperf_*(). The open helper
macros now check the existence of scx_bpf_cpuperf_cap() and abort if not.
2024-06-16 06:16:53 -10:00
Tejun Heo
13e8388e1e compat: Drop __COMPAT_HAS_CPUMASKS
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop __COMPAT_HAS_CPUMASKS(). The open helper macros
now check the existence of scx_bpf_nr_cpu_ids() and abort if not.
2024-06-16 06:12:06 -10:00
Tejun Heo
66901e2b44 compat: Drop __COMPAT_scx_bpf_dump()
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop __COMPAT_scx_bpf_dump(). The open helper macros
now check the existence of scx_bpf_dump_bstr() and abort if not.

While at it, reorder the min requirement checks so that newly added ones are
up top to make testing easier.
2024-06-16 06:02:47 -10:00
Tejun Heo
0d8adf2260 compat: Drop __COMPAT_scx_bpf_exit()
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop __COMPAT_scx_bpf_exit(). The open helper macros
now check the existence of scx_bpf_exit_bstr() and abort if not.
2024-06-15 20:36:17 -10:00
Andrea Righi
812aeb3b81 scx_rustland_core: fix potential race in dispatch_task()
Fix a potential race condition that might lead to a task being
dispatched without kicking the target CPU, which could result in a
potential stall.

With this applied, scx_rustland has been running without any stall for
about 18 hours on a system where the issue was previously quite easy to
reproduce.

Moreover, clarify a couple of comments in the dispatch path.

This fixes issue #353.

Tested-by: SoulHarsh007 <harsh.peshwani@outlook.com>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-16 08:31:48 +02:00
Tejun Heo
5b5e5be906 compat: Drop __COMPAT_SCX_KICK_IDLE
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop __COMPAT_SCX_KICK_IDLE. The open helper macros
now check the existence of SCX_KICK_IDLE and abort if not.
2024-06-15 20:24:15 -10:00
Tejun Heo
7c9aedaefe compat: Drop __COMPAT_scx_bpf_switch_all()
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop __COMPAT_scx_bpf_switch_all(). The open helper
macros now check the existence of SCX_OPS_SWITCH_PARTIAL and abort if not.
2024-06-15 20:03:37 -10:00
Tejun Heo
fb2c70de84 scx_utils: compat.rs: Helper macros shouldn't return the calling function
scx_ops_open!() and scx_ops_attach!() could return the calling function
after an error, which can be surprising. Fortunately, as all the current
callers are either unwrapping or returning on error, the surprising behavior
is currently not very noticeable.

Fix it by breaking out of the macro block on errors.
2024-06-15 18:27:39 -10:00
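The fix described in commit fb2c70de84 above, breaking out of the macro block instead of returning, can be illustrated with a labeled block. This is a sketch of the general technique under assumptions: `try_parse!` and `parse_both` are hypothetical and unrelated to the real scx_ops_open!() and scx_ops_attach!() bodies.

```rust
/// Evaluate to a Result instead of returning from the calling function.
/// On error, `break 'block` leaves only the macro's own block.
macro_rules! try_parse {
    ($s:expr) => {
        'block: {
            let n: i32 = match $s.parse::<i32>() {
                Ok(n) => n,
                Err(e) => break 'block Err(format!("bad input {:?}: {}", $s, e)),
            };
            Ok(n)
        }
    };
}

/// The caller keeps control after a failure: the second parse still runs.
/// Had the macro used `return`, this function would exit at the first error.
fn parse_both(a: &str, b: &str) -> (Result<i32, String>, Result<i32, String>) {
    let first = try_parse!(a);
    let second = try_parse!(b);
    (first, second)
}

fn main() {
    let (first, second) = parse_both("oops", "7");
    println!("first: {first:?}, second: {second:?}");
}
```

With this shape the macro behaves like an ordinary expression, so callers can choose between `?`, `unwrap()`, or custom handling.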
Andrea Righi
6cc2595922 scx_rustland_core: allow user-space scheduler to preempt other tasks
The same change was previously applied with commit 8820af8d
("scx_rustland: enable user-space scheduler to preempt other tasks").

However, it was reverted in commit 732ba490 ("scx_rustland: avoid using
SCX_ENQ_PREEMPT") with the introduction of the global dynamic time
slice (inversely proportional to the number of running tasks), because
it already provided sufficient context switch opportunities, negating
any advantage of the preemption.

With the introduction of the virtual time slice in commit 6f4cd853
("scx_rustland: introduce virtual time slice"), re-introducing the
ability for the user-space scheduler to preempt other tasks now appears
beneficial again.

As confirmed by experimental results, this change helps prevent
potential audio cracking issues and enhances overall system
responsiveness, resulting in a 4-5% increase in fps performing the
usual benchmark of gaming while recompiling the kernel.

Tested-by: SoulHarsh007 <harsh.peshwani@outlook.com>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-14 20:09:19 +02:00
Andrea Righi
2d2c6083d7 scx_rustland_core: use SCX_DSQ_LOCAL only with kthreads
Tasks dispatched directly using SCX_DSQ_LOCAL may receive excessively
high priority compared to those dispatched by the user-space scheduler.

To avoid this priority disparity, dispatch tasks to the per-CPU DSQs
using direct dispatch and reserve SCX_DSQ_LOCAL for per-CPU kthreads
only.

Tested-by: SoulHarsh007 <harsh.peshwani@outlook.com>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-14 20:09:19 +02:00
Daniel Hodges
2ca42428cd scx_utils: Add CPU freq transition latency
This change adds the CPU frequency transition latency, read from the
`cpuinfo_transition_latency` field in sysfs. The value of this field is
described in the [cpufreq docs](https://www.kernel.org/doc/Documentation/cpu-freq/user-guide.txt).
On supported systems it returns the CPU frequency transition latency in
nanoseconds. The goal of this change is so that in the future schedulers
can use this data to make better frequency scaling decisions.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-06-11 07:35:34 -07:00
Tejun Heo
3e3720fc7f scx_utils: Add compat support for ops.tick() and ops.dump*()
Match rust scx_ops_load!()'s compat support with C's SCX_OPS_LOAD().
2024-06-06 14:16:36 -10:00
Tejun Heo
200af60f2a scx_layered: Fix load failure due to scheduler_tick() -> sched_tick() rename
- scx_utils: Replace kfunc_exists() with ksym_exists() which doesn't care
  about the type of the symbol.

- scx_layered: Fix load failure on kernels >= v6.10-rc due to
  scheduler_tick() -> sched_tick rename. Attach the tick fentry function to
  either scheduler_tick() or sched_tick().
2024-06-06 12:54:59 -10:00
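The "attach to either scheduler_tick() or sched_tick()" approach from commit 200af60f2a above amounts to probing for the first symbol name that exists. A minimal sketch, assuming the existence check is passed in as a closure. `pick_tick_symbol` is hypothetical, and the closure only stands in for a check such as ksym_exists(), whose real signature is not shown in the source.

```rust
/// Candidate kernel symbol names, newest first.
const CANDIDATES: [&str; 2] = ["sched_tick", "scheduler_tick"];

/// Return the first candidate for which `exists` reports true.
/// `exists` is a stand-in for a kernel symbol lookup.
fn pick_tick_symbol(exists: impl Fn(&str) -> bool) -> Option<&'static str> {
    for &sym in CANDIDATES.iter() {
        if exists(sym) {
            return Some(sym);
        }
    }
    None
}

fn main() {
    // Pretend we are on an older kernel that only has scheduler_tick().
    let old_kernel = |sym: &str| sym == "scheduler_tick";
    println!("{:?}", pick_tick_symbol(old_kernel));
}
```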
Andrea Righi
40e67897a9
Merge pull request #331 from sched-ext/rustland-core-dispatch-debug
scx_rustland_core: add extra debugging info to dispatch_task()
2024-06-04 18:58:01 +02:00
Andrea Righi
b363a13310 scx_rustland_core: add enqueue flags to debug info
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-04 17:14:38 +02:00
Andrea Righi
89384754ce scx_rustland_core: add task time slice to the debug info
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-04 17:14:38 +02:00
Tejun Heo
e556dd375d scx: Unify loading and running boilerplate across rust schedulers
Make restart handling with user_exit_info simpler and use the load and
report macros consistently across the rust schedulers. This makes all
schedulers automatically handle auto restarts from CPU hotplug events.
Note that this is necessary even for scx_lavd, which has CPU hotplug
operations, as CPU hotplug operations which took place between skel open and
scheduler init can still trigger restart.
2024-06-03 12:25:41 -10:00
Tejun Heo
a2d5310cb6 Bump versions for a release 2024-06-03 08:35:21 -10:00
Andrea Righi
4503c7080a scx_rustland_core: fix build error with musl
As reported in #319, we may get a build failure in the presence of musl,
which requires additional parameters in sched_param.

Fix by adding a proper conditional to support both gnu libc and musl
libc.

This fixes #319.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-05-26 22:30:04 +02:00
Yiwei Lin
8c2236770d Reduce MAX_ENQUEUED_TASKS to fit percpu allocator
The current setting of ops->dispatch_max_batch leads to an allocation that
is too large for the percpu allocator, which rejects sizes larger than
PCPU_MIN_UNIT_SIZE. Reduce MAX_ENQUEUED_TASKS to fix this.
2024-05-23 22:33:02 +08:00
Andrea Righi
4791d862f5 scx_rustland_core: second chance CPU migration
Implement a second-chance migration in select_cpu(): after a task has
been dispatched directly, do not try to migrate it immediately to a
different CPU, but force it to stay on prev_cpu for another round.
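
One way to read this policy is sketched below. The TaskCtx type, the
second_chance flag, and the idle_cpu argument are all hypothetical stand-ins
for illustration and do not reflect the actual BPF code.

```rust
/// Hypothetical per-task state: set after a direct dispatch.
pub struct TaskCtx {
    pub second_chance: bool,
}

/// Returns (chosen CPU, whether the task is dispatched directly).
/// `idle_cpu` stands in for the result of an idle CPU search.
pub fn select_cpu(ctx: &mut TaskCtx, prev_cpu: i32, idle_cpu: Option<i32>) -> (i32, bool) {
    if ctx.second_chance {
        // The task was dispatched directly last time: keep it on prev_cpu
        // for one more round instead of migrating it right away.
        ctx.second_chance = false;
        return (prev_cpu, false);
    }
    match idle_cpu {
        Some(cpu) => {
            ctx.second_chance = true;
            (cpu, true)
        }
        None => (prev_cpu, false),
    }
}
```

The flag is consumed after a single round, so a task can still migrate on
the selection after that.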

This seems quite effective on certain architectures (such as on a system
with 11th Gen Intel(R) Core(TM) i7-1195G7 @ 2.90GHz), and it can provide
noticeable benefits with gaming or WebGL applications (such as
https://webglsamples.org/aquarium/aquarium.html) under regular workload
conditions (around +5% fps).

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-05-22 21:38:34 +02:00
Andrea Righi
6c129e9b4b scx_rustland_core: consume from the shared DSQ before local DSQ
The shared DSQ is typically used to prioritize tasks and dispatch them
on the first CPU available, so consume from the shared DSQ before the
local CPU DSQ.
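
The ordering can be illustrated with plain queues standing in for DSQs. This
is only a sketch of the consumption order, with task IDs as u32 values chosen
for illustration. The real code operates on kernel dispatch queues.

```rust
use std::collections::VecDeque;

/// Take the next task from the shared queue if it has one,
/// otherwise fall back to this CPU's local queue.
pub fn consume_next(shared: &mut VecDeque<u32>, local: &mut VecDeque<u32>) -> Option<u32> {
    shared.pop_front().or_else(|| local.pop_front())
}
```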

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-05-22 12:12:55 +02:00
Andrea Righi
9e4bea4a1c scx_rustland_core: switch to FIFO when system is underutilized
Provide a knob in scx_rustland_core to automatically turn the scheduler
into a simple FIFO when the system is underutilized.

This choice is based on the assumption that, in the case of system
underutilization (fewer tasks running than the number of available CPUs),
the best scheduling policy is FIFO.

With this option enabled the scheduler starts in FIFO mode. If most of
the CPUs are busy (nr_running >= num_cpus - 1), the scheduler
immediately exits from FIFO mode and starts to apply the logic
implemented by the user-space component. Then the scheduler can switch
back to FIFO if there are no tasks waiting to be scheduled (evaluated
using a moving average).

This option can be enabled/disabled by the user-space scheduler using
the fifo_sched parameter in BpfScheduler: if set, the BPF component will
periodically check for system utilization and switch back and forth to
FIFO mode based on that.
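
The switching rule described above can be sketched as a small state machine.
The struct name and the integer moving-average formula are assumptions made
for illustration. Only the nr_running >= num_cpus - 1 threshold comes from
the description.

```rust
/// Hypothetical tracker for the FIFO / user-space scheduling switch.
pub struct FifoSwitch {
    fifo_mode: bool,
    avg_waiting: u64,
}

impl FifoSwitch {
    /// The scheduler starts in FIFO mode.
    pub fn new() -> Self {
        Self { fifo_mode: true, avg_waiting: 0 }
    }

    /// Periodic check; returns true while FIFO mode is active.
    pub fn update(&mut self, nr_running: u64, num_cpus: u64, nr_waiting: u64) -> bool {
        // Assumed moving average: 3/4 old value, 1/4 new sample.
        self.avg_waiting = (self.avg_waiting * 3 + nr_waiting) / 4;
        if nr_running >= num_cpus.saturating_sub(1) {
            // Most CPUs are busy: leave FIFO mode immediately.
            self.fifo_mode = false;
        } else if self.avg_waiting == 0 {
            // No tasks waiting on average: fall back to FIFO.
            self.fifo_mode = true;
        }
        self.fifo_mode
    }
}
```

Checking the busy condition first means a loaded system leaves FIFO mode
even when the waiting average is still zero.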

This improves the performance of workloads that use a small number of
the available CPUs in the system, while still maintaining the same good
level of performance for interactive tasks when the system is
overcommitted.

In certain video games, such as Baldur's Gate 3 or Counter-Strike 2,
running in "normal" system conditions, we can experience a boost in fps
of approximately 4-8% with this change applied.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-05-22 09:02:02 +02:00
Andrea Righi
912bde6a52 scx_rustland_core: simplify CPU selection logic
Simplify the idle CPU selection logic by relying on the built-in logic.

If something can be improved in this logic, it should be done in the
backend by changing the default idle selection logic. rustland doesn't
need to do anything special here for now.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-05-22 09:02:02 +02:00
Andrea Righi
0d75c80587 Revert "Merge pull request #305 from sched-ext/rustland-fifo-mode"
This merge included additional commits that were supposed to be included
in a separate pull request and have nothing to do with the fifo-mode
changes.

Therefore, revert the whole pull request and create a separate one with
the correct list of commits required to implement this feature.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-05-22 09:00:25 +02:00
Andrea Righi
f27e67dfb8 scx_rustland_core: second chance CPU migration
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-05-21 18:08:58 +02:00
Andrea Righi
778ee1406f scx_rustland_core: consume from the shared DSQ before local DSQ
The shared DSQ is typically used to prioritize tasks and dispatch them
on the first CPU available, so consume from the shared DSQ before the
local CPU DSQ.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-05-21 18:08:19 +02:00