scx/scheds/rust/scx_rusty
Jose Fernandez e5984ed016
scx_utils: Add log_recorder module for metrics-rs
This change adds a new module to the scx_utils crate that provides a
log recorder for metrics-rs. The log recorder will log all metrics to
the console at a configurable interval in an easy to read format. Each
metric type will be displayed in a separate section. Indentation will
be used to show the hierarchy of the metrics. This results in a more
verbose output, but it is easier to read and understand.

scx_rusty was updated to use the log recorder and all explicit metric
logging was removed.

Counters will show the total count and the rate of change per second.
Counters with an additional label, like `type` in
`dispatched_tasks_total` in rusty, will show the count, rate, and
percentage of the total count.

Counters:
  dispatched_tasks_total: 65559 [1344.8/s]
    prev_idle: 44963 (68.6%) [966.5/s]
    wsync_prev_idle: 15696 (23.9%) [317.3/s]
    direct_dispatch: 2833 (4.3%) [35.3/s]
    dsq: 1804 (2.8%) [21.3/s]
    wsync: 262 (0.4%) [4.3/s]
    direct_greedy: 1 (0.0%) [0.0/s]
    pinned: 0 (0.0%) [0.0/s]
    greedy_idle: 0 (0.0%) [0.0/s]
    greedy_xnuma: 0 (0.0%) [0.0/s]
    direct_greedy_far: 0 (0.0%) [0.0/s]
    greedy_local: 0 (0.0%) [0.0/s]
  dl_clamped_total: 1290 [20.3/s]
  dl_preset_total: 514 [1.0/s]
  kick_greedy_total: 6 [0.3/s]
  lb_data_errors_total: 0 [0.0/s]
  load_balance_total: 0 [0.0/s]
  repatriate_total: 0 [0.0/s]
  task_errors_total: 0 [0.0/s]

Gauges will show the last set value:

Gauges:
  slice_length_us: 20000.00

Histograms will show the average, min, and max. The histogram will be
reset after each log interval to avoid memory leaks, since the data
structure that holds the samples is unbounded.

Histograms:
  cpu_busy_pct: avg=1.66 min=1.16 max=2.16
  load_avg node=0: avg=0.31 min=0.23 max=0.39
  load_avg node=0 dom=0: avg=0.31 min=0.23 max=0.39
  processing_duration_us: avg=297.50 min=296.00 max=299.00

Signed-off-by: Jose Fernandez <josef@netflix.com>
2024-06-24 18:45:02 -06:00
..
src scx_utils: Add log_recorder module for metrics-rs 2024-06-24 18:45:02 -06:00
.gitignore Restructure scheds folder names 2023-12-17 13:14:31 -08:00
build.rs Restructure scheds folder names 2023-12-17 13:14:31 -08:00
Cargo.toml rusty: Integrate stats with the metrics framework 2024-06-21 10:18:44 -06:00
LICENSE Restructure scheds folder names 2023-12-17 13:14:31 -08:00
meson.build scheds-rust: build rust schedulers in sequence 2024-04-23 08:06:27 +08:00
README.md Add README files for each rust scheduler 2024-01-04 07:35:44 -08:00
rustfmt.toml Restructure scheds folder names 2023-12-17 13:14:31 -08:00

scx_rusty

This is a single user-defined scheduler used within sched_ext, which is a Linux kernel feature which enables implementing kernel thread schedulers in BPF and dynamically loading them. Read more about sched_ext.

Overview

A multi-domain, BPF / user space hybrid scheduler. The BPF portion of the scheduler does a simple round robin in each domain, and the user space portion (written in Rust) calculates the load factor of each domain, and informs BPF of how tasks should be load balanced accordingly.

How To Install

Available as a Rust crate: cargo add scx_rusty

Typical Use Case

Rusty is designed to be flexible, and accommodate different architectures and workloads. Various load balancing thresholds (e.g. greediness, frequenty, etc), as well as how Rusty should partition the system into scheduling domains, can be tuned to achieve the optimal configuration for any given system or workload.

Production Ready?

Yes. If tuned correctly, rusty should be performant across various CPU architectures and workloads. Rusty by default creates a separate scheduling domain per-LLC, so its default configuration may be performant as well. Note however that scx_rusty does not yet disambiguate between LLCs in different NUMA nodes, so it may perform better on multi-CCX machines where all the LLCs share the same socket, as opposed to multi-socket machines.

Note as well that you may run into an issue with infeasible weights, where a task with a very high weight may cause the scheduler to incorrectly leave cores idle because it thinks they're necessary to accommodate the compute for a single task. This can also happen in CFS, and should soon be addressed for scx_rusty.