scx/scheds/rust/scx_layered
Tejun Heo d01b49bd0e scx_layered: Fix verification failure
4fccc06905 ("scx_layered: Fix uninitialized variable") causes the
following verification failure. Fix it by moving assignments below range
checking.

  Validating match_layer() func#1...
  283: R1=scalar() R2=scalar() R3=mem_or_null(id=49,sz=1) R10=fp0
  ; int match_layer(u32 layer_id, pid_t pid, const char *cgrp_path) @ main.bpf.c:1029
  283: (7b) *(u64 *)(r10 -24) = r3      ; R3=mem_or_null(id=49,sz=1) R10=fp0 fp-24_w=mem_or_null(id=49,sz=1)
  284: (bc) w7 = w1                     ; R1=scalar() R7_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
  ; struct layer *layer = &layers[layer_id]; @ main.bpf.c:1033
  285: (bc) w1 = w7                     ; R1_w=scalar(id=50,smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R7_w=scalar(id=50,smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
  286: (27) r1 *= 1061192               ; R1_w=scalar(smin=0,smax=umax=0x103147ffefceb8,smax32=0x7ffffff8,umax32=0xfffffff8,var_off=(0x0; 0x1ffffffffffff8))
  287: (18) r8 = 0xffffc90002a26000     ; R8_w=map_value(map=bpf_bpf.bss,ks=4,vs=16979080)
  289: (0f) r8 += r1                    ; R1_w=scalar(smin=0,smax=umax=0x103147ffefceb8,smax32=0x7ffffff8,umax32=0xfffffff8,var_off=(0x0; 0x1ffffffffffff8)) R8_w=map_value(map=bpf_bpf.bss,ks=4,vs=16979080,smin=0,smax=umax=0x103147ffefceb8,smax32=0x7ffffff8,umax32=0xfffffff8,var_off=(0x0; 0x1ffffffffffff8))
  ; u32 nr_match_ors = layer->nr_match_ors; @ main.bpf.c:1034
  290: (bf) r1 = r8                     ; R1_w=map_value(map=bpf_bpf.bss,ks=4,vs=16979080,smin=0,smax=umax=0x103147ffefceb8,smax32=0x7ffffff8,umax32=0xfffffff8,var_off=(0x0; 0x1ffffffffffff8)) R8_w=map_value(map=bpf_bpf.bss,ks=4,vs=16979080,smin=0,smax=umax=0x103147ffefceb8,smax32=0x7ffffff8,umax32=0xfffffff8,var_off=(0x0; 0x1ffffffffffff8))
  291: (07) r1 += 1060992               ; R1_w=map_value(map=bpf_bpf.bss,ks=4,vs=16979080,off=0x103080,smin=0,smax=umax=0x103147ffefceb8,smax32=0x7ffffff8,umax32=0xfffffff8,var_off=(0x0; 0x1ffffffffffff8))
  292: (61) r1 = *(u32 *)(r1 +0)
  R1 unbounded memory access, make sure to bounds check any such access
  processed 1099 insns (limit 1000000) max_states_per_insn 2 total_states 72 peak_states 72 mark_read 9
  -- END PROG LOAD LOG --
2024-08-19 13:18:20 -10:00
..
src scx_layered: Fix verification failure 2024-08-19 13:18:20 -10:00
.gitignore scheds/rust: Include Cargo.lock in the repo 2024-08-15 23:08:35 -10:00
build.rs Restructure scheds folder names 2023-12-17 13:14:31 -08:00
Cargo.lock scx_layered: Implement on-demand statistics generation 2024-08-19 08:27:36 -10:00
Cargo.toml scx_layered: Implement on-demand statistics generation 2024-08-19 08:27:36 -10:00
LICENSE Restructure scheds folder names 2023-12-17 13:14:31 -08:00
meson.build meson: introduce serialize build option 2024-06-28 10:17:37 +02:00
README.md Add README files for each rust scheduler 2024-01-04 07:35:44 -08:00
rustfmt.toml Restructure scheds folder names 2023-12-17 13:14:31 -08:00

scx_layered

This is a single user-defined scheduler used within sched_ext, which is a Linux kernel feature which enables implementing kernel thread schedulers in BPF and dynamically loading them. Read more about sched_ext.

Overview

A highly configurable multi-layer BPF / user space hybrid scheduler.

scx_layered allows the user to classify tasks into multiple layers, and apply different scheduling policies to those layers. For example, a layer could be created of all tasks that are part of the user.slice cgroup slice, and a policy could be specified that ensures that the layer is given at least 80% CPU utilization for some subset of CPUs on the system.

How To Install

Available as a Rust crate: cargo add scx_layered

Typical Use Case

scx_layered is designed to be highly customizable, and can be targeted for specific applications. For example, if you had a high-priority service that required priority access to all but 1 physical core to ensure acceptable p99 latencies, you could specify that the service would get priority access to all but 1 core on the system. If that service ends up not utilizing all of those cores, they could be used by other layers until they're needed.

Production Ready?

Yes. If tuned correctly, scx_layered should be performant across various CPU architectures and workloads.

That said, you may run into an issue with infeasible weights, where a task with a very high weight may cause the scheduler to incorrectly leave cores idle because it thinks they're necessary to accommodate the compute for a single task. This can also happen in CFS, and should soon be addressed for scx_layered.