Commit Graph

799 Commits

Author SHA1 Message Date
Tejun Heo
9e3b4e6db0 scx_stats: A bit of cleanup and renames 2024-08-23 09:09:02 -10:00
Tejun Heo
b6ccb87bec
Merge pull request #539 from sched-ext/htejun/scx_rusty
scx_rusty: Convert to scx_stats
2024-08-23 08:42:47 -10:00
Daniel Hodges
7d45059fa9
Merge pull request #538 from hodgesds/layered-pid
scx_layered: Add pid/ppid matches
2024-08-23 14:08:40 -04:00
Tejun Heo
8c8912ccea Merge branch 'main' into htejun/scx_rusty 2024-08-23 07:50:23 -10:00
Tejun Heo
44a0f1b124 scx_utils: Factor out monitor_stats() from scx_rusty and scx_layered 2024-08-23 06:46:19 -10:00
Tejun Heo
ae3024e938 scx_layered: Add --stats and make --monitor behavior consistent with scx_rusty 2024-08-23 05:52:52 -10:00
Tejun Heo
0f04a93dd1 scx_rusty: Add stat descriptions and make minor adjustments 2024-08-23 05:46:13 -10:00
Tejun Heo
36865234f8 scx_rusty: Add scx_stats annotations necessary for openmetrics translation 2024-08-23 04:59:08 -10:00
Tejun Heo
2f3f473cd3 scx_rusty: Improve timestamp reporting 2024-08-23 04:31:27 -10:00
Daniel Hodges
11b978a892 scx_layered: Add pid/ppid matches
Add matches for pid/ppid.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-23 07:20:05 -07:00
Tejun Heo
76934f3aab scx_rusty: Convert to scx_stats
This allows scx_rusty to avoid generating excessive logs for statistics
while still supporting detailed monitoring on demand.
2024-08-22 19:44:12 -10:00
Tejun Heo
16c07a5cd9 scx_rusty: Don't reset bpf_stats, remember prev states and calculate delta
This will ease the transition to scx_stats.
2024-08-22 13:02:23 -10:00
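
The underlying pattern is a snapshot-delta read: remember the previous
counter values and report differences, rather than zeroing the shared BPF
counters. A minimal C sketch, where struct bpf_stats and its fields are
illustrative assumptions rather than the actual scx_rusty definitions:

  /* Hypothetical counters read from BPF; field names are illustrative. */
  struct bpf_stats {
          unsigned long long dispatched;
          unsigned long long migrated;
  };

  static struct bpf_stats prev;   /* remembered across reporting intervals */

  /* Report the delta since the previous read without resetting the source. */
  static void stats_delta(const struct bpf_stats *cur, struct bpf_stats *delta)
  {
          delta->dispatched = cur->dispatched - prev.dispatched;
          delta->migrated = cur->migrated - prev.migrated;
          prev = *cur;    /* remember current state for the next interval */
  }
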
Tejun Heo
13fa48a871 scx_rusty: Separate out stats generation and formatting
to prepare for scx_stats conversion.
2024-08-22 10:03:10 -10:00
Tejun Heo
b4564520e5 scx_rusty: Simplify Stats structs and take id out of the structs
to prepare for scx_stats conversion. While at it, make some cosmetic
changes.
2024-08-22 08:45:33 -10:00
Andrea Righi
6a2285398d scx_bpfland: introduce --lowlatency option
Introduce the new `--lowlatency` option, which enables switching between
the default pure vruntime-based scheduling (more optimized for server
workloads) and deadline-based scheduling (better suited for
low-latency workloads).

When the low-latency mode is activated, a task's deadline is calculated
as its vruntime, adjusted by a bonus proportional to the task's average
number of voluntary context switches (the more voluntary context
switches, the shorter the deadline).

This feature further enhances the prioritization of interactive tasks,
proportionally to their average voluntary context switches, also within
the two main global queues (priority / shared), and it helps keep
interactive workloads responsive even in the presence of heavy
non-interactive background work.

Low-latency mode prevents audio crackling even in the presence of a
large number of short-lived tasks with pseudo-interactive behavior
(e.g., hackbench), and it achieves approximately +33% average
frames-per-second (FPS) in the typical "gaming while building the
kernel" benchmark.

However, it can also amplify the de-prioritization of CPU-intensive
tasks, making this option better suited to specific low-latency
scenarios. Therefore, the low-latency mode is disabled by default and
can only be enabled via the `--lowlatency` option.

Tested-by: Piotr Gorski (piotrgorski@cachyos.org)
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-22 13:26:19 +02:00
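
In other words, the deadline computation subtracts a capped bonus,
proportional to the task's average voluntary context switch rate, from its
vruntime. A minimal C sketch of that arithmetic; the constants, the cap,
and the task_ctx layout are assumptions, not the actual scx_bpfland code:

  #include <stdbool.h>

  #define NSEC_PER_MSEC           1000000ULL
  #define MAX_LATENCY_BONUS       (64ULL * NSEC_PER_MSEC)  /* assumed cap */

  struct task_ctx {
          unsigned long long vruntime;    /* task's virtual runtime */
          unsigned long long avg_nvcsw;   /* avg voluntary ctx switches/sec */
  };

  static unsigned long long task_deadline(const struct task_ctx *tctx,
                                          bool lowlatency)
  {
          unsigned long long bonus;

          if (!lowlatency)
                  return tctx->vruntime;  /* pure vruntime-based scheduling */

          /*
           * The more often a task voluntarily yields the CPU, the larger
           * its bonus and the earlier its deadline.
           */
          bonus = tctx->avg_nvcsw * NSEC_PER_MSEC;
          if (bonus > MAX_LATENCY_BONUS)
                  bonus = MAX_LATENCY_BONUS;

          /* Clamp to avoid underflowing the unsigned vruntime. */
          return tctx->vruntime > bonus ? tctx->vruntime - bonus : 0;
  }
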
Tejun Heo
4834dec684 scx_rusty: Move stats structs to stats.rs and rename for consistency 2024-08-21 22:04:38 -10:00
Andrea Righi
b0a8e4a91e scx_bpfland: better time slice control
Explicitly replenish the task's time slice from ops.dispatch() if the
task still wants to run and no other task is selected. This way the
sched_ext core won't automatically re-schedule the task on the same CPU
with an implicitly assigned time slice of SCX_SLICE_DFL.

Moreover, instead of determining the task time slice in ops.enqueue(),
refresh the time slice immediately before the task is started on its
assigned CPU in ops.running().

This makes it possible to use a more precise time slice, adjusted based
on the actual number of tasks that are currently waiting to be scheduled.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-22 09:23:37 +02:00
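
A simplified BPF-side sketch of the two callbacks described above.
SCX_TASK_QUEUED, p->scx.slice, and scx_bpf_dsq_nr_queued() are sched_ext
kernel interfaces; SHARED_DSQ, slice_ns, and consume_task() are
illustrative assumptions rather than the actual scx_bpfland definitions:

  const volatile u64 slice_ns = 5000000;  /* 5ms max slice, illustrative */

  static bool consume_task(s32 cpu);      /* illustrative: run a queued task */

  static u64 task_slice(struct task_struct *p)
  {
          /* Scale the slice by the number of tasks waiting to run. */
          u64 nr_waiting = scx_bpf_dsq_nr_queued(SHARED_DSQ);

          return slice_ns / (nr_waiting + 1);
  }

  void BPF_STRUCT_OPS(sched_dispatch, s32 cpu, struct task_struct *prev)
  {
          if (consume_task(cpu))
                  return;

          /*
           * Nothing else to run: if prev still wants to run, replenish its
           * slice explicitly so the core doesn't re-schedule it with an
           * implicit SCX_SLICE_DFL.
           */
          if (prev && (prev->scx.flags & SCX_TASK_QUEUED))
                  prev->scx.slice = task_slice(prev);
  }

  void BPF_STRUCT_OPS(sched_running, struct task_struct *p)
  {
          /* Refresh the slice right before the task starts on its CPU. */
          p->scx.slice = task_slice(p);
  }
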
Tejun Heo
d6ac5fbd9c scx_layered: Drop SCX_OPS_ENQ_LAST
The meaning of SCX_OPS_ENQ_LAST will change with future kernel updates and
enqueueing on the local DSQ will no longer be sufficient to avoid stalls. No
reason to do it anyway. Just drop it.
2024-08-21 13:13:59 -10:00
Tejun Heo
f726f0b73b Version: Cargo.lock 2024-08-21 06:45:19 -10:00
Tejun Heo
4d1f0639d8 Version: v1.0.3 2024-08-21 06:42:11 -10:00
Andrea Righi
fedfee0bd6 scx_bpfland: drop unused variable
With the global scx_utils::NR_CPU_IDS we don't need Topology anymore in
init_primary_domain(), so drop the variable to fix the following build
warning:

warning: unused variable: `topo`
   --> src/main.rs:385:9
    |
385 |         topo: &Topology,
    |         ^^^^ help: if this is intentional, prefix it with an underscore: `_topo`
    |
    = note: `#[warn(unused_variables)]` on by default

Fixes: 1da249f ("scx_utils::topology: Always use NR_CPU_IDS and NR_CPUS_POSSIBLE")
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-21 17:46:12 +02:00
Andrea Righi
9f7a11bba6
Merge pull request #528 from sched-ext/bpfland-turbo-boost
scx_bpfland: properly classify Intel Turbo Boost CPUs
2024-08-21 17:40:25 +02:00
Daniel Hodges
f2a6661a85
Merge pull request #524 from hodgesds/layered-core-fixes
scx_layered: Fix core selection
2024-08-21 08:13:33 -04:00
Tejun Heo
9c62019c81
Merge pull request #527 from sched-ext/htejun/scx_utils
scx_utils::cpumask,topology: Misc updates
2024-08-20 22:25:25 -10:00
Andrea Righi
695e3b25b0 scx_bpfland: classify CPUs depending on their base frequency
Use the base frequency, instead of the maximum frequency, to classify fast
and slow CPUs. This ensures accurate distinction between Intel Turbo
Boost CPUs and genuinely faster CPUs when auto-detecting the primary
scheduling domain.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-21 10:16:41 +02:00
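
The attributes involved are the standard cpufreq sysfs ones. A hedged C
sketch of reading them, where the fallback policy and helper names are
assumptions (the scheduler itself obtains this information through
scx_utils::Topology):

  #include <stdio.h>

  /* Read a cpufreq sysfs attribute in kHz; returns -1 if unavailable. */
  static long cpu_khz(int cpu, const char *attr)
  {
          char path[128];
          long khz = -1;
          FILE *f;

          snprintf(path, sizeof(path),
                   "/sys/devices/system/cpu/cpu%d/cpufreq/%s", cpu, attr);
          f = fopen(path, "r");
          if (!f)
                  return -1;
          if (fscanf(f, "%ld", &khz) != 1)
                  khz = -1;
          fclose(f);
          return khz;
  }

  /* Prefer base_frequency (not exposed by all drivers) over the max. */
  static long cpu_base_khz(int cpu)
  {
          long khz = cpu_khz(cpu, "base_frequency");

          return khz > 0 ? khz : cpu_khz(cpu, "cpuinfo_max_freq");
  }
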
Andrea Righi
e0fb99835d
Merge pull request #525 from sched-ext/bpfland-disable-interactive
scx_bpfland: allow completely disabling interactive classification
2024-08-21 10:02:43 +02:00
Tejun Heo
5cf4212330 Revert "rusty: Integrate stats with the metrics framework"
This reverts commit 83373b1f4e in preparation
for converting to scx_stats.
2024-08-20 21:59:25 -10:00
Tejun Heo
516a7590db scx_rusty: Revert log_recorder conversion
scx_rusty will be converted to scx_stats in a similar fashion to
scx_layered. Undo the log_recorder conversion in preparation.
2024-08-20 21:59:20 -10:00
Tejun Heo
1da249f063 scx_utils::topology: Always use NR_CPU_IDS and NR_CPUS_POSSIBLE
Always use the LazyLock versions and drop the counterparts from Topology.
2024-08-20 21:57:56 -10:00
Tejun Heo
092f5422d6
Merge pull request #518 from sched-ext/htejun/misc
scx_layered: Add `--run-example` and enable CI testing
2024-08-20 21:42:45 -10:00
Tejun Heo
f7c193e528 scx_utils, scx_rusty: Minor updates to version handling
- Update scx_utils/build.rs so that a 12-character SHA1 is generated
  instead of the full one.

- Add --version to scx_rusty. Use a custom one as we don't want to use
  the default cargo version.
2024-08-20 21:03:05 -10:00
Tejun Heo
8f786be08f scx_rusty: cargo fmt 2024-08-20 21:03:05 -10:00
Tejun Heo
4440567949 scx_rusty: Update Cargo.lock 2024-08-20 21:03:05 -10:00
Andrea Righi
c85315d527 scx_bpfland: allow completely disabling interactive classification
Tasks enqueued with SCX_ENQ_WAKEUP are immediately classified as
interactive. However, if interactive task classification is disabled
(via `-c 0`), we should avoid promoting them to interactive.

This is particularly important because, with the nvcsw logic disabled,
tasks can remain classified as interactive indefinitely and they will
never be demoted to regular tasks.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-21 08:45:13 +02:00
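
A BPF-side sketch of the resulting guard. SCX_ENQ_WAKEUP comes from the
sched_ext headers; nvcsw_max_thresh (zero when `-c 0` is passed), the task
context layout, and the helper name are illustrative assumptions:

  const volatile u64 nvcsw_max_thresh;   /* set from userspace at load time */

  static bool is_task_interactive(const struct task_ctx *tctx, u64 enq_flags)
  {
          /* Classification disabled (-c 0): never promote to interactive. */
          if (!nvcsw_max_thresh)
                  return false;

          /* Wakeups are promoted immediately, others by their nvcsw average. */
          return (enq_flags & SCX_ENQ_WAKEUP) ||
                 tctx->avg_nvcsw >= nvcsw_max_thresh;
  }
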
Andrea Righi
a9f5aaa536 scx_bpfland: replace custom CpuMask with scx_utils::Cpumask
Rely on scx_utils::Cpumask instead of re-implementing a custom struct to
parse and manage CPU masks.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-21 07:21:52 +02:00
Daniel Hodges
4d1c932619 scx_layered: Fix core selection
Fix a bug introduced in #510, which assumed core IDs are sequential.
This refactors the core ordering for layers to be much simpler and
leaves some room for layer core isolation at low utilization.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-20 19:26:53 -07:00
Andrea Righi
33b6ada98e
Merge pull request #509 from sched-ext/bpfland-topology
scx_bpfland: topology awareness
2024-08-20 14:37:23 +02:00
Andrea Righi
467d4b5ea4 scx_bpfland: get topology information from scx_utils::Topology
Rely on scx_utils::Topology to get CPU and cache information, instead of
re-implementing custom methods.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-20 10:16:02 +02:00
Tejun Heo
c0418250f4 scx_layered: Add --run-example option
So that scx_layered can be run in a CI environment with a single command.
2024-08-19 20:50:10 -10:00
Changwoo Min
41bc6f0967
Merge pull request #511 from multics69/lavd-perf-profile
scx_lavd: add power profile options: --performance, --balanced, --powersave
2024-08-20 09:02:37 +09:00
Changwoo Min
1d61dd4c1d
Merge pull request #508 from multics69/lavd-numa-fix
scx_lavd: fix a potential watchdog timeout error at multi-NUMA/CCX platforms
2024-08-20 09:02:23 +09:00
Changwoo Min
2c4c2a0ccf
Merge pull request #507 from multics69/lavd-pretty-rust
scx_lavd: revise FlatTopology prettier
2024-08-20 09:01:26 +09:00
Daniel Hodges
05a2721f8e
Merge pull request #510 from hodgesds/layered-core-topo-selection
scx_layered: Use topology for core selection
2024-08-19 20:01:16 -04:00
Tejun Heo
d01b49bd0e scx_layered: Fix verification failure
4fccc06905 ("scx_layered: Fix uninitialized variable") causes the
following verification failure. Fix it by moving the assignments below
the range check.

  Validating match_layer() func#1...
  283: R1=scalar() R2=scalar() R3=mem_or_null(id=49,sz=1) R10=fp0
  ; int match_layer(u32 layer_id, pid_t pid, const char *cgrp_path) @ main.bpf.c:1029
  283: (7b) *(u64 *)(r10 -24) = r3      ; R3=mem_or_null(id=49,sz=1) R10=fp0 fp-24_w=mem_or_null(id=49,sz=1)
  284: (bc) w7 = w1                     ; R1=scalar() R7_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
  ; struct layer *layer = &layers[layer_id]; @ main.bpf.c:1033
  285: (bc) w1 = w7                     ; R1_w=scalar(id=50,smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R7_w=scalar(id=50,smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
  286: (27) r1 *= 1061192               ; R1_w=scalar(smin=0,smax=umax=0x103147ffefceb8,smax32=0x7ffffff8,umax32=0xfffffff8,var_off=(0x0; 0x1ffffffffffff8))
  287: (18) r8 = 0xffffc90002a26000     ; R8_w=map_value(map=bpf_bpf.bss,ks=4,vs=16979080)
  289: (0f) r8 += r1                    ; R1_w=scalar(smin=0,smax=umax=0x103147ffefceb8,smax32=0x7ffffff8,umax32=0xfffffff8,var_off=(0x0; 0x1ffffffffffff8)) R8_w=map_value(map=bpf_bpf.bss,ks=4,vs=16979080,smin=0,smax=umax=0x103147ffefceb8,smax32=0x7ffffff8,umax32=0xfffffff8,var_off=(0x0; 0x1ffffffffffff8))
  ; u32 nr_match_ors = layer->nr_match_ors; @ main.bpf.c:1034
  290: (bf) r1 = r8                     ; R1_w=map_value(map=bpf_bpf.bss,ks=4,vs=16979080,smin=0,smax=umax=0x103147ffefceb8,smax32=0x7ffffff8,umax32=0xfffffff8,var_off=(0x0; 0x1ffffffffffff8)) R8_w=map_value(map=bpf_bpf.bss,ks=4,vs=16979080,smin=0,smax=umax=0x103147ffefceb8,smax32=0x7ffffff8,umax32=0xfffffff8,var_off=(0x0; 0x1ffffffffffff8))
  291: (07) r1 += 1060992               ; R1_w=map_value(map=bpf_bpf.bss,ks=4,vs=16979080,off=0x103080,smin=0,smax=umax=0x103147ffefceb8,smax32=0x7ffffff8,umax32=0xfffffff8,var_off=(0x0; 0x1ffffffffffff8))
  292: (61) r1 = *(u32 *)(r1 +0)
  R1 unbounded memory access, make sure to bounds check any such access
  processed 1099 insns (limit 1000000) max_states_per_insn 2 total_states 72 peak_states 72 mark_read 9
  -- END PROG LOAD LOG --
2024-08-19 13:18:20 -10:00
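
The fix follows the standard pattern for this class of verifier failures:
bound the index before any pointer arithmetic into the map-backed array,
so the verifier can prove the access is in range. A sketch with names
mirroring the log above; MAX_LAYERS and the elided matching loop are
illustrative:

  int match_layer(u32 layer_id, pid_t pid, const char *cgrp_path)
  {
          struct layer *layer;
          u32 nr_match_ors;

          /* Range check first, so the verifier can bound layer_id... */
          if (layer_id >= MAX_LAYERS)
                  return -EINVAL;

          /* ...before it is used to index into the global layers array. */
          layer = &layers[layer_id];
          nr_match_ors = layer->nr_match_ors;

          /* ...matching loop elided... */
          return 0;
  }
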
Daniel Hodges
b3793e0069 scx_layered: Use topology for core selection
Currently the core selection logic in scx_layered uses the first
available core in the bitmask. This is suboptimal when the scheduler is
configured with specific NUMA/LLC restrictions. The ideal core selection
logic should try to find the least used cores within the preferred
scheduling domain and allocate new CPUs from shared cores within that
domain.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-19 15:51:35 -07:00
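
A hedged sketch of the "least used core within the preferred domain" idea:
scan the candidate cores and pick the one with the fewest CPUs already
allocated. Names and data layout are assumptions, not the actual
scx_layered code:

  /* Pick the allowed core with the fewest already-allocated CPUs. */
  static int pick_least_used_core(const unsigned long long *allowed_mask,
                                  const int *nr_allocated, int nr_cores)
  {
          int best = -1;

          for (int core = 0; core < nr_cores; core++) {
                  /* Skip cores outside the preferred NUMA/LLC domain. */
                  if (!(allowed_mask[core / 64] & (1ULL << (core % 64))))
                          continue;
                  if (best < 0 || nr_allocated[core] < nr_allocated[best])
                          best = core;
          }
          return best;
  }
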
Tejun Heo
3498a2b899
Merge pull request #514 from sched-ext/htejun/scx_stats
scx_stats, scx_layered: Implement independent stats client sessions
2024-08-19 11:24:53 -10:00
Tejun Heo
f6bc52d31e scx_layered: Make --monitor behavior more useful
- If --monitor is specified with layer specs, the scheduler also starts
  stats monitoring on a thread.

- Standalone monitoring mode no longer exits when the scheduler isn't running.
2024-08-19 10:55:02 -10:00
Tejun Heo
d03e48eb75 scx_layered: Implement per-stats-client nr_layer_cpus_ranges tracking
With this, each client sees correct nr_layer_cpus_ranges values without
interfering with the others.
2024-08-19 09:12:51 -10:00
Tejun Heo
448aacfd60 scx_layered: Initialize Stats.prev_layer_cycles properly on new()
So that a new stats session doesn't start with an inflated utilization number.
2024-08-19 08:40:40 -10:00
Tejun Heo
25d7e6f787 scx_layered: Implement on-demand statistics generation
Instead of keeping one copy of sched_stats, each stats server session
now carries its own, so that stats can be generated independently by
each client at any interval. CPU allocation min/max tracking is broken
for now.
2024-08-19 08:27:36 -10:00