1770 Commits

Author SHA1 Message Date
Andrea Righi
f782467eaf scx_rustland: convert to scx_stats
This allows scx_rustland to avoid generating excessive logs for
statistics while still allowing detailed monitoring on demand.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-30 18:32:32 +02:00
Changwoo Min
9091dd983b scx_lavd: add "--autopilot" mode
Add "--autopilot" option and mode. In the autopilot mode, the scheduler
dynamically changes its power mode according to system's load (cpu
utilization). When the cpu utilization is low enough (say <=5%), it
switches to the powersave mode since there is nothing to process fast so
powersaving is the primary goal. When the utilization is moderate (say
>5%, <=30%), it runs in balanced mode. When the utilization is high
enough (say >30%), it runs in performance mode.

Note that it only changes scheduler's power mode but it does not change
system's energy profile.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-31 01:14:33 +09:00
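
As a rough illustration of the autopilot thresholds described in 9091dd983b, the following Rust sketch maps a CPU utilization percentage to a power mode. The `PowerMode` enum, the `autopilot_mode` name, and the exact cut-offs (5% and 30%, taken from the "say" values in the commit message) are assumptions for illustration, not the scheduler's actual code.

```rust
/// Hypothetical power modes, named after the modes in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PowerMode {
    Powersave,
    Balanced,
    Performance,
}

/// Map a CPU utilization percentage (0.0..=100.0) to a power mode.
/// The thresholds are the illustrative values from the commit message.
pub fn autopilot_mode(cpu_util_pct: f64) -> PowerMode {
    if cpu_util_pct <= 5.0 {
        // Nothing needs to be processed fast, so saving power comes first.
        PowerMode::Powersave
    } else if cpu_util_pct <= 30.0 {
        PowerMode::Balanced
    } else {
        PowerMode::Performance
    }
}
```

As the commit notes, a decision like this only affects the scheduler's own power mode and not the system's energy profile.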
Changwoo Min
5ecaa9ebe2 scx_lavd: improve the accuracy of cpu utilization calculation
When a cpu is idle for a whole interval, its idle time is not added up
correctly, so the utilization of such a cpu tends to be higher than the
actual utilization. This is now fixed, so cpu utilization becomes more
accurate.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-31 01:14:33 +09:00
Changwoo Min
2f8cc0d60f scx_lavd: rename the "--auto" option to "--autopower" to be clear
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-31 01:14:33 +09:00
Changwoo Min
815f1263b2 scx_lavd: reinitialize active cpumask when power mode changes
When the power mode changes back to performance mode, we should reset
the active/overflow cpumasks to their initial state -- all big cores are
in the active cpumask and all little cores are in the overflow cpumask.
Otherwise, the stale active/overflow cpumasks will be used in the
performance mode.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-31 01:14:33 +09:00
Changwoo Min
afb8c78a09 scx_lavd: print power mode change in the auto mode
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-31 01:14:33 +09:00
Changwoo Min
a89a56dba4 scx_lavd: add a fastpath in ops.select_cpu() for a sharply pinned task
If a task can be run only on a single cpu, we don't need to go through
all the steps in ops.select_cpu(). Instead, we simply check if a task
is still pinned on the prev_cpu and go.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-31 01:14:33 +09:00
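
The fast path from a89a56dba4 can be modeled in a few lines of Rust. This is only a sketch: the real logic lives in BPF C inside ops.select_cpu(), and the `Task` struct, its fields, and the 64-CPU bitmask are simplifications assumed here for illustration.

```rust
/// Simplified stand-in for the task state the fast path looks at (hypothetical).
pub struct Task {
    /// Bitmask of allowed CPUs (bit N set means CPU N is allowed).
    pub cpus_allowed: u64,
    /// CPU the task last ran on.
    pub prev_cpu: u32,
}

/// Return Some(cpu) when the fast path applies: the task is limited to a
/// single CPU and that CPU is still prev_cpu. Return None to fall through
/// to the full CPU selection logic.
pub fn select_cpu_fastpath(task: &Task) -> Option<u32> {
    let pinned = task.cpus_allowed.count_ones() == 1;
    let prev_ok =
        task.prev_cpu < 64 && (task.cpus_allowed & (1u64 << task.prev_cpu)) != 0;
    if pinned && prev_ok {
        Some(task.prev_cpu)
    } else {
        None
    }
}
```

A task pinned to CPU 2 with prev_cpu 2 takes the fast path; any other combination returns None and goes through the regular steps.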
Andrea Righi
b54fc202b8
Merge pull request #583 from sched-ext/bpfland-fix-pcpu-direct-dispatch
scx_bpfland: always rely on prev_cpu with single-CPU tasks
2024-08-30 18:12:59 +02:00
Andrea Righi
7cc18460b9 scx_bpfland: always rely on prev_cpu with single-CPU tasks
When selecting an idle CPU for tasks that can only run on a single CPU,
always check if the previously used CPU is still usable, instead of
trying to figure out the single allowed CPU looking at the task's
cpumask.

Apparently, single-CPU tasks can report a prev_cpu that is not in the
allowed cpumask when they rapidly change affinity.

This could lead to stalls, because we may end up dispatching the kthread
to a per-CPU DSQ that is not compatible with its allowed cpumask.

Example:

kworker/u32:2[173797] triggered exit kind 1026:
  runnable task stall (kworker/2:1[70] failed to run for 7.552s)
...
  R kworker/2:1[70] -7552ms
      scx_state/flags=3/0x9 dsq_flags=0x1 ops_state/qseq=0/0
      sticky/holding_cpu=-1/-1 dsq_id=0x8 dsq_vtime=234483011369
      cpus=04

In this case kworker/2 can only run on CPU #2 (cpus=0x4), but it's
dispatched to dsq_id=0x8, which can only be consumed by CPU 8 => stall.

To prevent this, do not try to figure out the best idle CPU for tasks
that are changing affinity and just dispatch them to a global DSQ
(either priority or regular, depending on its interactive state).

Moreover, introduce an explicit error check in dispatch_direct_cpu() to
improve detection of similar issues in the future, and drop
lookup_task_ctx() in favor of try_lookup_task_ctx(), since we can now
safely handle all the cases where the task context is not found.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-30 09:45:58 +02:00
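
The stall described in 7cc18460b9 comes down to a mismatch between a task's allowed cpumask and the per-CPU DSQ it was sent to. The Rust sketch below models the fallback described in the commit message. The `Target` enum, the function names, and the 64-bit mask are hypothetical, and idle-state checks are deliberately left out; the actual fix is in the scheduler's BPF code.

```rust
/// Hypothetical dispatch targets; the names are illustrative only.
#[derive(Debug, PartialEq, Eq)]
pub enum Target {
    /// Per-CPU DSQ of the given CPU.
    PerCpu(u32),
    /// Global priority DSQ (interactive tasks).
    GlobalPriority,
    /// Global regular DSQ.
    GlobalRegular,
}

/// True if `cpu` is set in the allowed-CPU bitmask.
pub fn cpu_allowed(cpus: u64, cpu: u32) -> bool {
    cpu < 64 && (cpus & (1u64 << cpu)) != 0
}

/// For a single-CPU task, use prev_cpu only if it is still in the allowed
/// mask. Otherwise the task is probably changing affinity, so send it to
/// a global DSQ instead of guessing a per-CPU one.
pub fn pick_target(cpus: u64, prev_cpu: u32, interactive: bool) -> Target {
    if cpu_allowed(cpus, prev_cpu) {
        Target::PerCpu(prev_cpu)
    } else if interactive {
        Target::GlobalPriority
    } else {
        Target::GlobalRegular
    }
}
```

With the values from the log above, `cpu_allowed(0x04, 2)` is true and `cpu_allowed(0x04, 8)` is false, so a task with cpus=04 and prev_cpu 8 ends up in a global DSQ rather than in a per-CPU DSQ it can never be consumed from.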
Changwoo Min
3e2e78a9ec
Merge pull request #584 from multics69/lavd-turbo2
scx_lavd: automatically determine power mode and more
2024-08-30 08:56:16 +09:00
Daniel Hodges
47184e9d19
Merge pull request #582 from hodgesds/layered-growth-interface
scx_layered: Add layer growth config
2024-08-29 18:49:59 -04:00
Vladislav Nepogodin
4d770e1f84
scx_loader: Add scheduler loader via system DBUS interface
2024-08-30 00:56:27 +04:00
Daniel Hodges
36e81f81d7
Merge pull request #585 from hodgesds/topo-node-cores
scx_utils: Add cores helper to node topology
2024-08-29 14:07:38 -04:00
Daniel Hodges
0c8dc43239
Merge pull request #586 from hodgesds/versitat-fixes
ci: Remove veristat diff workflow
2024-08-29 14:06:39 -04:00
Daniel Hodges
a3ae127194 ci: Remove veristat diff workflow
The veristat diff action isn't working properly so remove it for now.
Instead, run veristat (non diff mode) as part of the build-scheds
workflow. This should still provide some useful data for BPF program
related stats.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-29 05:50:06 -07:00
Daniel Hodges
f0c9a3932d scx_utils: Add cores helper to node topology
Add a helper for getting the cores per node.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-29 05:21:56 -07:00
Changwoo Min
bb08919203 scx_lavd: determine power mode automatically with --auto option
It checks the EPP (energy performance preference) periodically and sets
the power profile of the scheduler during runtime as a user changes its
EPP profile (from her desktop UI).

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-29 19:15:23 +09:00
Andrea Righi
cc3f696c4b
Merge pull request #577 from sched-ext/bpfland-task-affinity
scx_bpfland: enhanced task affinity
2024-08-29 07:46:57 +02:00
Daniel Hodges
b4c0eda163
Merge pull request #581 from hodgesds/slice-docs
scx_layered: Update docs for layer slice setting
2024-08-28 23:32:52 -04:00
Daniel Hodges
7e0329e45c scx_layered: Add layer growth config
Add a per layer config for different implementations of layer growth
algorithms. Convert the existing default logic into a default layer
growth algorithm and add a linear implementation.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-28 19:17:24 -07:00
Daniel Hodges
cf765562c7
scx_layered: Update docs for layer slice setting
Add docs for layer slice setting.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-28 22:12:07 -04:00
Daniel Hodges
7166f0085f
Merge pull request #580 from hodgesds/layered-docs
Layered docs
2024-08-28 16:01:05 -04:00
Daniel Hodges
a23308e7b0 scx_layered: Add more docs on tuning
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-28 12:38:05 -07:00
Daniel Hodges
96326b1ef3 scx_layered: Add additional docs
Add some additional docs on tuning layered.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-28 12:27:26 -07:00
Daniel Hodges
f2ddae6773
Merge pull request #578 from hodgesds/layered-timeslice
scx_layered: Add per layer timeslice
2024-08-28 15:13:02 -04:00
Daniel Hodges
cc450f1a4b scx_layered: Add per layer timeslice
Allow setting a different timeslice per layer.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-28 11:21:03 -07:00
Daniel Hodges
3219c6dd22
Merge pull request #579 from hodgesds/layered-verifier-hacks
scx_layered: Make verification easier on older kernels
2024-08-28 12:34:09 -04:00
Daniel Hodges
c511b42b7b scx_layered: Make verification easier on older kernels
Refactor some BPF code to make verification easier on older kernels.
This is to make it easier to maintain backports.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-28 08:05:10 -07:00
Daniel Hodges
5391816853
Merge pull request #575 from hodgesds/gpu-topo
scx_utils: Add GPU topology
2024-08-28 09:59:44 -04:00
Daniel Hodges
12f8cb74b5 scx_utils: Add GPU topology
Add GPU awareness to the topology crate.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-28 06:35:35 -07:00
Andrea Righi
28cb1ec5cb scx_bpfland: enhanced task affinity
Aggressively try to keep tasks running on the same CPU / cache / domain,
to achieve higher performance when the system is not overcommitted.

This is done by giving a second chance in ops.enqueue(), in addition to
ops.select_cpu(), to find an idle CPU close to the previously used CPU.

Moreover, even if the task is dispatched to the global DSQs, always try
to check if there is an idle CPU in the primary domain that can
immediately consume the task.

= Results =

This change seems to provide a minor, but consistent, boost of
performance with the CPU-intensive benchmarks from the CachyOS
benchmarks selection [1].

Similar results can also be noticed with some WebGL benchmarks [2], when
system usage is close to its maximum capacity.

Test:
 - cachyos-benchmarker

System:
 - AMD Ryzen 7 5800X 8-Core Processor

Metrics:
 - total time: elapsed time of all benchmarks
 - total score: geometric mean of all benchmarks

NOTE: total time is the most relevant, since it gives a measure of the
aggregate performance, while the total score places more emphasis on
performance consistency across all benchmarks.

== Results: summary ==

 +-------------------------+---------------------+---------------------+
 |         Scheduler       |    Total Time       |    Total Score      |
 |                         |    (less = better)  |    (less = better)  |
 +-------------------------+---------------------+---------------------+
 |                 EEVDF   |  624.44 sec         |      123.68         |
 |               bpfland   |  625.34 sec         |      122.21         |
 | bpfland-task-affinity   |  623.67 sec         |      122.27         |
 +-------------------------+---------------------+---------------------+

== Conclusion ==

With this patch applied, bpfland shows both better performance and better
consistency. Although the gains are small (less than 1%), they are still
significant for this type of benchmark and consistently appear across
multiple runs.

[1] https://github.com/CachyOS/cachyos-benchmarker
[2] https://webglsamples.org/aquarium/aquarium.html

Tested-by: Piotr Gorski <piotr.gorski@cachyos.org>
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-28 10:30:54 +02:00
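
The "total score" metric in 28cb1ec5cb is described as a geometric mean, and the conclusion quotes gains of less than 1%. The Rust sketch below shows how both figures could be computed. The function names are hypothetical and this is not the cachyos-benchmarker implementation.

```rust
/// Geometric mean of per-benchmark results, computed in log space to avoid
/// overflow. Returns None for empty input or non-positive values.
pub fn geometric_mean(results: &[f64]) -> Option<f64> {
    if results.is_empty() || results.iter().any(|&r| r <= 0.0) {
        return None;
    }
    let log_sum: f64 = results.iter().map(|r| r.ln()).sum();
    Some((log_sum / results.len() as f64).exp())
}

/// Relative change in percent when going from `baseline` to `value`.
/// Negative means `value` is lower than `baseline`.
pub fn pct_change(baseline: f64, value: f64) -> f64 {
    (value - baseline) / baseline * 100.0
}
```

Applied to the total times in the summary table, `pct_change(625.34, 623.67)` is roughly -0.27, which matches the "less than 1%" statement in the conclusion.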
Changwoo Min
d708939e5a
Merge pull request #576 from anh0516/main
scx_lavd: Add power mode clarification to --no-prefer-turbo-core
2024-08-28 14:03:46 +09:00
Avraham Hollander
6c5d85401d
Merge branch 'sched-ext:main' into main
2024-08-27 23:07:54 -04:00
Avraham Hollander
2a3cbeb760 scx_lavd: Add same power mode clarification to --no-prefer-turbo-core
2024-08-27 23:06:31 -04:00
Changwoo Min
5588126cff scx_lavd: minor optimization for consume_task()
When iterating neighbors, the existing code unnecessarily iterates over
all the neighbors up to the maximum even if there are no neighbors. So
the fix exits early when there are no neighbors.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-28 10:26:50 +09:00
Changwoo Min
95272ae910 scx_lavd: proper handling of ctrl-c in a monitoring mode
Ctrl-c wasn't properly handled in the monitoring mode
(`--monitor-sched-samples`), so the scheduler could not be terminated by
pressing ctrl-c. The missing ctrl-c handling is added to the monitor
thread.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-28 10:05:34 +09:00
Changwoo Min
9c4428fd8b scx_lavd: remove unused rust functions
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-28 10:02:11 +09:00
Andrea Righi
1421094f32
Merge pull request #574 from sched-ext/scx-utils-turbo-core-type
scx_bpfland: rely on Topology to classify CPU types
2024-08-28 00:52:44 +02:00
Andrea Righi
a155d5185d scx_bpfland: rely on Topology to classify core types
Rely on scx_utils::Topology to classify Big, Little and Turbo CPUs.

Moreover, support the special keyword "all" with --primary-domain to
include all the CPUs in the system (default).

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-28 00:23:55 +02:00
Andrea Righi
a66ce5ce56 scx_utils::cpumask: support special strings "all" and "none"
Allow creating a Cpumask from the string "all" (enable all the CPUs) or
"none" (disable all CPUs).

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-28 00:15:01 +02:00
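
To illustrate the idea behind a66ce5ce56, here is a minimal Rust sketch of a mask parser that accepts the special strings "all" and "none". The `mask_from_str` function, its `Vec<bool>` return type, the hex fallback, and the 64-CPU limit are assumptions made for this example; it does not reproduce the real scx_utils Cpumask API.

```rust
/// Build a per-CPU boolean mask from a string (hypothetical helper).
/// "all" enables every CPU, "none" disables every CPU, and anything else
/// is parsed as a hex bitmask with an optional "0x" prefix (up to 64 CPUs).
pub fn mask_from_str(s: &str, nr_cpus: usize) -> Result<Vec<bool>, String> {
    match s {
        "all" => Ok(vec![true; nr_cpus]),
        "none" => Ok(vec![false; nr_cpus]),
        _ => {
            let hex = s.strip_prefix("0x").unwrap_or(s);
            let bits = u64::from_str_radix(hex, 16)
                .map_err(|e| format!("invalid cpumask {:?}: {}", s, e))?;
            // CPU N is enabled when bit N is set; CPUs >= 64 are left disabled.
            Ok((0..nr_cpus)
                .map(|cpu| cpu < 64 && ((bits >> cpu) & 1) == 1)
                .collect())
        }
    }
}
```

Handling the keywords before the numeric parse keeps them from being misread as malformed hex input.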
Andrea Righi
872e653cd2 scx_utils: introduce Turbo core type to Topology
Integrate the logic used by scx_bpfland to detect turbo-boosted cores in
Topology.

Also change the logic to detect Big/Little cores as a function of
base_frequency, instead of scaling_max_freq, otherwise turbo-boosted
cores in homogeneous systems may be incorrectly classified as Big.

Moreover, introduce the following new methods to Cpu to check for the
core type:
 - is_turbo(): return true if the CPU is Turbo, false otherwise
 - is_big(): return true if the CPU is either Turbo or Big
 - is_little(): return true if the CPU is Little

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-28 00:09:08 +02:00
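
The three predicates listed in 872e653cd2 can be sketched as follows. The method names and their semantics come from the commit message, while the `CoreType` enum layout and the `Cpu` struct shown here are assumptions for illustration and may differ from the actual scx_utils types.

```rust
/// Hypothetical core classification; the real type layout may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CoreType {
    Turbo,
    Big,
    Little,
}

/// Minimal stand-in for a CPU entry in the topology.
pub struct Cpu {
    pub core_type: CoreType,
}

impl Cpu {
    /// True if the CPU is Turbo, false otherwise.
    pub fn is_turbo(&self) -> bool {
        self.core_type == CoreType::Turbo
    }

    /// True if the CPU is either Turbo or Big.
    pub fn is_big(&self) -> bool {
        matches!(self.core_type, CoreType::Turbo | CoreType::Big)
    }

    /// True if the CPU is Little.
    pub fn is_little(&self) -> bool {
        self.core_type == CoreType::Little
    }
}
```

Note that a Turbo CPU answers true to both `is_turbo()` and `is_big()`, so callers that only care about the Big/Little split do not need a special case for turbo-boosted cores.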
Piotr Gorski
19657d4e0c
scx_stats: Drop sched-ext namespace
Signed-off-by: Piotr Gorski <lucjan.lucjanov@gmail.com>
2024-08-27 18:06:37 +02:00
Daniel Hodges
250eacdcab
Merge pull request #570 from hodgesds/topo-updates
scx_utils: Add Big/Little core logic to Topology
2024-08-27 10:30:37 -04:00
Daniel Hodges
1e9fb9ba97 scx_utils: Add Big/Little core logic to Topology
This borrows some of the logic in scx_lavd for figuring out if a core is
a Big/Little core. If this makes sense we can add helper methods
directly on the topology to return Big/Little cores so that each
scheduler doesn't have to reinvent the same logic.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-27 07:01:18 -07:00
Daniel Hodges
41cebb807a
Merge pull request #569 from anh0516/main
scx_layered: Clean up in-code documentation; add commas for consistency
2024-08-27 09:47:29 -04:00
Andrea Righi
6768f9f88c
Merge pull request #572 from sched-ext/bpfland-fix-turbo-domain
scx_bpfland: fix turbo boost domain nullifying primary domain limits
2024-08-27 15:23:12 +02:00
Changwoo Min
e567ee8494
Merge pull request #571 from multics69/lavd-verifier
scx_lavd: make a loop easier to correctly verify
2024-08-27 21:50:09 +09:00
Andrea Righi
e0f49a338a scx_bpfland: fix turbo boost domain nullifying primary domain limits
When creating the turbo boost scheduling domain, we might use a full CPU
mask (selecting all possible CPUs) to indicate "do not prioritize turbo
boost CPUs" or when all CPUs have the same maximum frequency.

This approach works when the primary domain also contains all the CPUs,
as the complete overlap allows the CPU selection logic to ignore the
turbo boost domain and start picking CPUs directly from the primary
domain.

However, if the primary domain doesn't include all CPUs, the two domains
won't fully overlap, which can lead to the turbo boost domain
incorrectly including all CPUs, thereby negating the restrictions set by
the primary scheduling domain.

To resolve this, an empty CPU mask should be used for the turbo boost
domain when turbo boost CPUs aren't prioritized. If the turbo boost
domain is empty, it should be entirely bypassed, and the selection
should proceed directly to the primary domain.

Reported-by: Changwoo Min <changwoo@igalia.com>
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-27 13:36:50 +02:00
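
The problem fixed in e0f49a338a can be reproduced with a small model of the selection order. The sketch below is an assumption-laden simplification in Rust: domains are plain 64-bit masks, the `pick_idle_cpu` name is hypothetical, and the lowest idle CPU is picked, whereas the real scheduler works on BPF cpumasks with more elaborate selection logic.

```rust
/// Pick an idle CPU, trying the turbo boost domain first and then the
/// primary domain. All arguments are bitmasks (bit N means CPU N).
pub fn pick_idle_cpu(idle: u64, turbo: u64, primary: u64) -> Option<u32> {
    // An empty turbo domain means "do not prioritize turbo boost CPUs",
    // so it is bypassed entirely.
    if turbo != 0 {
        let candidates = idle & turbo;
        if candidates != 0 {
            return Some(candidates.trailing_zeros());
        }
    }
    // Fall back to the primary domain.
    let candidates = idle & primary;
    if candidates != 0 {
        Some(candidates.trailing_zeros())
    } else {
        None
    }
}
```

With a primary domain of CPUs 0-1 (`0b0011`) and only CPUs 2-3 idle (`0b1100`), a full turbo mask (`0b1111`) selects CPU 2, which is outside the primary domain. An empty turbo mask returns None in the same situation, so the primary domain limits are respected.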
Changwoo Min
00430c3ded scx_lavd: make a loop easier to correctly verify
With an unfortunate combination of an old kernel and an old LLVM, the
BPF verifier incorrectly detects an infinite loop. After changing the
loop to use a constant end bound, the old verifier accepts the code.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-27 17:11:20 +09:00
Changwoo Min
09cff560aa
Merge pull request #566 from multics69/lavd-turbo
scx_lavd: prioritize the turbo boost-able cores
2024-08-27 08:47:25 +09:00