Commit Graph

759 Commits

Author SHA1 Message Date
Daniel Hodges
4d1c932619 scx_layered: Fix core selection
Fix a bug introduced in #510, which assumed that core IDs are incremental.
This refactors the core ordering for layers to be much simpler and leaves
room for layer core isolation under low utilization.
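A minimal Rust sketch of the pitfall this fix addresses, assuming a hypothetical `Core` type whose `id` comes from the topology and may have gaps. Ordering and lookup go through the real IDs instead of treating them as positions 0..n. This is an illustration, not the scx_layered code.

```rust
/// Hypothetical per-core record. `id` comes from the topology and may be
/// sparse (e.g. 0, 1, 4, 5), so it must not be used as a Vec index.
#[derive(Debug, Clone)]
pub struct Core {
    pub id: usize,
    pub cpus: Vec<usize>,
}

/// Order cores by their real IDs instead of assuming they run 0..n.
pub fn core_order(cores: &[Core]) -> Vec<usize> {
    let mut ids: Vec<usize> = cores.iter().map(|c| c.id).collect();
    ids.sort_unstable();
    ids.dedup();
    ids
}

/// Look up a core's CPUs by ID. Indexing `cores[core_id]` would return the
/// wrong core (or panic) once IDs have gaps; searching by ID does not.
pub fn cpus_of(cores: &[Core], core_id: usize) -> Option<&[usize]> {
    cores.iter().find(|c| c.id == core_id).map(|c| c.cpus.as_slice())
}
```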

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-20 19:26:53 -07:00
Changwoo Min
41bc6f0967
Merge pull request #511 from multics69/lavd-perf-profile
scx_lavd: add power profile options: --performance, --balanced, --powersave
2024-08-20 09:02:37 +09:00
Changwoo Min
1d61dd4c1d
Merge pull request #508 from multics69/lavd-numa-fix
scx_lavd: fix a potential watchdog timeout error at multi-NUMA/CCX platforms
2024-08-20 09:02:23 +09:00
Changwoo Min
2c4c2a0ccf
Merge pull request #507 from multics69/lavd-pretty-rust
scx_lavd: revise FlatTopology prettier
2024-08-20 09:01:26 +09:00
Daniel Hodges
05a2721f8e
Merge pull request #510 from hodgesds/layered-core-topo-selection
scx_layered: Use topology for core selection
2024-08-19 20:01:16 -04:00
Tejun Heo
d01b49bd0e scx_layered: Fix verification failure
4fccc06905 ("scx_layered: Fix uninitialized variable") causes the
following verification failure. Fix it by moving the assignments below the
range check.

  Validating match_layer() func#1...
  283: R1=scalar() R2=scalar() R3=mem_or_null(id=49,sz=1) R10=fp0
  ; int match_layer(u32 layer_id, pid_t pid, const char *cgrp_path) @ main.bpf.c:1029
  283: (7b) *(u64 *)(r10 -24) = r3      ; R3=mem_or_null(id=49,sz=1) R10=fp0 fp-24_w=mem_or_null(id=49,sz=1)
  284: (bc) w7 = w1                     ; R1=scalar() R7_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
  ; struct layer *layer = &layers[layer_id]; @ main.bpf.c:1033
  285: (bc) w1 = w7                     ; R1_w=scalar(id=50,smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R7_w=scalar(id=50,smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
  286: (27) r1 *= 1061192               ; R1_w=scalar(smin=0,smax=umax=0x103147ffefceb8,smax32=0x7ffffff8,umax32=0xfffffff8,var_off=(0x0; 0x1ffffffffffff8))
  287: (18) r8 = 0xffffc90002a26000     ; R8_w=map_value(map=bpf_bpf.bss,ks=4,vs=16979080)
  289: (0f) r8 += r1                    ; R1_w=scalar(smin=0,smax=umax=0x103147ffefceb8,smax32=0x7ffffff8,umax32=0xfffffff8,var_off=(0x0; 0x1ffffffffffff8)) R8_w=map_value(map=bpf_bpf.bss,ks=4,vs=16979080,smin=0,smax=umax=0x103147ffefceb8,smax32=0x7ffffff8,umax32=0xfffffff8,var_off=(0x0; 0x1ffffffffffff8))
  ; u32 nr_match_ors = layer->nr_match_ors; @ main.bpf.c:1034
  290: (bf) r1 = r8                     ; R1_w=map_value(map=bpf_bpf.bss,ks=4,vs=16979080,smin=0,smax=umax=0x103147ffefceb8,smax32=0x7ffffff8,umax32=0xfffffff8,var_off=(0x0; 0x1ffffffffffff8)) R8_w=map_value(map=bpf_bpf.bss,ks=4,vs=16979080,smin=0,smax=umax=0x103147ffefceb8,smax32=0x7ffffff8,umax32=0xfffffff8,var_off=(0x0; 0x1ffffffffffff8))
  291: (07) r1 += 1060992               ; R1_w=map_value(map=bpf_bpf.bss,ks=4,vs=16979080,off=0x103080,smin=0,smax=umax=0x103147ffefceb8,smax32=0x7ffffff8,umax32=0xfffffff8,var_off=(0x0; 0x1ffffffffffff8))
  292: (61) r1 = *(u32 *)(r1 +0)
  R1 unbounded memory access, make sure to bounds check any such access
  processed 1099 insns (limit 1000000) max_states_per_insn 2 total_states 72 peak_states 72 mark_read 9
  -- END PROG LOAD LOG --
2024-08-19 13:18:20 -10:00
Daniel Hodges
b3793e0069 scx_layered: Use topology for core selection
Currently the core selection logic in scx_layered uses the first
available core in the bitmask. This is suboptimal when the scheduler is
configured with specific NUMA/LLC restrictions. Ideally, core selection
should find the least used cores within the preferred scheduling domain
and allocate new CPUs from shared cores within that domain.
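One way to express the selection policy described above, as a minimal sketch: the `CoreUsage` type, its fields and `pick_cpu()` are hypothetical, and "least used" is modeled simply as the number of CPUs on a core that some layer already owns. Restricting candidates to the preferred LLC first and then taking the minimum keeps allocations inside the domain.

```rust
/// Hypothetical view of a physical core for allocation purposes.
#[derive(Debug, Clone)]
pub struct CoreUsage {
    pub core_id: usize,
    pub llc_id: usize,
    pub free_cpus: Vec<usize>, // CPUs on this core not yet owned by any layer
    pub used_cpus: usize,      // CPUs on this core already owned by some layer
}

/// Pick the next CPU for a layer: only consider cores in the preferred LLC
/// that still have a free CPU, then take the least used one.
pub fn pick_cpu(cores: &[CoreUsage], preferred_llc: usize) -> Option<usize> {
    cores
        .iter()
        .filter(|c| c.llc_id == preferred_llc && !c.free_cpus.is_empty())
        // Tie-break on core_id so the choice is deterministic.
        .min_by_key(|c| (c.used_cpus, c.core_id))
        .map(|c| c.free_cpus[0])
}
```

Returning `None` when the preferred LLC is exhausted leaves the fallback policy (spilling to another domain or refusing to grow) to the caller.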

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-19 15:51:35 -07:00
Tejun Heo
3498a2b899
Merge pull request #514 from sched-ext/htejun/scx_stats
scx_stats, scx_layered: Implement independent stats client sessions
2024-08-19 11:24:53 -10:00
Tejun Heo
f6bc52d31e scx_layered: Make --monitor behavior more useful
- If --monitor is specified with layer specs, the scheduler also starts
  stats monitoring on a thread.

- Standalone monitoring mode no longer exits when the scheduler isn't running.
2024-08-19 10:55:02 -10:00
Tejun Heo
d03e48eb75 scx_layered: Implement per-stats-client nr_layer_cpus_ranges tracking
With this, every client sees the correct nr_layer_cpus_ranges without
the clients interfering with one another.
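Per-client range tracking can be sketched as follows, assuming each stats session owns a hypothetical `CpuRangeTracker` that records the (min, max) CPU count per layer since that client's last read. Because the state lives in the session, one client reading and resetting its ranges does not disturb another client's.

```rust
/// Hypothetical per-client tracker of the (min, max) number of CPUs each
/// layer held between two reads by that client.
pub struct CpuRangeTracker {
    ranges: Vec<(usize, usize)>,
}

impl CpuRangeTracker {
    /// Start tracking from the current per-layer CPU counts.
    pub fn new(cur: &[usize]) -> Self {
        CpuRangeTracker { ranges: cur.iter().map(|&n| (n, n)).collect() }
    }

    /// Fold in a new sample of per-layer CPU counts.
    pub fn sample(&mut self, cur: &[usize]) {
        for (r, &n) in self.ranges.iter_mut().zip(cur) {
            r.0 = r.0.min(n);
            r.1 = r.1.max(n);
        }
    }

    /// Return the ranges seen since this client's last read and restart
    /// tracking from `cur`. Other trackers are unaffected.
    pub fn take(&mut self, cur: &[usize]) -> Vec<(usize, usize)> {
        std::mem::replace(&mut self.ranges, cur.iter().map(|&n| (n, n)).collect())
    }
}
```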
2024-08-19 09:12:51 -10:00
Tejun Heo
448aacfd60 scx_layered: Initialize Stats.prev_layer_cycles properly on new()
This ensures that a new stats session doesn't start with an inflated
utilization number.
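The reason for seeding the previous snapshot can be shown with a small sketch. It assumes, hypothetically, that per-layer busy cycles are exposed as cumulative counters; `Stats` here models only `prev_layer_cycles` and is not the real scx_layered struct.

```rust
/// Hypothetical stats session tracking per-layer cumulative cycle counters.
pub struct Stats {
    prev_layer_cycles: Vec<u64>,
}

impl Stats {
    /// Seed the previous snapshot with the current cumulative counters so
    /// the first delta covers only the time since the session started.
    pub fn new(cur_layer_cycles: &[u64]) -> Self {
        Stats { prev_layer_cycles: cur_layer_cycles.to_vec() }
    }

    /// Return per-layer cycle deltas since the last call and advance the snapshot.
    pub fn refresh(&mut self, cur_layer_cycles: &[u64]) -> Vec<u64> {
        let deltas: Vec<u64> = cur_layer_cycles
            .iter()
            .zip(&self.prev_layer_cycles)
            .map(|(cur, prev)| cur.saturating_sub(*prev))
            .collect();
        self.prev_layer_cycles = cur_layer_cycles.to_vec();
        deltas
    }
}
```

If `prev_layer_cycles` started at zero instead, the first `refresh()` would report every cycle accumulated before the session began, producing the inflated utilization the commit avoids.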
2024-08-19 08:40:40 -10:00
Tejun Heo
25d7e6f787 scx_layered: Implement on-demand statistics generation
Instead of keeping one copy of sched_stats, each stats server session
carries its own so that stats can be generated independently by each
client at any interval. CPU allocation min/max tracking is broken for now.
2024-08-19 08:27:36 -10:00
Tejun Heo
27c530e17e scx_stats: Add missing trait exports 2024-08-19 07:16:43 -10:00
Tejun Heo
0cf5ca605d scx_layered: Move processing_dur accounting into Stats and protect it with Arc<Mutex<>> 2024-08-19 06:25:23 -10:00
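A minimal sketch of the `Arc<Mutex<>>` pattern named in the commit title above: several threads add to a shared `processing_dur` without losing updates. `Stats` here is a hypothetical stand-in containing only that field, and `accumulate()` is an illustrative helper, not scx_layered code.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the stats state; only the field named in the
/// commit title is modeled.
#[derive(Default)]
pub struct Stats {
    pub processing_dur: Duration,
}

/// Accumulate per-thread processing time into one shared Stats.
pub fn accumulate(per_thread: Vec<Duration>) -> Duration {
    let stats = Arc::new(Mutex::new(Stats::default()));
    let handles: Vec<_> = per_thread
        .into_iter()
        .map(|d| {
            let stats = Arc::clone(&stats);
            thread::spawn(move || {
                // The lock serializes updates so no increment is lost.
                stats.lock().unwrap().processing_dur += d;
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // Bind the value first so the MutexGuard is dropped before returning.
    let total = stats.lock().unwrap().processing_dur;
    total
}
```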
Tejun Heo
a77fe372d6 scx_stats: Make server shutdown when connection is dropped and add communication channel
This makes it easier to implement connection sessions, where each stats
client connection maintains its own set of states.
2024-08-19 06:23:16 -10:00
Changwoo Min
832f194845 scx_lavd: add power profile options: --performance, --powersave, --balanced
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-19 19:03:51 +09:00
Changwoo Min
c4c157f91c scx_lavd: add "--prefer-little-core" option
This option prefers little (efficiency) cores over big (performance)
cores during core compaction to reduce power consumption.
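A sketch of the ordering such an option implies, assuming a hypothetical `CpuInfo` with a relative `capacity` (higher means a faster core). If compaction fills CPUs in the returned order, preferring little cores amounts to sorting by ascending capacity. scx_lavd's actual logic runs largely in BPF and may differ.

```rust
/// Hypothetical per-CPU record; `capacity` is a relative performance value
/// (higher = faster core).
#[derive(Debug, Clone, Copy)]
pub struct CpuInfo {
    pub id: usize,
    pub capacity: u32,
}

/// Order CPUs for core compaction; the caller fills CPUs in this order.
/// With `prefer_little_core`, low-capacity (efficiency) CPUs come first.
pub fn compaction_order(cpus: &[CpuInfo], prefer_little_core: bool) -> Vec<usize> {
    let mut sorted = cpus.to_vec();
    if prefer_little_core {
        sorted.sort_by_key(|c| (c.capacity, c.id));
    } else {
        sorted.sort_by_key(|c| (std::cmp::Reverse(c.capacity), c.id));
    }
    sorted.iter().map(|c| c.id).collect()
}
```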

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-19 18:23:35 +09:00
Changwoo Min
73b873827d scx_lavd: merge put_cpdom_rq() to ops.enqueue()
Clean up and reorganize the code around ops.enqueue().

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-19 14:22:03 +09:00
Changwoo Min
9475ace336 scx_lavd: always enqueue to a DSQ in task's compute domain
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-19 14:07:56 +09:00
Changwoo Min
0656c3232e scx_lavd: revise FlatTopology prettier
The changes include 1) splitting a big function into smaller ones
for readability and maintainability and 2) using the interior mutability
pattern (Cell and RefCell) to avoid unnecessary clone() calls. There
are no functional changes.
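The interior mutability pattern mentioned above, in a minimal form: helper methods take `&self` yet still update state, using `Cell` for a `Copy` counter and `RefCell` for a `Vec`, so a big function can be split into helpers without cloning data or threading `&mut` through every call. `TopologyBuilder` and its methods are hypothetical, not the FlatTopology code.

```rust
use std::cell::{Cell, RefCell};

/// Hypothetical builder whose helpers mutate state through `&self`.
pub struct TopologyBuilder {
    next_id: Cell<usize>,         // Copy value: Cell gets/sets it in place
    cpu_ids: RefCell<Vec<usize>>, // non-Copy value: RefCell gives checked &mut access
}

impl TopologyBuilder {
    pub fn new() -> Self {
        TopologyBuilder { next_id: Cell::new(0), cpu_ids: RefCell::new(Vec::new()) }
    }

    /// Small helper taking `&self`: appends a CPU in place, without cloning
    /// the vector or requiring `&mut self` from every caller.
    pub fn add_cpu(&self) -> usize {
        let id = self.next_id.get();
        self.next_id.set(id + 1);
        self.cpu_ids.borrow_mut().push(id);
        id
    }

    pub fn cpu_count(&self) -> usize {
        self.cpu_ids.borrow().len()
    }
}
```

`RefCell` moves the borrow check to run time, so holding a `borrow()` while calling `add_cpu()` would panic; keeping each borrow inside a single helper avoids that.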

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-19 11:03:52 +09:00
I Hsin Cheng
4fccc06905 scx_layered: Fix uninitialized variable
Fix the uninitialized variable "layer" in match_layer(), which caused
compilation to fail. "layer" is supposed to be the same as
"&layers[layer_id]".

Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
2024-08-17 23:32:53 +08:00
Tejun Heo
3a688cfde7 scx_stats: Add support for no-value user attributes and a bunch of other changes
- Allow no-value user attributes which are automatically assigned "true"
  when specified.

- Make "top" attribute string "true" instead of bool true for consistency.
  Testing for existence is always enough for value-less attributes.

- Don't drop leading "_" from user attribute names when storing in dicts.
  Dropping makes things more confusing.

- Add "_om_skip" to scx_layered fields which don't map well to OM
  (OpenMetrics). scxstats_to_openmetrics.py is updated accordingly and no
  longer generates warnings on those fields.

- Examples and README updated accordingly.
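The attribute rules above (value-less attributes stored as "true", leading "_" kept in names), together with multiple comma-separated attributes per block, can be illustrated with a small string parser. This is a hypothetical sketch operating on raw text; it is not scx_stats' actual parser, and `parse_attrs()` is an illustrative name. Commas inside double-quoted values are deliberately not treated as separators so that descriptions may contain them.

```rust
use std::collections::BTreeMap;

/// Parse the inside of a hypothetical `#[stat(...)]` block, e.g.
/// `desc = "a, b", _om_skip, top`, into name -> value pairs.
/// - Commas inside double quotes do not split attributes.
/// - Value-less attributes are stored as the string "true".
/// - Leading '_' in names is kept as-is.
pub fn parse_attrs(input: &str) -> BTreeMap<String, String> {
    // Split on top-level commas only.
    let mut parts = Vec::new();
    let mut cur = String::new();
    let mut in_quotes = false;
    for ch in input.chars() {
        match ch {
            '"' => {
                in_quotes = !in_quotes;
                cur.push(ch);
            }
            ',' if !in_quotes => parts.push(std::mem::take(&mut cur)),
            _ => cur.push(ch),
        }
    }
    parts.push(cur);

    let mut attrs = BTreeMap::new();
    for part in parts.iter().map(|p| p.trim()).filter(|p| !p.is_empty()) {
        match part.split_once('=') {
            Some((name, value)) => {
                let value = value.trim().trim_matches('"');
                attrs.insert(name.trim().to_string(), value.to_string());
            }
            None => {
                attrs.insert(part.to_string(), "true".to_string());
            }
        }
    }
    attrs
}
```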
2024-08-16 07:52:02 -10:00
I Hsin Cheng
5d85937842 scx_rusty: Fix typo
Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
2024-08-16 22:03:59 +08:00
Tejun Heo
c16b48d7b2 scheds/rust: Include Cargo.lock in the repo
Binary packages are expected to include Cargo.lock in the repo so that the
produced binaries match across different builds.
2024-08-15 23:08:35 -10:00
Tejun Heo
22167aeb14
Merge pull request #502 from sched-ext/htejun/scx_stats
scx_stats: Refine scx_stats and implement scxstats_to_openmetrics.py
2024-08-15 22:55:11 -10:00
Tejun Heo
570ca56c57 scx_layered: s/_om_field_prefix/_om_prefix/ 2024-08-15 21:29:58 -10:00
Tejun Heo
af01dd19ec
Merge pull request #500 from sched-ext/htejun/scx_stats
scx_stats, scx_layered: Add `om_prefix` attribute and fix s/stat/stats/ stragglers
2024-08-15 21:27:38 -10:00
Tejun Heo
ea453e51d3 scx_stats: Rename "all" attribute to "top" and clean up examples a bit 2024-08-15 21:24:55 -10:00
Tejun Heo
a910fa451a scx_layered: Add _om attributes to LayerStats for OpenMetrics piping 2024-08-15 19:11:49 -10:00
Tejun Heo
6a5d6f7c27 scx_stats: Replace field_prefix attribute with '_' prefixed user attributes 2024-08-15 19:09:59 -10:00
Tejun Heo
a9922deaa2 scx_stats: Add "all" attribute and rename metadata type strings 2024-08-15 14:50:00 -10:00
Tejun Heo
ebc1a89c34 scx_stats: s/stat/stats/ stragglers 2024-08-15 14:00:00 -10:00
Tejun Heo
bafd67b568 scx_stats: Fix parsing for multiple stat attributes
The code assumed a single attribute per #[stat()] block. Update it so
that a single block can contain multiple comma-separated attributes.
2024-08-15 13:46:20 -10:00
Tejun Heo
8f361af077 scx_layered: Shorten stat field descriptions 2024-08-15 13:25:48 -10:00
Tejun Heo
1912e05f0b
Merge pull request #499 from sched-ext/htejun/scx_stats
scx_stats: Misc changes to sync dep versions and publish on crates.io
2024-08-15 12:32:44 -10:00
Tejun Heo
0b9c8b5cbd scx_stats: Update versions to 0.2.0 to republish 2024-08-15 12:29:27 -10:00
Daniel Hodges
0319afc88e scx_layered: Update nr_cpus when resizing layers
After scx_layered was made topology-aware, the nr_cpus field of the
layer was no longer updated properly. Update the layer growing/shrinking
logic to correctly update the nr_cpus count.
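A sketch of keeping a cached count in sync with the mask it summarizes, assuming a hypothetical `Layer` with a `Vec<bool>` CPU mask. Each grow or shrink adjusts `nr_cpus` only when the mask bit actually changes, so repeated or redundant calls cannot make the count drift.

```rust
/// Hypothetical layer state: a CPU mask plus a cached count that must be
/// kept in sync with it.
pub struct Layer {
    cpus: Vec<bool>,
    pub nr_cpus: usize,
}

impl Layer {
    pub fn new(nr_possible: usize) -> Self {
        Layer { cpus: vec![false; nr_possible], nr_cpus: 0 }
    }

    /// Add `cpu` to the layer; bump nr_cpus only if it was not already set.
    pub fn grow(&mut self, cpu: usize) {
        if !self.cpus[cpu] {
            self.cpus[cpu] = true;
            self.nr_cpus += 1;
        }
    }

    /// Remove `cpu` from the layer; decrement nr_cpus only if it was set.
    pub fn shrink(&mut self, cpu: usize) {
        if self.cpus[cpu] {
            self.cpus[cpu] = false;
            self.nr_cpus -= 1;
        }
    }
}
```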

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-15 13:22:26 -07:00
Tejun Heo
cc73b6a826
Merge pull request #496 from sched-ext/htejun/scx_stat
scx_stat: Initial commit
2024-08-15 09:24:55 -10:00
Tejun Heo
b614cf848f scx_layered: Make monitor time based iterations dumber
This makes ctrl-c a bit more responsive without complicating code.
2024-08-15 09:23:29 -10:00
Tejun Heo
45fb724ee2 scx_layered: Restore cpumask reporting 2024-08-15 09:12:29 -10:00
Tejun Heo
751a38e34e scx_layered: Refactor stats printing code 2024-08-15 08:53:19 -10:00
Tejun Heo
a4f424056e scx_layered: Move stats server launching to stats.rs 2024-08-15 06:30:42 -10:00
Tejun Heo
17afc72479 scx_stats: Rename cleanups
- s/stat/stats/ on several stragglers.

- Rename traits so that they are more distinct from struct and other
  names and follow the convention.
2024-08-15 06:24:56 -10:00
Tejun Heo
a091d5ea7d scx_layered: s/monitor.rs/stats.rs/ and make stats refresh code struct ops 2024-08-15 06:13:05 -10:00
Tejun Heo
8aae9a5de2 scx_stats: s/scx_stat/scx_stats/
Use plural form which is more widespread and also used in scheduler
implementations. No functional changes.
2024-08-15 05:31:34 -10:00
Tejun Heo
6e466d18df scx_layered: Initial switch to scx_stat
- This makes the scheduler side simpler and allows on-demand monitoring.

- OpenMetrics support is dropped for now. Will add a generic tool for it.

- This is a naive conversion. Will be further refined.

scx_layered no longer prints statistics by default. To watch statistics, run
`scx_layered --monitor` while the scheduler is running.
2024-08-14 13:48:41 -10:00
Tejun Heo
7820ec9b46 scx_stat, scx_layered: cargo fmt 2024-08-14 11:47:37 -10:00
Tejun Heo
099b6c266a scx_lavd: Build fix
Add the "signal" feature to the nix dependency; otherwise, the build fails.
2024-08-14 07:55:04 -10:00
Andrea Righi
0f018c5fff
Merge pull request #484 from vax-r/rustland_unused
scx: Remove unused variables, imports and functions
2024-08-14 19:03:26 +02:00
Andrea Righi
f9a994412d scx_bpfland: introduce primary scheduling domain
Allow specifying a primary scheduling domain via the new command line
option `--primary-domain CPUMASK`, where CPUMASK can be a hex number of
arbitrary length, representing the CPUs assigned to the domain.

If this option is not specified, the scheduler will use all the available
CPUs in the system as the primary domain (no behavior change).

Otherwise, if a primary scheduling domain is defined, the scheduler will
try to dispatch tasks only to the CPUs assigned to the primary domain,
until these CPUs are saturated, at which point tasks may overflow to
other available CPUs.

This feature can be used to prioritize certain cores over others, and it
can be particularly effective on systems with heterogeneous cores (e.g.,
hybrid systems with P-cores and E-cores).
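A minimal sketch of how a CPUMASK hex string of arbitrary length could be turned into a CPU list, assuming bit N of the number selects CPU N. `parse_cpumask()` is hypothetical and not necessarily how scx_bpfland parses `--primary-domain`.

```rust
/// Parse a CPU mask given as a hex string of arbitrary length (optionally
/// prefixed with "0x") into the list of CPU IDs whose bits are set.
/// Assumption: bit 0 of the rightmost hex digit is CPU 0.
pub fn parse_cpumask(mask: &str) -> Result<Vec<usize>, String> {
    let s = mask.trim();
    let hex = s
        .strip_prefix("0x")
        .or_else(|| s.strip_prefix("0X"))
        .unwrap_or(s);
    if hex.is_empty() {
        return Err("empty cpumask".to_string());
    }
    let mut cpus = Vec::new();
    // Walk digits from least significant (rightmost) to most significant,
    // so CPU IDs are produced in ascending order.
    for (nibble_idx, ch) in hex.chars().rev().enumerate() {
        let val = ch
            .to_digit(16)
            .ok_or_else(|| format!("invalid hex digit '{}' in cpumask", ch))?;
        for bit in 0..4usize {
            if val & (1u32 << bit) != 0 {
                cpus.push(nibble_idx * 4 + bit);
            }
        }
    }
    Ok(cpus)
}
```

Under that assumption, the masks used in the == Test == section, `0x00fff` and `0xff000`, select CPUs 0..11 and 12..19 respectively, matching the P-core and E-core CPU ranges of the example hardware.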

== Example (hybrid architecture) ==

Hardware:
 - Dell Precision 5480 with 13th Gen Intel(R) Core(TM) i7-13800H
   - 6 P-cores 0..5  with 2 CPUs each (CPUs  0..11)
   - 8 E-cores 6..13 with 1 CPU  each (CPUs 12..19)

== Test ==

WebGL application (https://webglsamples.org/aquarium/aquarium.html):
this generates a steady workload on the system without
oversaturating the CPUs.

Use different scheduler configurations:

 - EEVDF (default)
 - scx_bpfland using P-cores only (--primary-domain 0x00fff)
 - scx_bpfland using E-cores only (--primary-domain 0xff000)

Measure performance (fps) and power consumption (W).

== Result ==

                  +-----+-----+------+-------+--------+
                  | min | max | avg  |       |        |
                  | fps | fps | fps  | stdev | power  |
+-----------------+-----+-----+------+-------+--------+
| EEVDF           | 28  | 34  | 31.0 |  1.73 |  3.5W  |
| bpfland-p-cores | 33  | 34  | 33.5 |  0.29 |  3.5W  |
| bpfland-e-cores | 25  | 26  | 25.5 |  0.29 |  2.2W  |
+-----------------+-----+-----+------+-------+--------+

Using a primary scheduling domain of only P-cores with scx_bpfland
achieves a more stable and predictable level of performance,
with an average of 33.5 fps and an error of ±0.5 fps.

In contrast, using EEVDF results in an average frame rate of 31.0 fps
with an error of ±3.0 fps, indicating slightly less consistency, because
tasks are evenly distributed across all the cores in the system (both
slow and fast cores).

On the other hand, using a scheduling domain of only E-cores with
scx_bpfland results in a lower average frame rate (25.5 fps) while
maintaining stable performance (error of ±0.5 fps), and it also reduces
power consumption to an average of 2.2W, compared to 3.5W with either of
the other configurations.

== Conclusion ==

In summary, with this change users have the flexibility to prioritize
scheduling on performance cores for better performance and consistency,
or prioritize energy efficient cores for reduced power consumption, on
hybrid architectures.

Moreover, this feature can also be used to minimize the number of cores
used by the scheduler, until they reach full capacity. This capability
can be useful for reducing power consumption even in homogeneous systems
or for conducting scheduling experiments with smaller sets of cores,
provided the system is not overcommitted.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-14 16:17:54 +02:00