1364 Commits

Author SHA1 Message Date
Andrea Righi
33b6ada98e
Merge pull request #509 from sched-ext/bpfland-topology
scx_bpfland: topology awareness
2024-08-20 14:37:23 +02:00
Daniel Hodges
9f2d548b8f
Merge pull request #520 from hodgesds/merge-fixes
ci: Fix cache directory
2024-08-20 07:33:22 -04:00
Andrea Righi
467d4b5ea4 scx_bpfland: get topology information from scx_utils::Topology
Rely on scx_utils::Topology to get CPU and cache information, instead of
re-implementing custom methods.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-20 10:16:02 +02:00
Andrea Righi
0b2dc6b9fc scx_utils: Add L2 / L3 cache id to CPU
Add the L2 / L3 cache id to the Cpu struct, to quickly determine the
cache nodes associated to each CPU.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-20 10:16:02 +02:00
Daniel Hodges
e121dd3dd5 ci: Fix cache directory
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-19 20:07:50 -07:00
Daniel Hodges
03944694a9
Merge pull request #519 from hodgesds/veristat-merge-fix
ci: fix merge veristat cache generation
2024-08-19 21:58:00 -04:00
Daniel Hodges
40bb003555 ci: fix merge veristat cache generation
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-19 18:30:32 -07:00
Daniel Hodges
d7cc4f30f0
Merge pull request #515 from hodgesds/veristat-fix
ci: fix veristat for PRs
2024-08-19 20:05:13 -04:00
Changwoo Min
41bc6f0967
Merge pull request #511 from multics69/lavd-perf-profile
scx_lavd: add power profile options: --performance, --balanced, --powersave
2024-08-20 09:02:37 +09:00
Changwoo Min
1d61dd4c1d
Merge pull request #508 from multics69/lavd-numa-fix
scx_lavd: fix a potential watchdog timeout error at multi-NUMA/CCX platforms
2024-08-20 09:02:23 +09:00
Changwoo Min
2c4c2a0ccf
Merge pull request #507 from multics69/lavd-pretty-rust
scx_lavd: revise FlatTopology prettier
2024-08-20 09:01:26 +09:00
Daniel Hodges
05a2721f8e
Merge pull request #510 from hodgesds/layered-core-topo-selection
scx_layered: Use topology for core selection
2024-08-19 20:01:16 -04:00
Tejun Heo
695a33cdcc
Merge pull request #517 from sched-ext/htejun/fix
scx_layered: Fix verification failure
2024-08-19 13:44:38 -10:00
Daniel Hodges
1ff5e4fbed ci: fix veristat for PRs
Make sure veristat is available for CI for PRs.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-19 16:42:31 -07:00
Tejun Heo
d01b49bd0e scx_layered: Fix verification failure
4fccc06905 ("scx_layered: Fix uninitialized variable") causes the
following verification failure. Fix it by moving assignments below range
checking.

  Validating match_layer() func#1...
  283: R1=scalar() R2=scalar() R3=mem_or_null(id=49,sz=1) R10=fp0
  ; int match_layer(u32 layer_id, pid_t pid, const char *cgrp_path) @ main.bpf.c:1029
  283: (7b) *(u64 *)(r10 -24) = r3      ; R3=mem_or_null(id=49,sz=1) R10=fp0 fp-24_w=mem_or_null(id=49,sz=1)
  284: (bc) w7 = w1                     ; R1=scalar() R7_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
  ; struct layer *layer = &layers[layer_id]; @ main.bpf.c:1033
  285: (bc) w1 = w7                     ; R1_w=scalar(id=50,smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R7_w=scalar(id=50,smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
  286: (27) r1 *= 1061192               ; R1_w=scalar(smin=0,smax=umax=0x103147ffefceb8,smax32=0x7ffffff8,umax32=0xfffffff8,var_off=(0x0; 0x1ffffffffffff8))
  287: (18) r8 = 0xffffc90002a26000     ; R8_w=map_value(map=bpf_bpf.bss,ks=4,vs=16979080)
  289: (0f) r8 += r1                    ; R1_w=scalar(smin=0,smax=umax=0x103147ffefceb8,smax32=0x7ffffff8,umax32=0xfffffff8,var_off=(0x0; 0x1ffffffffffff8)) R8_w=map_value(map=bpf_bpf.bss,ks=4,vs=16979080,smin=0,smax=umax=0x103147ffefceb8,smax32=0x7ffffff8,umax32=0xfffffff8,var_off=(0x0; 0x1ffffffffffff8))
  ; u32 nr_match_ors = layer->nr_match_ors; @ main.bpf.c:1034
  290: (bf) r1 = r8                     ; R1_w=map_value(map=bpf_bpf.bss,ks=4,vs=16979080,smin=0,smax=umax=0x103147ffefceb8,smax32=0x7ffffff8,umax32=0xfffffff8,var_off=(0x0; 0x1ffffffffffff8)) R8_w=map_value(map=bpf_bpf.bss,ks=4,vs=16979080,smin=0,smax=umax=0x103147ffefceb8,smax32=0x7ffffff8,umax32=0xfffffff8,var_off=(0x0; 0x1ffffffffffff8))
  291: (07) r1 += 1060992               ; R1_w=map_value(map=bpf_bpf.bss,ks=4,vs=16979080,off=0x103080,smin=0,smax=umax=0x103147ffefceb8,smax32=0x7ffffff8,umax32=0xfffffff8,var_off=(0x0; 0x1ffffffffffff8))
  292: (61) r1 = *(u32 *)(r1 +0)
  R1 unbounded memory access, make sure to bounds check any such access
  processed 1099 insns (limit 1000000) max_states_per_insn 2 total_states 72 peak_states 72 mark_read 9
  -- END PROG LOAD LOG --
2024-08-19 13:18:20 -10:00
Tejun Heo
c0b4deb9ec
Merge pull request #516 from sched-ext/htejun/scx_stats
scx_stats/scripts/scxstats_to_openmetrics: Retry connection
2024-08-19 13:02:22 -10:00
Tejun Heo
4e859d067e scx_stats/scripts/scxstats_to_openmetrics: Retry connection
It now retries until told to exit. This is a bit easier to use and matches
`scx_layered --monitor`.
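
The retry behavior described above can be illustrated with a small sketch. The actual script is Python; this Rust version only shows the pattern, and `connect_with_retry`, the `stop` flag, and the `delay` parameter are hypothetical names introduced here.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::Duration;

/// Keep calling `connect` until it succeeds or `stop` is set.
/// Hypothetical helper; not part of scx_stats.
fn connect_with_retry<T, E>(
    mut connect: impl FnMut() -> Result<T, E>,
    stop: &AtomicBool,
    delay: Duration,
) -> Option<T> {
    while !stop.load(Ordering::Relaxed) {
        match connect() {
            // Connected: hand the connection back to the caller.
            Ok(conn) => return Some(conn),
            // Failed: wait a bit, then try again.
            Err(_) => std::thread::sleep(delay),
        }
    }
    // Told to exit before a connection could be made.
    None
}
```

A caller would typically set `stop` from a signal handler, so the loop ends only on request rather than on the first connection failure.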
2024-08-19 12:52:57 -10:00
Daniel Hodges
b3793e0069 scx_layered: Use topology for core selection
Currently the core selection logic in scx_layered uses the first
available core in the bitmask. This is suboptimal when the scheduler is
configured with specific NUMA/LLC restrictions. The ideal core selection
logic should try to find the least used cores within the preferred
scheduling domain and allocate new cpus from shared cores within that
domain.
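
A rough sketch of the selection idea, not the scx_layered implementation: the `Core` struct, its fields, and `pick_core` are hypothetical, and "least used" is assumed here to mean the fewest CPUs already allocated on a core.

```rust
/// Hypothetical per-core bookkeeping; not the scx_layered data model.
struct Core {
    id: usize,
    llc_id: usize,    // cache domain the core belongs to
    nr_cpus: usize,   // hardware threads on this core
    used_cpus: usize, // threads already allocated to some layer
}

/// Pick the core in `preferred_llc` with the fewest allocated CPUs that
/// still has a free CPU. Returns None if the domain is fully allocated.
fn pick_core(cores: &[Core], preferred_llc: usize) -> Option<&Core> {
    cores
        .iter()
        .filter(|c| c.llc_id == preferred_llc && c.used_cpus < c.nr_cpus)
        .min_by_key(|c| c.used_cpus)
}
```

Compared with taking the first set bit in a mask, this keeps new allocations inside the preferred domain and spreads them across its cores.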

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-19 15:51:35 -07:00
Tejun Heo
3498a2b899
Merge pull request #514 from sched-ext/htejun/scx_stats
scx_stats, scx_layered: Implement independent stats client sessions
2024-08-19 11:24:53 -10:00
Tejun Heo
4198807841
Merge pull request #506 from vax-r/uninit_var
scx_layered: Fix uninitialized variable
2024-08-19 11:13:23 -10:00
Tejun Heo
f6bc52d31e scx_layered: Make --monitor behavior more useful
- If --monitor is specified with layer specs, the scheduler also starts
  stats monitoring on a thread.

- Standalone monitoring mode no longer exits when the scheduler isn't there.
2024-08-19 10:55:02 -10:00
Tejun Heo
cb9a2f5c32
Merge pull request #512 from hodgesds/doc-improvements
docs: Update developer guide
2024-08-19 09:33:38 -10:00
Tejun Heo
ab6cf29a2d
Merge pull request #513 from hodgesds/ci-fixes
ci: Fix veristat pull request workflow
2024-08-19 09:33:09 -10:00
Tejun Heo
d03e48eb75 scx_layered: Implement per-stats-client nr_layer_cpus_ranges tracking
With this, every client sees the correct nr_layer_cpus_ranges without
interfering with each other.
2024-08-19 09:12:51 -10:00
Daniel Hodges
7c27f8067d ci: Fix veristat pull request workflow
See the [failure](https://github.com/sched-ext/scx/actions/runs/10389671253), which needs to have an action defined.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-19 12:08:59 -07:00
Tejun Heo
448aacfd60 scx_layered: Initialize Stats.prev_layer_cycles properly on new()
So that a new stats session doesn't start with an inflated utilization number.
2024-08-19 08:40:40 -10:00
Tejun Heo
6cba8d786a scx_stats: server: open_ops must be kept throughout a client session
open_ops tracks which ops have been opened by the client session; however,
it was being recreated on each handle_request() call, so every request
reopened its ops. Fix it by moving it to the caller.
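
A minimal sketch of the pattern behind this fix, with simplified, assumed signatures: `handle_request` and `serve_session` here are stand-ins, not the scx_stats server code, and the ops are reduced to plain strings.

```rust
use std::collections::BTreeSet;

/// Handle one request. `open_ops` is owned by the caller so that it
/// survives across requests of the same session. Returns true if the
/// op had to be opened for this request.
fn handle_request(op: &str, open_ops: &mut BTreeSet<String>) -> bool {
    // insert() returns true only when the op was not already present.
    open_ops.insert(op.to_string())
}

/// Serve a whole client session and count how many opens happened.
fn serve_session(requests: &[&str]) -> usize {
    // Created once per session, not once per request.
    let mut open_ops = BTreeSet::new();
    let mut opens = 0;
    for op in requests {
        if handle_request(op, &mut open_ops) {
            opens += 1;
        }
    }
    opens
}
```

If the set were created inside `handle_request()` instead, every request would count as a fresh open, which is the behavior the commit describes.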
2024-08-19 08:38:13 -10:00
Daniel Hodges
0048f8dd38 docs: Update developer guide
Add some info on `perf` to the developer guide and link from the main
readme.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-19 11:34:15 -07:00
Tejun Heo
25d7e6f787 scx_layered: Implement on-demand statistics generation
Instead of keeping one copy of sched_stats, each stats server session
carries its own so that stats can be generated independently by each
client at any interval. CPU allocation min/max tracking is broken for now.
2024-08-19 08:27:36 -10:00
Andrea Righi
f8a2445869 scx_bpfland: introduce performance/powersave primary domain
The primary scheduling domain represents a group of CPUs in the system
where the scheduler will initially attempt to assign tasks. Tasks will
only be dispatched to CPUs within this primary domain until they are
fully utilized, after which tasks may overflow to other available CPUs.

The primary scheduling domain can be defined using the option
`--primary-domain CPUMASK` (by default all the CPUs in the system are
used as primary domain).

This change introduces two new special values for the CPUMASK argument:
 - `performance`: automatically detect the fastest CPUs in the system
   and use them as primary scheduling domain,
 - `powersave`: automatically detect the slowest CPUs in the system and
   use them as primary scheduling domain.

The current logic only supports creating two groups: fast and slow CPUs.

The fast CPU group is created by excluding CPUs with the lowest
frequency from the overall set, which means that within the fast CPU
group, CPUs may have different maximum frequencies.

When using the `performance` mode the fast CPUs will be used as primary
domain, whereas in `powersave` mode, the slow CPUs will be used instead.

This option is particularly useful in hybrid architectures (with P-cores
and E-cores), as it allows the use of bpfland to prioritize task
scheduling on either P-cores or E-cores, depending on the desired
performance profile.

Example:

 - Dell Precision 5480
   - CPU: 13th Gen Intel(R) Core(TM) i7-13800H
     - P-cores:  0-11 / max freq: 5.2GHz
     - E-cores: 12-19 / max freq: 4.0GHz

 $ scx_bpfland --primary-domain performance

  0[|||||||||                24.5%]  10[||||||||                  22.8%]
  1[||||||                   14.9%]  11[|||||||||||||             36.9%]
  2[||||||                   16.2%]  12[                           0.0%]
  3[|||||||||                25.3%]  13[                           0.0%]
  4[|||||||||||              33.3%]  14[                           0.0%]
  5[||||                      9.9%]  15[                           0.0%]
  6[|||||||||||              31.5%]  16[                           0.0%]
  7[|||||||                  17.4%]  17[                           0.0%]
  8[||||||||                 23.4%]  18[                           0.0%]
  9[|||||||||                26.1%]  19[                           0.0%]

  Avg power consumption: 3.29W

 $ scx_bpfland --primary-domain powersave

  0[|                         2.5%]  10[                           0.0%]
  1[                          0.0%]  11[                           0.0%]
  2[                          0.0%]  12[||||                       8.0%]
  3[                          0.0%]  13[|||||||||||||||||||||     64.2%]
  4[                          0.0%]  14[||||||||||                29.6%]
  5[                          0.0%]  15[|||||||||||||||||         52.5%]
  6[                          0.0%]  16[|||||||||                 24.7%]
  7[                          0.0%]  17[||||||||||                30.4%]
  8[                          0.0%]  18[|||||||                   22.4%]
  9[                          0.0%]  19[|||||                     12.4%]

  Avg power consumption: 2.17W

(Info collected from htop and turbostat)
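
The fast/slow grouping rule described in this message can be sketched as follows. This is an illustration under assumptions, not the bpfland code: `split_fast_slow` is a hypothetical name, and the per-CPU maximum frequencies are assumed to be available as a slice indexed by CPU id.

```rust
/// Split CPUs into (fast, slow) groups by maximum frequency.
/// `max_freq_khz[i]` is the assumed maximum frequency of CPU i.
fn split_fast_slow(max_freq_khz: &[u64]) -> (Vec<usize>, Vec<usize>) {
    let min_freq = match max_freq_khz.iter().min() {
        Some(&f) => f,
        None => return (Vec::new(), Vec::new()),
    };
    let mut fast = Vec::new();
    let mut slow = Vec::new();
    for (cpu, &freq) in max_freq_khz.iter().enumerate() {
        // Only CPUs at the lowest frequency are "slow"; everything else
        // is "fast", so the fast group may mix different frequencies.
        if freq > min_freq {
            fast.push(cpu);
        } else {
            slow.push(cpu);
        }
    }
    (fast, slow)
}
```

With the example machine above (CPUs 0-11 at 5.2GHz, 12-19 at 4.0GHz) this yields CPUs 0-11 as the fast group and 12-19 as the slow group. When all CPUs share one frequency the sketch puts every CPU in the slow group; how the scheduler handles that case is not stated in this message.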

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-19 20:19:21 +02:00
Andrea Righi
174993f9d2 scx_bpfland: introduce cache awareness
While the system is not saturated the scheduler will use the following
strategy to select the next CPU for a task:
  - pick the same CPU if it's a full-idle SMT core
  - pick any full-idle SMT core in the primary scheduling group that
    shares the same L2 cache
  - pick any full-idle SMT core in the primary scheduling group that
    shares the same L3 cache
  - pick the same CPU (ignoring SMT)
  - pick any idle CPU in the primary scheduling group that shares the
    same L2 cache
  - pick any idle CPU in the primary scheduling group that shares the
    same L3 cache
  - pick any idle CPU in the system

While the system is completely saturated (no idle CPUs available), tasks
will be dispatched on the first CPU that becomes available.
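
The selection order above can be sketched in user-space Rust as below. This is only an illustration of the priority list: the real logic lives in the scheduler's BPF code, and the `Cpu` struct, its fields, and the `pick_cpu` / `find_idle` helpers are hypothetical.

```rust
/// Hypothetical per-CPU view; field names are illustrative, not bpfland's.
#[derive(Clone, Copy)]
struct Cpu {
    core: usize,   // physical core id (SMT siblings share it)
    l2: usize,     // L2 cache id
    l3: usize,     // L3 cache id
    primary: bool, // member of the primary scheduling group
    idle: bool,
}

/// True if every SMT sibling on `core` is idle.
fn core_fully_idle(cpus: &[Cpu], core: usize) -> bool {
    cpus.iter().filter(|c| c.core == core).all(|c| c.idle)
}

/// First idle CPU accepted by `pred`; with `whole_core`, all of its
/// SMT siblings must be idle as well.
fn find_idle(cpus: &[Cpu], whole_core: bool, pred: impl Fn(&Cpu) -> bool) -> Option<usize> {
    cpus.iter()
        .position(|c| c.idle && pred(c) && (!whole_core || core_fully_idle(cpus, c.core)))
}

/// Walk the priority list for a task that last ran on CPU `prev`.
/// Returns None when the system is saturated (no idle CPU at all).
fn pick_cpu(cpus: &[Cpu], prev: usize) -> Option<usize> {
    let p = cpus[prev];
    let same_l2 = |c: &Cpu| c.primary && c.l2 == p.l2;
    let same_l3 = |c: &Cpu| c.primary && c.l3 == p.l3;

    // Same CPU, if its whole SMT core is idle.
    if core_fully_idle(cpus, p.core) {
        return Some(prev);
    }
    find_idle(cpus, true, &same_l2) // full-idle core, same L2
        .or_else(|| find_idle(cpus, true, &same_l3)) // full-idle core, same L3
        .or_else(|| if p.idle { Some(prev) } else { None }) // same CPU, ignoring SMT
        .or_else(|| find_idle(cpus, false, &same_l2)) // any idle CPU, same L2
        .or_else(|| find_idle(cpus, false, &same_l3)) // any idle CPU, same L3
        .or_else(|| find_idle(cpus, false, |_| true)) // any idle CPU in the system
}
```

A `None` result corresponds to the saturated case, where the task simply waits for the first CPU that becomes available.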

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-19 20:19:21 +02:00
Daniel Hodges
0fdb8405dd
Merge pull request #494 from hodgesds/veristat-merge
meson: Add github action to run veristat
2024-08-19 14:03:34 -04:00
Tejun Heo
17a460c179 scx_stats: ScxStatsOps fields must be public
2024-08-19 07:51:05 -10:00
Tejun Heo
27c530e17e scx_stats: Add missing trait exports
2024-08-19 07:16:43 -10:00
Tejun Heo
1e89184ba7 scx_stats: server: s/Tx/Req/ and s/Rx/Res/ for clarity
2024-08-19 07:11:26 -10:00
Tejun Heo
4d88c9aec7 scx_stats: Add channel arguments to open and close ops too
2024-08-19 06:56:14 -10:00
Tejun Heo
0cf5ca605d scx_layered: Move processing_dur accounting into Stats and protect it with Arc<Mutex<>>
2024-08-19 06:25:23 -10:00
Tejun Heo
a77fe372d6 scx_stats: Make server shutdown when connection is dropped and add communication channel
This will make implementing connection sessions easier where each stats
client connection maintains a set of states.
2024-08-19 06:23:16 -10:00
Changwoo Min
832f194845 scx_lavd: add power profile options: --performance, --powersave, --balanced
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-19 19:03:51 +09:00
Changwoo Min
c4c157f91c scx_lavd: add "--prefer-little-core" option
This option chooses little (efficiency) cores over big (performance) cores
to reduce power consumption for core compaction.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-19 18:23:35 +09:00
Changwoo Min
73b873827d scx_lavd: merge put_cpdom_rq() to ops.enqueue()
Cleaned up and reorganized the code around ops.enqueue().

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-19 14:22:03 +09:00
Changwoo Min
9475ace336 scx_lavd: always enqueue to a DSQ in task's compute domain
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-19 14:07:56 +09:00
Changwoo Min
0656c3232e scx_lavd: revise FlatTopology prettier
The changes include 1) chopping down a big function into smaller ones
for readability and maintainability and 2) using the interior mutability
pattern (Cell and RefCell) to avoid unnecessary clone() calls.  There
are no functional changes.
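
For readers unfamiliar with the interior mutability pattern mentioned here, a minimal sketch follows. The `Node` type and its fields are hypothetical and are not taken from FlatTopology; the point is only that shared references can update state in place instead of cloning and writing back.

```rust
use std::cell::{Cell, RefCell};

/// Hypothetical topology node; field names are illustrative only.
struct Node {
    cpu_ids: RefCell<Vec<usize>>, // mutated in place through &self
    visited: Cell<bool>,          // small Copy flag, set through &self
}

impl Node {
    fn new() -> Self {
        Node { cpu_ids: RefCell::new(Vec::new()), visited: Cell::new(false) }
    }

    /// Takes &self, so callers holding a shared slice of nodes can
    /// update any node without cloning it first.
    fn add_cpu(&self, cpu: usize) {
        self.cpu_ids.borrow_mut().push(cpu);
        self.visited.set(true);
    }

    fn nr_cpus(&self) -> usize {
        self.cpu_ids.borrow().len()
    }
}

/// Distribute (node index, cpu id) pairs over a shared slice of nodes.
fn fill(nodes: &[Node], cpus: &[(usize, usize)]) {
    for &(node, cpu) in cpus {
        nodes[node].add_cpu(cpu);
    }
}
```

The trade-off is that `RefCell` moves borrow checking to run time, so overlapping `borrow_mut()` calls panic instead of failing to compile.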

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-19 11:03:52 +09:00
I Hsin Cheng
4fccc06905 scx_layered: Fix uninitialized variable
Fix the uninitialized variable "layer" in the function match_layer which
caused the compiling process to fail. "layer" is supposed to be the same
as "&layers[layer_id]".

Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
2024-08-17 23:32:53 +08:00
Tejun Heo
689d380db1 scx_stats: Implement ScxStatsServer::add_stats_ops()
This allows a stats reader to maintain persistent state per connection.
2024-08-16 13:24:17 -10:00
Tejun Heo
96050f8bdd
Merge pull request #505 from sched-ext/htejun/scx_stats
scx_stats: Add support for no-value user attributes and a bunch of ot…
2024-08-16 09:12:20 -10:00
Tejun Heo
da2e014d15 scx_stats: Add more documentation on user attributes in README
2024-08-16 09:11:23 -10:00
Tejun Heo
3a688cfde7 scx_stats: Add support for no-value user attributes and a bunch of other changes
- Allow no-value user attributes which are automatically assigned "true"
  when specified.

- Make "top" attribute string "true" instead of bool true for consistency.
  Testing for existence is always enough for value-less attributes.

- Don't drop leading "_" from user attribute names when storing in dicts.
  Dropping makes things more confusing.

- Add "_om_skip" to scx_layered fields which don't jive well with OM.
  scxstats_to_openmetrics.py is updated accordingly and no longer generates
  warnings on those fields.

- Examples and README updated accordingly.
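
The first and third rules above can be illustrated with a toy parser. This is not how scx_stats processes attributes (it uses a derive macro); `parse_attrs` and the comma-separated input format are assumptions made for the sketch, and `_my_unit` is a made-up attribute name.

```rust
use std::collections::BTreeMap;

/// Parse a hypothetical "name, name=value" attribute list into a map.
/// Value-less attributes get the string "true"; a leading "_" in the
/// name is kept as-is.
fn parse_attrs(spec: &str) -> BTreeMap<String, String> {
    let mut attrs = BTreeMap::new();
    for item in spec.split(',').map(str::trim).filter(|s| !s.is_empty()) {
        let (name, value) = match item.split_once('=') {
            // name="value" or name=value
            Some((n, v)) => (n.trim(), v.trim().trim_matches('"')),
            // bare name: assign the string "true"
            None => (item, "true"),
        };
        attrs.insert(name.to_string(), value.to_string());
    }
    attrs
}
```

Because every value-less attribute maps to the same string, a consumer only needs to test for the key's existence, for example `attrs.contains_key("_om_skip")`. The sketch does not handle commas inside quoted values.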
2024-08-16 07:52:02 -10:00
Tejun Heo
ee77090a2b
Merge pull request #501 from CachyOS/install/fedora
INSTALL: Update Fedora installation docs
2024-08-16 07:14:01 -10:00
Tejun Heo
e2b8525fbe
Merge pull request #504 from vax-r/fix_typo
scx_rusty: Fix typo
2024-08-16 07:13:23 -10:00