Commit Graph

599 Commits

Author SHA1 Message Date
Tejun Heo
cc73b6a826 Merge pull request #496 from sched-ext/htejun/scx_stat
scx_stat: Initial commit
2024-08-15 09:24:55 -10:00
Tejun Heo
b614cf848f scx_layered: Make monitor time based iterations dumber
This makes ctrl-c a bit more responsive without complicating code.
2024-08-15 09:23:29 -10:00
Tejun Heo
45fb724ee2 scx_layered: Restore cpumask reporting 2024-08-15 09:12:29 -10:00
Tejun Heo
751a38e34e scx_layered: Refactor stats printing code 2024-08-15 08:53:19 -10:00
Tejun Heo
a4f424056e scx_layered: Move stats server launching to stats.rs 2024-08-15 06:30:42 -10:00
Tejun Heo
17afc72479 scx_stats: Rename cleanups
- s/stat/stats/ on several stragglers.

- Rename traits so that they are more distinctive from struct and other
  names and follow the convention.
2024-08-15 06:24:56 -10:00
Tejun Heo
a091d5ea7d scx_layered: s/monitor.rs/stats.rs/ and make stats refresh code struct ops 2024-08-15 06:13:05 -10:00
Tejun Heo
8aae9a5de2 scx_stats: s/scx_stat/scx_stats/
Use plural form which is more widespread and also used in scheduler
implementations. No functional changes.
2024-08-15 05:31:34 -10:00
Tejun Heo
6e466d18df scx_layered: Initial switch to scx_stat
- This makes the scheduler side simpler and allows on-demand monitoring.

- OpenMetrics support is dropped for now. Will add a generic tool for it.

- This is a naive conversion. Will be further refined.

scx_layered no longer prints statistics by default. To watch statistics, run
`scx_layered --monitor` while the scheduler is running.
2024-08-14 13:48:41 -10:00
Tejun Heo
7820ec9b46 scx_stat, scx_layered: cargo fmt 2024-08-14 11:47:37 -10:00
Tejun Heo
099b6c266a scx_lavd: Build fix
Add "signal" feature to nix dependency; otherwise, build fails.
2024-08-14 07:55:04 -10:00
Andrea Righi
0f018c5fff Merge pull request #484 from vax-r/rustland_unused
scx: Remove unused variables, imports and functions
2024-08-14 19:03:26 +02:00
Andrea Righi
f9a994412d scx_bpfland: introduce primary scheduling domain
Allow specifying a primary scheduling domain via the new command line
option `--primary-domain CPUMASK`, where CPUMASK is a hex number of
arbitrary length representing the CPUs assigned to the domain.

If this option is not specified, the scheduler will use all the available
CPUs in the system as the primary domain (no behavior change).

Otherwise, if a primary scheduling domain is defined, the scheduler will
try to dispatch tasks only to the CPUs assigned to the primary domain,
until these CPUs are saturated, at which point tasks may overflow to
other available CPUs.

This feature can be used to prioritize certain cores over others and it
can be really effective in systems with heterogeneous cores (e.g.,
hybrid systems with P-cores and E-cores).

== Example (hybrid architecture) ==

Hardware:
 - Dell Precision 5480 with 13th Gen Intel(R) Core(TM) i7-13800H
   - 6 P-cores 0..5  with 2 CPUs each (CPU from  0..11)
   - 8 E-cores 6..13 with 1 CPU  each (CPU from 12..19)

== Test ==

WebGL application (https://webglsamples.org/aquarium/aquarium.html):
this generates a steady workload in the system without
over-saturating the CPUs.

Use different scheduler configurations:

 - EEVDF (default)
 - scx_bpfland using P-cores only (--primary-domain 0x00fff)
 - scx_bpfland using E-cores only (--primary-domain 0xff000)
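
The two masks above follow from the CPU numbering in the hardware
description: bits 0..11 cover the P-core CPUs and bits 12..19 cover the
E-core CPUs. A minimal Rust sketch of that arithmetic follows; the helper
names `cpumask_for_range` and `cpumask_to_hex` are hypothetical and are not
part of scx_bpfland.

```rust
/// Build a bitmask with bits `first..=last` set (one bit per CPU id).
/// Hypothetical helper for illustration; limited to CPU ids below 64.
fn cpumask_for_range(first: u32, last: u32) -> u64 {
    assert!(first <= last && last < 64);
    let width = last - first + 1;
    // Avoid shift overflow when the range covers all 64 bits.
    let ones = if width == 64 { u64::MAX } else { (1u64 << width) - 1 };
    ones << first
}

/// Format a mask as a hex string suitable for a CPUMASK argument.
fn cpumask_to_hex(mask: u64) -> String {
    format!("{:#x}", mask)
}
```

With this sketch, `cpumask_for_range(0, 11)` gives `0xfff` (written
`0x00fff` above; leading zeros are insignificant) and
`cpumask_for_range(12, 19)` gives `0xff000`.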

Measure performance (fps) and power consumption (W).

== Result ==

                  +-----+-----+------+-------+--------+
                  | min | max | avg  |       |        |
                  | fps | fps | fps  | stdev | power  |
+-----------------+-----+-----+------+-------+--------+
| EEVDF           | 28  | 34  | 31.0 |  1.73 |  3.5W  |
| bpfland-p-cores | 33  | 34  | 33.5 |  0.29 |  3.5W  |
| bpfland-e-cores | 25  | 26  | 25.5 |  0.29 |  2.2W  |
+-----------------+-----+-----+------+-------+--------+

Using a primary scheduling domain of only P-cores with scx_bpfland
achieves a more stable and predictable level of performance,
with an average of 33.5 fps and an error of ±0.5 fps.

In contrast, using EEVDF results in an average frame rate of 31.0 fps
with an error of ±3.0 fps, indicating slightly less consistency, due to
the fact that tasks are evenly distributed across all the cores in the
system (both slow and fast cores).

On the other hand, using a scheduling domain of only E-cores with
scx_bpfland results in a lower average frame rate (25.5 fps), though
performance remains stable (error of ±0.5 fps). Power consumption is
also reduced, averaging 2.2W compared to 3.5W with either of the other
configurations.
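
The ± figures quoted above are consistent with reading the error as half
of the min-max range in the result table. That reading is an inference
from the numbers, not something the message states. A small Rust sketch
of the arithmetic (the function name is hypothetical):

```rust
/// Midpoint and half-range of a min/max pair, in fps.
/// Assumption: "error" in the text means (max - min) / 2.
fn midpoint_and_error(min_fps: f64, max_fps: f64) -> (f64, f64) {
    ((min_fps + max_fps) / 2.0, (max_fps - min_fps) / 2.0)
}
```

Applied to the table rows, this gives (31.0, 3.0) for EEVDF, (33.5, 0.5)
for bpfland-p-cores and (25.5, 0.5) for bpfland-e-cores.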

== Conclusion ==

In summary, with this change users have the flexibility to prioritize
scheduling on performance cores for better performance and consistency,
or prioritize energy efficient cores for reduced power consumption, on
hybrid architectures.

Moreover, this feature can also be used to minimize the number of cores
used by the scheduler, until they reach full capacity. This capability
can be useful for reducing power consumption even in homogeneous systems
or for conducting scheduling experiments with smaller sets of cores,
provided the system is not overcommitted.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-14 16:17:54 +02:00
Andrea Righi
a6e977c70b scx_bpfland: make output more compact
Abbreviate the statistics reported to stdout and remove the slice_ms
metric: this metric can be easily derived from slice_ns, slice_ns_min
and nr_wait, which are already reported to stdout.
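
The message does not spell out the derivation. One plausible reading,
stated here purely as an assumption, is that the base slice is divided
among the waiting tasks and clamped to the minimum slice. The Rust sketch
below illustrates that reading only; the function name and the formula
are not taken from the scx_bpfland source.

```rust
/// Assumed derivation: share the base slice among the waiting tasks,
/// never going below the minimum slice, then convert ns to ms.
fn effective_slice_ms(slice_ns: u64, slice_ns_min: u64, nr_wait: u64) -> f64 {
    // +1 accounts for the task being scheduled (assumption).
    let scaled = slice_ns / nr_wait.saturating_add(1);
    scaled.max(slice_ns_min) as f64 / 1_000_000.0
}
```

Under this assumption, a 5 ms base slice with a 0.5 ms minimum gives 5 ms
with no waiters, 1 ms with 4 waiters, and is clamped to 0.5 ms with 99
waiters.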

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-14 16:17:54 +02:00
Andrea Righi
8656effa50 scx_bpfland: update copyright info
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-14 16:17:54 +02:00
Changwoo Min
3c6d86b342 scx_lavd: upgrade nix package from 0.28.0 to 0.29.0
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-14 22:31:05 +09:00
Changwoo Min
444f0b86a5 Merge pull request #489 from multics69/lavd-amp-v4
lavd: make LAVD core-type (AMP) aware
2024-08-14 14:24:09 +09:00
Tejun Heo
4612764b82 Merge pull request #486 from vax-r/Fix_rusty_logic
scx_rusty: Fix logical error when filtering tasks
2024-08-13 09:39:12 -10:00
Daniel Hodges
646cefd46d Merge pull request #477 from hodgesds/layered-global-match
scx_rusty: Make layer matching a global function
2024-08-12 09:14:58 -04:00
Daniel Hodges
be5213e129 scx_rusty: Make layer matching a global function
Layer matching currently takes a large number of bpf instructions.
Moving layer matching to a global function will reduce the overall
instruction count and allow for other layer matching methods such as
glob.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-12 05:44:34 -07:00
Changwoo Min
b7b8c8de90 scx_lavd: fix build errors
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-12 14:10:40 +09:00
Changwoo Min
182b0bd249 scx_lavd: make the verifier in 6.8 kernel happy
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-12 13:04:04 +09:00
Changwoo Min
4ecf3fc94e scx_lavd: build cpdom map from rust
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-12 13:03:18 +09:00
Changwoo Min
1f1a3dc4f1 scx_lavd: sort cores in descending order of max freq
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-12 13:01:40 +09:00
Changwoo Min
c213a3e44f scx_lavd: make core compaction core type aware
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-12 13:01:40 +09:00
Changwoo Min
c35b6b27ff scx_lavd: consider task pinning for core-type-aware ops.enqueue()
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-12 13:01:40 +09:00
Changwoo Min
25bf98d2a0 scx_lavd: make ops.select_cpu() core type aware
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-12 13:01:40 +09:00
Changwoo Min
fa87e1c593 scx_lavd: make ops.dispatch() core type aware
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-12 13:01:40 +09:00
Changwoo Min
c1cf11f7b1 scx_lavd: make ops.enqueue() core type aware
Put a performance-critical task into a performance-critical queue and a
regular task into a regular queue.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-12 13:01:40 +09:00
Changwoo Min
03a8c10ece scx_lavd: add cpdom_ctx to abstract compute domain and its DSQ
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-12 13:01:40 +09:00
Changwoo Min
623b05a282 scx_lavd: revise perf_cri factor to reflect wakeup, runtime, and run_freq
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-12 13:01:40 +09:00
Changwoo Min
15871fd032 scx_lavd: turn off pinned core less aggressively
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-12 13:01:40 +09:00
Changwoo Min
9dc7f94cb6 scx_lavd: unify the deadline calculation and ineligibility calculation
The unified version is not only simpler but also works better.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-12 13:01:40 +09:00
Changwoo Min
4705520d40 scx_lavd: remove unnecessary options that have never been used
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-12 13:01:34 +09:00
I Hsin Cheng
15b40de408 scx_rusty: Fix logical error when filtering tasks
The task filtering logic was moved from find_first_candidate() into
a vector filter operation in commit 1c3b563. However, the condition was
not negated in the move: .filter() keeps the tasks we want, whereas
.skip_while() was throwing the unwanted tasks out.

That's why the logic here should be reversed, so that we don't take
kworker or migrated tasks into consideration.
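
The inversion can be sketched in Rust as follows. The `Task` struct, the
`unwanted` predicate and the function names are hypothetical stand-ins,
not the scx_rusty code; the point is only that a predicate written for
.skip_while() must be negated when it is reused with .filter().

```rust
/// Hypothetical task record for illustration.
struct Task {
    pid: i32,
    is_kworker: bool,
    migrated: bool,
}

/// Predicate describing tasks that must NOT be picked.
fn unwanted(t: &Task) -> bool {
    t.is_kworker || t.migrated
}

/// Old style: skip leading unwanted tasks, then take the first remaining one.
fn first_with_skip_while(tasks: &[Task]) -> Option<i32> {
    tasks.iter().skip_while(|t| unwanted(t)).next().map(|t| t.pid)
}

/// New style, correct: keep only tasks for which the predicate is false.
fn wanted_with_filter(tasks: &[Task]) -> Vec<i32> {
    tasks.iter().filter(|t| !unwanted(t)).map(|t| t.pid).collect()
}

/// New style, buggy: the same predicate without negation keeps the wrong set.
fn buggy_filter(tasks: &[Task]) -> Vec<i32> {
    tasks.iter().filter(|t| unwanted(t)).map(|t| t.pid).collect()
}
```

Given a kworker, a normal task, a migrated task and another normal task
(in that order), the buggy variant returns exactly the kworker and the
migrated task, while the corrected one returns the two normal tasks.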

Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
2024-08-10 22:56:20 +08:00
I Hsin Cheng
4e40ba3b11 scx_rustland: Removed unused imports and variables
The member "topo_map" in Scheduler is never used and thus should be
removed; the related imports are removed as well.

Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
2024-08-09 20:35:12 +08:00
I Hsin Cheng
b7e03b7a76 scx_bpfland: Remove unused variable
Remove unused variable "vtime" in task_vtime().

Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
2024-08-09 20:28:42 +08:00
Tejun Heo
45f7fd13b7 versions: Synchronize crate dependency versions 2024-08-08 14:45:46 -10:00
Tejun Heo
63c4a0191f Merge branch 'main' into topic/inlined-skeleton-members 2024-08-08 14:23:37 -10:00
Tejun Heo
cd6a4d72c7 Bump versions for 1.0.2 release 2024-08-08 14:10:16 -10:00
Tejun Heo
7c3ffe96e1 Unify crate dependency versions
Different sub-projects are using different versions for the same crates.
Synchronize them to the latest.
2024-08-08 13:26:47 -10:00
Andrea Righi
9d808ae206 Merge pull request #468 from sched-ext/rustland-refactoring
scx_rustland refactoring
2024-08-07 11:38:21 +02:00
Andrea Righi
51cfb69199 scx_rustland_core: re-introduce partial mode
Re-add the partial mode option that was dropped during the refactoring.

The partial option applies the scheduler only to tasks whose
scheduling policy is set to SCHED_EXT via sched_setscheduler().

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-07 08:41:06 +02:00
Andrea Righi
e1f2b3822e scx_rustland_core: drop CPU ownership API
The API for determining which PID is running on a specific CPU is racy
and is unnecessary since this information can be obtained from user
space.

Additionally, it's not reliable for identifying idle CPUs.  Therefore,
it's better to remove this API and, in the future, provide a cpumask
alternative that can export the idle state of the CPUs to user space.

As a consequence, also change scx_rustland to dispatch one task at a time,
instead of dispatching tasks in batches based on the idle cores (which are
usually not accurate due to the racy nature of the CPU ownership interface).

Dispatching one task at a time even makes the scheduler more performant,
due to the vruntime scheduling being applied to more tasks sitting in
the scheduler's queue.
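
A rough Rust sketch of the idea, not the scx_rustland implementation: if
the queue is ordered by vruntime and only one task is taken per dispatch,
a task that arrives later with a smaller vruntime can still run ahead of
tasks that a batch dispatch would already have sent out. The `RunQueue`
type and its methods are hypothetical.

```rust
use std::collections::BTreeSet;

/// Hypothetical user-space run queue ordered by (vruntime, pid).
struct RunQueue {
    tasks: BTreeSet<(u64, i32)>,
}

impl RunQueue {
    fn new() -> Self {
        Self { tasks: BTreeSet::new() }
    }

    /// Queue a task with its current vruntime.
    fn enqueue(&mut self, pid: i32, vruntime: u64) {
        self.tasks.insert((vruntime, pid));
    }

    /// Dispatch exactly one task: the one with the smallest vruntime.
    fn dispatch_one(&mut self) -> Option<i32> {
        self.tasks.pop_first().map(|(_, pid)| pid)
    }
}
```

For example, after queuing pid 10 (vruntime 300) and pid 11 (vruntime 100)
and dispatching once, a newly queued pid 12 with vruntime 50 is dispatched
before pid 10.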

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-07 08:41:06 +02:00
Andrea Righi
9a0e7755df scx_rustland_core: export counter of online CPUs
Introduce a helper to get the number of online CPUs tracked by the BPF
part.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-07 08:10:53 +02:00
Andrea Righi
d9c9f78e3e scx_rustland: re-align vruntime and time slice evaluation to scx_bpfland
Drop the slice boost logic and apply a vruntime and task time slice
evaluation approach similar to scx_bpfland (but implement this in the
user-space component instead of the BPF part).

Additionally, introduce a slice_us_min parameter to define the minimum
time slice that can be assigned to a task, also similar to scx_bpfland.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-07 08:10:53 +02:00
Andrea Righi
38a725ea34 scx_rlfifo: update copyright info
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-07 08:10:53 +02:00
Andrea Righi
c963d5eb05 scx_rustland: update copyright info
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-07 08:10:53 +02:00
Andrea Righi
b87541a26e scx_rustland_core: refactor idle CPU selection logic
Use the same idle selection logic used in scx_bpfland also in
scx_rustland_core.

Also drop fifo_mode and always use the BPF idle selection logic by
default as long as the system is not saturated, unless full_user is
specified.

This approach allows user-space schedulers aiming for maximum
performance to leverage the BPF idle selection logic (bypassing
user-space), while those seeking full control can enable full_user to
bypass the BPF CPU idle selection logic and choose the target CPU for
each task from user-space.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-07 08:10:53 +02:00
Andrea Righi
d8985306f4 scx_rustland: user-space interactive task classifier
We don't need to send the number of voluntary context switches (nvcsw)
from BPF to user-space, as this information is already accessible in
user-space via procfs. Sending this data would only create unnecessary
overhead for schedulers that don't require it, and those that do can
easily retrieve it through procfs.
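
As an illustration of the procfs route, the voluntary context switch
counter is exposed as the `voluntary_ctxt_switches` field of
/proc/<pid>/status. The Rust sketch below reads it; the function names
are hypothetical and this is not the scx_rustland classifier code.

```rust
use std::fs;

/// Extract `voluntary_ctxt_switches` from the text of /proc/<pid>/status.
fn parse_nvcsw(status: &str) -> Option<u64> {
    status
        .lines()
        // The "nonvoluntary_ctxt_switches:" line does not match this prefix.
        .find_map(|line| line.strip_prefix("voluntary_ctxt_switches:"))
        .and_then(|value| value.trim().parse().ok())
}

/// Read the counter for a given pid; returns None if the task is gone
/// or the field is missing.
fn read_nvcsw(pid: i32) -> Option<u64> {
    let status = fs::read_to_string(format!("/proc/{}/status", pid)).ok()?;
    parse_nvcsw(&status)
}
```

A classifier could sample `read_nvcsw(pid)` periodically and compare
successive values to estimate how often a task sleeps voluntarily.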

Therefore, drop this metric from scx_rustland_core and change
scx_rustland to implement an interactive task classifier fully in the
user-space part of the scheduler.

Also drop some options that do not provide any significant benefit
(also in preparation for a bigger refactoring to define a better API for
the user-space framework).

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-06 17:56:58 +02:00