scx-upstream

mirror of https://github.com/sched-ext/scx.git synced 2024-11-28 21:50:23 +00:00

Author	SHA1	Message	Date
Andrea Righi	f8a2445869	scx_bpfland: introduce performance/powersave primary domain
The primary scheduling domain represents a group of CPUs in the system where the scheduler will initially attempt to assign tasks. Tasks will only be dispatched to CPUs within this primary domain until they are fully utilized, after which tasks may overflow to other available CPUs.
The primary scheduling domain can be defined using the option `--primary-domain CPUMASK` (by default all the CPUs in the system are used as primary domain).
This change introduces two new special values for the CPUMASK argument:
- `performance`: automatically detect the fastest CPUs in the system and use them as primary scheduling domain,
- `powersave`: automatically detect the slowest CPUs in the system and use them as primary scheduling domain.
The current logic only supports creating two groups: fast and slow CPUs. The fast CPU group is created by excluding CPUs with the lowest frequency from the overall set, which means that within the fast CPU group, CPUs may have different maximum frequencies. When using the `performance` mode the fast CPUs will be used as primary domain, whereas in `powersave` mode, the slow CPUs will be used instead.
This option is particularly useful in hybrid architectures (with P-cores and E-cores), as it allows the use of bpfland to prioritize task scheduling on either P-cores or E-cores, depending on the desired performance profile.
Example:
- Dell Precision 5480
- CPU: 13th Gen Intel(R) Core(TM) i7-13800H
- P-cores: 0-11 / max freq: 5.2GHz
- E-cores: 12-19 / max freq: 4.0GHz
$ scx_bpfland --primary-domain performance
  0[||||||||| 24.5%]      10[|||||||| 22.8%]
  1[|||||| 14.9%]         11[||||||||||||| 36.9%]
  2[|||||| 16.2%]         12[ 0.0%]
  3[||||||||| 25.3%]      13[ 0.0%]
  4[||||||||||| 33.3%]    14[ 0.0%]
  5[|||| 9.9%]            15[ 0.0%]
  6[||||||||||| 31.5%]    16[ 0.0%]
  7[||||||| 17.4%]        17[ 0.0%]
  8[|||||||| 23.4%]       18[ 0.0%]
  9[||||||||| 26.1%]      19[ 0.0%]
  Avg power consumption: 3.29W
$ scx_bpfland --primary-domain powersave
  0[| 2.5%]               10[ 0.0%]
  1[ 0.0%]                11[ 0.0%]
  2[ 0.0%]                12[|||| 8.0%]
  3[ 0.0%]                13[||||||||||||||||||||| 64.2%]
  4[ 0.0%]                14[|||||||||| 29.6%]
  5[ 0.0%]                15[||||||||||||||||| 52.5%]
  6[ 0.0%]                16[||||||||| 24.7%]
  7[ 0.0%]                17[|||||||||| 30.4%]
  8[ 0.0%]                18[||||||| 22.4%]
  9[ 0.0%]                19[||||| 12.4%]
  Avg power consumption: 2.17W
(Info collected from htop and turbostat)
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-08-19 20:19:21 +02:00
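The fast/slow grouping described in this commit message can be illustrated with a small sketch. This is not the code from scx_bpfland: the function name `split_by_max_freq`, the use of a plain `u64` bitmask, and the 64-CPU limit are assumptions made only for illustration. It follows the stated rule that the slow group is the set of CPUs with the lowest maximum frequency and the fast group is everything else.

```rust
/// Hypothetical helper (not taken from scx_bpfland): split CPUs into a
/// "fast" and a "slow" group and return each group as a bitmask.
///
/// `max_freq_khz[i]` is the maximum frequency of CPU `i`. The slow group is
/// every CPU whose maximum frequency equals the lowest one observed; the fast
/// group is everything else, so it may mix different maximum frequencies.
/// Only the first 64 CPUs are considered in this sketch.
fn split_by_max_freq(max_freq_khz: &[u64]) -> (u64, u64) {
    let min_freq = match max_freq_khz.iter().min() {
        Some(&freq) => freq,
        None => return (0, 0), // no CPUs: both groups are empty
    };
    let (mut fast, mut slow) = (0u64, 0u64);
    for (cpu, &freq) in max_freq_khz.iter().enumerate().take(64) {
        if freq == min_freq {
            slow |= 1u64 << cpu;
        } else {
            fast |= 1u64 << cpu;
        }
    }
    (fast, slow)
}
```

With the frequencies from the example machine (CPUs 0-11 at 5.2GHz, CPUs 12-19 at 4.0GHz) the sketch yields the masks `0x00fff` and `0xff000`. When every CPU reports the same maximum frequency the fast group comes out empty; how the real scheduler treats that case is not stated here.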
Andrea Righi	174993f9d2	scx_bpfland: introduce cache awareness
While the system is not saturated the scheduler will use the following strategy to select the next CPU for a task:
- pick the same CPU if it's a full-idle SMT core
- pick any full-idle SMT core in the primary scheduling group that shares the same L2 cache
- pick any full-idle SMT core in the primary scheduling group that shares the same L3 cache
- pick the same CPU (ignoring SMT)
- pick any idle CPU in the primary scheduling group that shares the same L2 cache
- pick any idle CPU in the primary scheduling group that shares the same L3 cache
- pick any idle CPU in the system
While the system is completely saturated (no idle CPUs available), tasks will be dispatched on the first CPU that becomes available.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-08-19 20:19:21 +02:00
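The selection order listed in this commit message can be modelled in plain Rust. The real scheduler does this in BPF using idle cpumasks; the `Cpu` struct, its fields, and the `pick_cpu` and `first_match` functions below are hypothetical names used only to make the priority order concrete.

```rust
/// Simplified, hypothetical view of one logical CPU.
#[derive(Clone, Copy)]
struct Cpu {
    idle: bool,       // this logical CPU is idle
    core_idle: bool,  // every SMT sibling of this core is idle
    in_primary: bool, // CPU belongs to the primary scheduling group
    l2: u32,          // id of the L2 cache this CPU uses
    l3: u32,          // id of the L3 cache this CPU uses
}

/// Index of the first CPU satisfying `pred`, if any.
fn first_match(cpus: &[Cpu], pred: impl Fn(&Cpu) -> bool) -> Option<usize> {
    cpus.iter().position(|c| pred(c))
}

/// Walk the seven steps in order; `None` means the system is saturated.
/// Panics if `prev` is not a valid index into `cpus`.
fn pick_cpu(prev: usize, cpus: &[Cpu]) -> Option<usize> {
    let p = cpus[prev];
    // 1. same CPU, if its whole SMT core is idle
    if p.core_idle {
        return Some(prev);
    }
    // 2. full-idle core in the primary group sharing the L2 cache
    if let Some(c) = first_match(cpus, |c| c.core_idle && c.in_primary && c.l2 == p.l2) {
        return Some(c);
    }
    // 3. full-idle core in the primary group sharing the L3 cache
    if let Some(c) = first_match(cpus, |c| c.core_idle && c.in_primary && c.l3 == p.l3) {
        return Some(c);
    }
    // 4. same CPU, ignoring the state of its SMT siblings
    if p.idle {
        return Some(prev);
    }
    // 5. idle CPU in the primary group sharing the L2 cache
    if let Some(c) = first_match(cpus, |c| c.idle && c.in_primary && c.l2 == p.l2) {
        return Some(c);
    }
    // 6. idle CPU in the primary group sharing the L3 cache
    if let Some(c) = first_match(cpus, |c| c.idle && c.in_primary && c.l3 == p.l3) {
        return Some(c);
    }
    // 7. any idle CPU in the system
    first_match(cpus, |c| c.idle)
}
```

In this model a `None` result corresponds to the saturated case, where the commit message says tasks are dispatched on the first CPU that becomes available.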
Tejun Heo	3a688cfde7	scx_stats: Add support for no-value user attributes and a bunch of other changes
- Allow no-value user attributes which are automatically assigned "true" when specified.
- Make "top" attribute string "true" instead of bool true for consistency. Testing for existence is always enough for value-less attributes.
- Don't drop leading "_" from user attribute names when storing in dicts. Dropping makes things more confusing.
- Add "_om_skip" to scx_layered fields which don't jive well with OM. scxstats_to_openmetrics.py is updated accordingly and no longer generates warnings on those fields.
- Examples and README updated accordingly.	2024-08-16 07:52:02 -10:00
I Hsin Cheng	5d85937842	scx_rusty: Fix typo Signed-off-by: I Hsin Cheng <richard120310@gmail.com>	2024-08-16 22:03:59 +08:00
Tejun Heo	c16b48d7b2	scheds/rust: Include Cargo.lock in the repo Binary packages are expected to include Cargo.lock in the repo so that the produced binaries match across different builds.	2024-08-15 23:08:35 -10:00
Tejun Heo	22167aeb14	Merge pull request #502 from sched-ext/htejun/scx_stats scx_stats: Refine scx_stats and implement scxstats_to_openmetrics.py	2024-08-15 22:55:11 -10:00
Tejun Heo	570ca56c57	scx_layered: s/_om_field_prefix/_om_prefix/	2024-08-15 21:29:58 -10:00
Tejun Heo	af01dd19ec	Merge pull request #500 from sched-ext/htejun/scx_stats scx_stats, scx_layered: Add `om_prefix` attribute and fix s/stat/stats/ stragglers	2024-08-15 21:27:38 -10:00
Tejun Heo	ea453e51d3	scx_stats: Rename "all" attribute to "top" and clean up examples a bit	2024-08-15 21:24:55 -10:00
Tejun Heo	a910fa451a	scx_layered: Add _om attributes to LayerStats for OpenMetrics piping	2024-08-15 19:11:49 -10:00
Tejun Heo	6a5d6f7c27	scx_stats: Replace field_prefix attribute with '_' prefixed user attributes	2024-08-15 19:09:59 -10:00
Tejun Heo	a9922deaa2	scx_stats: Add "all" attribute and rename metadata type strings	2024-08-15 14:50:00 -10:00
Tejun Heo	ebc1a89c34	scx_stats: s/stat/stats/ stragglers	2024-08-15 14:00:00 -10:00
Tejun Heo	bafd67b568	scx_stats: Fix parsing for multiple stat attributes The code was assuming single attribute per #[stat()] block. Update it so that there can be multiple comma separated attributes in a single block.	2024-08-15 13:46:20 -10:00
Tejun Heo	8f361af077	scx_layered: Shorten stat field descriptions	2024-08-15 13:25:48 -10:00
Tejun Heo	1912e05f0b	Merge pull request #499 from sched-ext/htejun/scx_stats scx_stats: Misc changes to sync dep versions and publish on crates.io	2024-08-15 12:32:44 -10:00
Tejun Heo	0b9c8b5cbd	scx_stats: Update versions to 0.2.0 to republish	2024-08-15 12:29:27 -10:00
Daniel Hodges	0319afc88e	scx_layered: Update nr_cpus when resizing layers After updating scx_layered to be topology aware the nr_cpus field on the layer was not being updated properly. Update layer growing/shrinking logic to correctly update the nr_cpus count. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-08-15 13:22:26 -07:00
Tejun Heo	cc73b6a826	Merge pull request #496 from sched-ext/htejun/scx_stat scx_stat: Initial commit	2024-08-15 09:24:55 -10:00
Tejun Heo	b614cf848f	scx_layered: Make monitor time based iterations dumber This makes ctrl-c a bit more responsive without complicating code.	2024-08-15 09:23:29 -10:00
Tejun Heo	45fb724ee2	scx_layered: Restore cpumask reporting	2024-08-15 09:12:29 -10:00
Tejun Heo	751a38e34e	scx_layered: Refactor stats printing code	2024-08-15 08:53:19 -10:00
Tejun Heo	a4f424056e	scx_layered: Move stats server launching to stats.rs	2024-08-15 06:30:42 -10:00
Tejun Heo	17afc72479	scx_stats: Rename cleanups - s/stat/stats/ on several stragglers. - Rename traits so that they are more distinctive from struct and other names and follow the convention.	2024-08-15 06:24:56 -10:00
Tejun Heo	a091d5ea7d	scx_layered: s/monitor.rs/stats.rs/ and make stats refresh code struct ops	2024-08-15 06:13:05 -10:00
Tejun Heo	8aae9a5de2	scx_stats: s/scx_stat/scx_stats/ Use plural form which is more widespread and also used in scheduler implementations. No functional changes.	2024-08-15 05:31:34 -10:00
Tejun Heo	6e466d18df	scx_layered: Initial switch to scx_stat
- This makes the scheduler side simpler and allows on-demand monitoring.
- OpenMetrics support is dropped for now. Will add a generic tool for it.
- This is a naive conversion. Will be further refined.
scx_layered no longer prints statistics by default. To watch statistics, run `scx_layered --monitor` while the scheduler is running.	2024-08-14 13:48:41 -10:00
Tejun Heo	7820ec9b46	scx_stat, scx_layered: cargo fmt	2024-08-14 11:47:37 -10:00
Tejun Heo	099b6c266a	scx_lavd: Build fix Add "signal" feature to nix dependency; otherwise, build fails.	2024-08-14 07:55:04 -10:00
Andrea Righi	0f018c5fff	Merge pull request #484 from vax-r/rustland_unused scx: Remove unused variables, imports and functions	2024-08-14 19:03:26 +02:00
Andrea Righi	f9a994412d	scx_bpfland: introduce primary scheduling domain
Allow specifying a primary scheduling domain via the new command line option `--primary-domain CPUMASK`, where CPUMASK can be a hex number of arbitrary length, representing the CPUs assigned to the domain.
If this option is not specified the scheduler will use all the available CPUs in the system as primary domain (no behavior change).
Otherwise, if a primary scheduling domain is defined, the scheduler will try to dispatch tasks only to the CPUs assigned to the primary domain, until these CPUs are saturated, at which point tasks may overflow to other available CPUs.
This feature can be used to prioritize certain cores over others and it can be really effective in systems with heterogeneous cores (e.g., hybrid systems with P-cores and E-cores).
== Example (hybrid architecture) ==
Hardware:
- Dell Precision 5480 with 13th Gen Intel(R) Core(TM) i7-13800H
- 6 P-cores 0..5 with 2 CPUs each (CPU from 0..11)
- 8 E-cores 6..13 with 1 CPU each (CPU from 12..19)
== Test ==
WebGL application (https://webglsamples.org/aquarium/aquarium.html): this generates a steady workload in the system without over-saturating the CPUs.
Use different scheduler configurations:
- EEVDF (default)
- scx_bpfland using P-cores only (--primary-domain 0x00fff)
- scx_bpfland using E-cores only (--primary-domain 0xff000)
Measure performance (fps) and power consumption (W).
== Result ==
                  +-----+-----+------+-------+--------+
                  | min | max | avg  |       |        |
                  | fps | fps | fps  | stdev | power  |
+-----------------+-----+-----+------+-------+--------+
| EEVDF           |  28 |  34 | 31.0 |  1.73 |   3.5W |
| bpfland-p-cores |  33 |  34 | 33.5 |  0.29 |   3.5W |
| bpfland-e-cores |  25 |  26 | 25.5 |  0.29 |   2.2W |
+-----------------+-----+-----+------+-------+--------+
Using a primary scheduling domain of only P-cores with scx_bpfland achieves a more stable and predictable level of performance, with an average of 33.5 fps and an error of ±0.5 fps. In contrast, using EEVDF results in an average frame rate of 31.0 fps with an error of ±3.0 fps, indicating slightly less consistency, because tasks are evenly distributed across all the cores in the system (both slow and fast cores).
On the other hand, using a scheduling domain of only E-cores with scx_bpfland results in a lower average frame rate (25.5 fps) while maintaining stable performance (error of ±0.5 fps), and power consumption is also reduced, averaging 2.2W compared to 3.5W with either of the other configurations.
== Conclusion ==
In summary, with this change users have the flexibility to prioritize scheduling on performance cores for better performance and consistency, or prioritize energy efficient cores for reduced power consumption, on hybrid architectures.
Moreover, this feature can also be used to minimize the number of cores used by the scheduler, until they reach full capacity. This capability can be useful for reducing power consumption even in homogeneous systems or for conducting scheduling experiments with smaller sets of cores, provided the system is not overcommitted.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-08-14 16:17:54 +02:00
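To make the CPUMASK values used in this test easier to read, the sketch below decodes a hex mask of arbitrary length into the list of CPU ids it selects. It is not the parser used by scx_bpfland; the function name `cpus_from_hex_mask` and the choice to reject empty or non-hex input with `None` are assumptions for illustration.

```rust
/// Hypothetical helper: decode a hex CPU mask such as "0x00fff" into the
/// list of CPU ids whose bits are set, lowest id first.
///
/// The mask is processed one hex digit at a time, starting from the least
/// significant digit, so it can be longer than any fixed-width integer.
/// Returns `None` for an empty mask or a non-hex character.
fn cpus_from_hex_mask(mask: &str) -> Option<Vec<usize>> {
    let digits = mask.strip_prefix("0x").unwrap_or(mask);
    if digits.is_empty() {
        return None;
    }
    let mut cpus = Vec::new();
    for (i, ch) in digits.chars().rev().enumerate() {
        let nibble = ch.to_digit(16)?; // bail out on invalid digits
        for bit in 0..4usize {
            if ((nibble >> bit) & 1) == 1 {
                // digit i covers CPUs 4*i .. 4*i+3
                cpus.push(i * 4 + bit);
            }
        }
    }
    Some(cpus)
}
```

Applied to the two masks from the test, `0x00fff` decodes to CPUs 0 through 11 (the P-core CPUs) and `0xff000` to CPUs 12 through 19 (the E-core CPUs).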
Andrea Righi	a6e977c70b	scx_bpfland: make output more compact Abbreviate the statistics reported to stdout and remove the slice_ms metric: this metric can be easily derived from slice_ns, slice_ns_min and nr_wait, which is already reported to stdout. Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-08-14 16:17:54 +02:00
Andrea Righi	8656effa50	scx_bpfland: update copyright info Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-08-14 16:17:54 +02:00
Changwoo Min	3c6d86b342	scx_lavd: upgrade nix package from 0.28.0 to 0.29.0 Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-08-14 22:31:05 +09:00
Changwoo Min	444f0b86a5	Merge pull request #489 from multics69/lavd-amp-v4 lavd: make LAVD core-type (AMP) aware	2024-08-14 14:24:09 +09:00
Tejun Heo	4612764b82	Merge pull request #486 from vax-r/Fix_rusty_logic scx_rusty: Fix logical error when filtering tasks	2024-08-13 09:39:12 -10:00
Daniel Hodges	646cefd46d	Merge pull request #477 from hodgesds/layered-global-match scx_rusty: Make layer matching a global function	2024-08-12 09:14:58 -04:00
Daniel Hodges	be5213e129	scx_rusty: Make layer matching a global function Layer matching currently takes a large number of bpf instructions. Moving layer matching to a global function will reduce the overall instruction count and allow for other layer matching methods such as glob. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-08-12 05:44:34 -07:00
Changwoo Min	b7b8c8de90	scx_lavd: fix build errors Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-08-12 14:10:40 +09:00
Changwoo Min	182b0bd249	scx_lavd: make the verifier in 6.8 kernel happy Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-08-12 13:04:04 +09:00
Changwoo Min	4ecf3fc94e	scx_lavd: build cpdom map from rust Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-08-12 13:03:18 +09:00
Changwoo Min	1f1a3dc4f1	scx_lavd: sort cores in descending order of max freq Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-08-12 13:01:40 +09:00
Changwoo Min	c213a3e44f	scx_lavd: make core compaction core type aware Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-08-12 13:01:40 +09:00
Changwoo Min	c35b6b27ff	scx_lavd: consider task pinning for core-type-aware ops.enqueue() Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-08-12 13:01:40 +09:00
Changwoo Min	25bf98d2a0	scx_lavd: make ops.select_cpu() core type aware Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-08-12 13:01:40 +09:00
Changwoo Min	fa87e1c593	scx_lavd: make ops.dispatch() core type aware Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-08-12 13:01:40 +09:00
Changwoo Min	c1cf11f7b1	scx_lavd: make ops.enqueue() core type aware Put a performance-critical task to a performance critical queue and a regular task to a regular queue. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-08-12 13:01:40 +09:00
Changwoo Min	03a8c10ece	scx_lavd: add cpdom_ctx to abstract compute domain and its DSQ Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-08-12 13:01:40 +09:00
Changwoo Min	623b05a282	scx_lavd: revise perf_cri factor to reflect wakeup, runtime, and run_freq Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-08-12 13:01:40 +09:00
Changwoo Min	15871fd032	scx_lavd: turn off pinned core less aggressively Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-08-12 13:01:40 +09:00

617 Commits