Instead of keeping one copy of sched_stats, each stats server session
carries their own so that stats can be generated independently by each
client at any interval. CPU allocation min/max tracking is broken for now.
- Allow no-value user attributes which are automatically assigned "true"
when specified.
- Make "top" attribute string "true" instead of bool true for consistency.
Testing for existence is always enough for value-less attributes.
- Don't drop leading "_" from user attribute names when storing in dicts.
Dropping makes things more confusing.
- Add "_om_skip" to scx_layered fields which don't jive well with OM.
scxstats_to_openmetrics.py is updated accordignly and no longer generates
warnings on those fields.
- Examples and README updated accordingly.
After updating scx_layered to be topology aware the nr_cpus field on the
layer was not being updated properly. Update layer growing/shrinking
logic to correctly update the nr_cpus count.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
- This makes the scheduler side simpler and allows on-demand monitoring.
- OpenMetrics support is dropped for now. Will add a generic tool for it.
- This is a naive conversion. Will be further refined.
scx_layered no longer prints statistics by default. To watch statistics, run
`scx_layered --monitor` while the scheduler is running.
Allow to specify a primary scheduling domain via the new command line
option `--primary-domain CPUMASK`, where CPUMASK can be a hex number of
arbitrary length, representing the CPUs assigned to the domain.
If this option is not specified the scheduler will use all the available
CPUs in the system as primary domain (no behavior change).
Otherwise, if a primary scheduling domain is defined, the scheduler will
try to dispatch tasks only to the CPUs assigned to the primary domain,
until these CPUs are saturated, at which point tasks may overflow to
other available CPUs.
This feature can be used to prioritize certain cores over others and it
can be really effective in systems with heterogeneous cores (e.g.,
hybrid systems with P-cores and E-cores).
== Example (hybrid architecture) ==
Hardware:
- Dell Precision 5480 with 13th Gen Intel(R) Core(TM) i7-13800H
- 6 P-cores 0..5 with 2 CPUs each (CPU from 0..11)
- 8 E-cores 6..13 with 1 CPU each (CPU from 12..19)
== Test ==
WebGL application (https://webglsamples.org/aquarium/aquarium.html):
this allows to generate a steady workload in the system without
over-saturating the CPUs.
Use different scheduler configurations:
- EEVDF (default)
- scx_bpfland using P-cores only (--primary-domain 0x00fff)
- scx_bpfland using E-cores only (--primary-domain 0xff000)
Measure performance (fps) and power consumption (W).
== Result ==
+-----+-----+------+-----+----------+
| min | max | avg | | |
| fps | fps | fps | stdev | power |
+-----------------+-----+-----+------+-------+--------+
| EEVDF | 28 | 34 | 31.0 | 1.73 | 3.5W |
| bpfland-p-cores | 33 | 34 | 33.5 | 0.29 | 3.5W |
| bpfland-e-cores | 25 | 26 | 25.5 | 0.29 | 2.2W |
+-----------------+-----+-----+------+-------+--------+
Using a primary scheduling domain of only P-cores with scx_bpfland
allows to achieve a more stable and predictable level of performance,
with an average of 33.5 fps and an error of ±0.5 fps.
In contrast, using EEVDF results in an average frame rate of 31.0 fps
with an error of ±3.0 fps, indicating slightly less consistency, due to
the fact that tasks are evenly distributed across all the cores in the
system (both slow and fast cores).
On the other hand, using a scheduling domain solely of E-cores with
scx_bpfland results in a lower average frame rate (25.5 fps), though it
maintains a stable performance (error of ±0.5 fps), but the power
consumption is also reduced, averaging 2.2W, compared to 3.5W with
either of the other configurations.
== Conclusion ==
In summary, with this change users have the flexibility to prioritize
scheduling on performance cores for better performance and consistency,
or prioritize energy efficient cores for reduced power consumption, on
hybrid architectures.
Moreover, this feature can also be used to minimize the number of cores
used by the scheduler, until they reach full capacity. This capability
can be useful for reducing power consumption even in homogeneous systems
or for conducting scheduling experiments with smaller sets of cores,
provided the system is not overcommitted.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Abbreviate the statistics reported to stdout and remove the slice_ms
metric: this metric can be easily derived from slice_ns, slice_ns_min
and nr_wait, which is already reported to stdout.
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
Layer matching currently takes a large number of bpf instructions.
Moving layer matching to a global function will reduce the overall
instruction count and allow for other layer matching methods such as
glob.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Put a performance-critical task to a performance critical queue and a
regular task to a regular queue.
Signed-off-by: Changwoo Min <changwoo@igalia.com>