Commit Graph

1408 Commits

Author SHA1 Message Date
Andrea Righi
50684e4569 scx_bpfland: introduce Intel Turbo Boost awareness
Make `--primar-domain auto` aware of turbo boosted CPUs and prioritize
them over the primary scheduling domain when the energy model
`balance_power` is used (typically when running on battery power with
the "balanced" profile).

With this change the scheduling hierarchy becomes the following:

 1) CPUs in the turbo scheduling domain
 2) CPUs in the primary scheduling domain
 3) full-idle SMT CPUs
 4) CPUs in the same L2 cache
 5) CPUs in the same L3 cache
 6) CPUs in the task's allowed domain

And the idle selection logic is modified as following:

 - In the turbo scheduling domain:
   - pick same full-idle SMT CPU
   - pick any other full-idle SMT CPU sharing the same L2 cache
   - pick any other full-idle SMT CPU sharing the same L3 cache
   - pick any other full-idle SMT CPU
   - pick same idle CPU
   - pick any other idle CPU sharing the same L2 cache
   - pick any other idle CPU sharing the same L3 cache
   - pick any other idle SMT CPU
 - In the primary scheduling domain:
   - pick same full-idle SMT CPU
   - pick any other full-idle SMT CPU sharing the same L2 cache
   - pick any other full-idle SMT CPU sharing the same L3 cache
   - pick any other full-idle SMT CPU
   - pick same idle CPU
   - pick any other idle CPU sharing the same L2 cache
   - pick any other idle CPU sharing the same L3 cache
   - pick any other idle SMT CPU
 - In the entire task domain:
   - pick any other idle CPU

Keep in mind that the turbo domain will be evaluated only when the
scheduler is started with `--primary-domain auto` and only when the
`balance_power` energy profile is used.

The turbo domain is always made using the subset of CPUs in the system
with the highest max frequency. If such subset can't be determined (for
example if all the CPUs in the primary domain have all the same
frequency), the turbo domain will be ignored.

Prioritizing turbo boosted CPUs can help to improve performance by
forcing the governor to scale up their frequency, without increasing too
much power consumption, due to the fact that tasks will be preferably
confined into a reduced amount of cores.

This change seems to improve performance, without increasing much
power consuption, on Intel laptops while using the `balanced_power`
energy profile.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-23 19:49:08 +02:00
Andrea Righi
d958dd4482 scx_bpfland: introduce dynamic energy profile
Introduce the new option `--primary-domain auto`. With this option the
scheduler will dynamically adjusts the primary scheduling domain at
run-time, in function of the current energy profile reported in
/sys/devices/system/cpu/cpufreq/policy0/energy_performance_preference.

When the `power` energy profile is selected, the primary scheduling
domain will prioritize E-cores. Alternatively, when the `performance`
profile is selected, it will prioritize P-cores. For all the other
energy profiles, all the CPUs in the system will be used.

Note that this option is only relevant on hybrid architectures with
P-cores and E-cores.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-23 19:49:01 +02:00
Andrea Righi
bb7248ce61 scx_utils::cpumask: introduce is_empty() and is_full()
Introduce new methods to CpuMask to check if no bits are set or if all
bits are set.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-23 19:48:53 +02:00
Andrea Righi
115bfc8184
Merge pull request #536 from sched-ext/bpfland-lowlatency-mode
scx_bpfland: introduce --lowlatency option
2024-08-23 19:48:08 +02:00
Tejun Heo
e635e7eac8
Merge pull request #537 from sirlucjan/new-default
scx-scheds: set scx_bpfland as default scheduler
2024-08-23 05:55:59 -10:00
Piotr Gorski
3f0fcc319c
scx-scheds: set scx_bpfland as default scheduler
Signed-off-by: Piotr Gorski <lucjan.lucjanov@gmail.com>
2024-08-23 15:47:24 +02:00
Andrea Righi
6a2285398d scx_bpfland: introduce --lowlatency option
Introduce the new `--lowlatency` option, which enables switching between
the default pure vruntime-based scheduling (more optimized for server
workloads) and a deadline-based scheduling (better suited for
low-latency workloads).

When the low-latency mode is activated, a task's deadline is calculated
as its vruntime, adjusted by a bonus proportional to the task's average
number of voluntary context switches (the more voluntary context
switches, the shorter the deadline).

This feature enhances the prioritization of interactive tasks even more,
proportionally to their average voluntary context switches, also within
the two main global queues (priority / shared) and it helps to maintain
interactive workloads always responsive, even in presence of heavy
non-interactive background work.

Low-latency mode allows to prevent audio cracking even in presence of a
large amount of short-lived tasks with pseudo-interactive behavior (i.e,
hackbench) and it enables achieving approximately a +33% average
frames-per-second (FPS) in the typical "gaming while building the
kernel" benchmark.

However, it can also amplify the de-prioritization of CPU-intensive
tasks, making this option more suitable for specific low-latency
scenarios. Therefore the low-latency mode is disabled by default and it
can only be enabled via the `--lowlatency` option.

Tested-by: Piotr Gorski (piotrgorski@cachyos.org)
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-22 13:26:19 +02:00
Andrea Righi
dca4af151e
Merge pull request #535 from sched-ext/bpfland-time-slice-control
scx_bpfland: better time slice control
2024-08-22 10:32:16 +02:00
Andrea Righi
b0a8e4a91e scx_bpfland: better time slice control
Explicitly replenish the task's time slice from ops.dispatch() if the
task still wants to run and no other task is selected. In this way the
sched_ext core won't automatically re-schedule the task on the same CPU,
implicitly assigning a time slice of SCX_SLICE_DFL.

Moreover, instead of determining the task time slice in ops.enqueue(),
refresh the time slice immediately before the task is started on its
assigned CPU in ops.running().

This allows to use a more precise time slice, adjusted based on the
actual amount of tasks that are currently waiting to be scheduled.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-22 09:23:37 +02:00
Tejun Heo
8c35a1976f
Merge pull request #534 from sched-ext/htejun/misc
scx_layered: Drop SCX_OPS_ENQ_LAST
2024-08-21 19:57:54 -10:00
Tejun Heo
d6ac5fbd9c scx_layered: Drop SCX_OPS_ENQ_LAST
The meaning of SCX_OPS_ENQ_LAST will change with future kernel updates and
enqueueing on local DSQ will no longer be sufficient to avoid stalls. No
reason to do it anyway. Just drop it.
2024-08-21 13:13:59 -10:00
Tejun Heo
52d97c041d
Merge pull request #533 from sched-ext/htejun/release
Version: Cargo.lock
2024-08-21 06:46:02 -10:00
Tejun Heo
f726f0b73b Version: Cargo.lock 2024-08-21 06:45:19 -10:00
Tejun Heo
98fd3ec3b0
Merge pull request #532 from sched-ext/htejun/release
Version: v1.0.3
2024-08-21 06:43:36 -10:00
Tejun Heo
4d1f0639d8 Version: v1.0.3 2024-08-21 06:42:11 -10:00
Tejun Heo
2e63c0be60
Merge pull request #531 from sched-ext/bpfland-fix-unused-var
scx_bpfland: drop unused variable
2024-08-21 06:40:16 -10:00
Tejun Heo
6a2faf2e17
Merge pull request #530 from sched-ext/htejun/misc
scx_utils::topology: Use lazy_static instead of LazyLock
2024-08-21 06:04:14 -10:00
Andrea Righi
fedfee0bd6 scx_bpfland: drop unused variable
With the global scx_utils::NR_CPU_IDS we don't need Topology anymore in
init_primary_domain(), so drop the variable to fix the following build
warning:

warning: unused variable: `topo`
   --> src/main.rs:385:9
    |
385 |         topo: &Topology,
    |         ^^^^ help: if this is intentional, prefix it with an underscore: `_topo`
    |
    = note: `#[warn(unused_variables)]` on by default

Fixes: 1da249f ("scx_utils::topology: Always use NR_CPU_IDS and NR_CPUS_POSSIBLE")
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-21 17:46:12 +02:00
Andrea Righi
9f7a11bba6
Merge pull request #528 from sched-ext/bpfland-turbo-boost
scx_bpfland: properly classify Intel Turbo Boost CPUs
2024-08-21 17:40:25 +02:00
Tejun Heo
081b999361 scx_utils::topology: Use lazy_static instead of LazyLock
LazyLock is stable but has become so only very recently and can trigger
build errors on not-too-old stable rustc's which are still in wide use.
Let's use lazy_static instead for now.

Signed-off-by: Tejun Heo <tj@kernel.org>
2024-08-21 05:34:39 -10:00
Daniel Hodges
f2a6661a85
Merge pull request #524 from hodgesds/layered-core-fixes
scx_layered: Fix core selection
2024-08-21 08:13:33 -04:00
Tejun Heo
9c62019c81
Merge pull request #527 from sched-ext/htejun/scx_utils
scx_utils::cpumask,topology: Misc updates
2024-08-20 22:25:25 -10:00
Andrea Righi
695e3b25b0 scx_bpfland: classify CPUs depending of their the base frequency
Use the base frequency, instead of maximum frequency, to classify fast
and slow CPUs. This ensures accurate distinction between Intel Turbo
Boost CPUs and genuinely faster CPUs when auto-detecting the primary
scheduling domain.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-21 10:16:41 +02:00
Andrea Righi
bbe388e3bc scx_utils: topology: add base_freq() method to Cpu
With Intel Turbo Boost enabled, some CPUs might show a higher maximum
frequency than others, even if they are not actually faster cores. This
can potentially confuse some auto-detection logic for distinguishing
between fast and slow cores in certain schedulers.

The base CPU frequency reported in
/sys/devices/system/cpu/cpuN/cpufreq/base_frequency represents a more
reliable indicator for identifying truly fast and slow cores.

To address this, provide a new base_freq() method in the struct Cpu,
which will return the base operational frequency of a CPU when Turbo
Boost is present. If Turbo Boost is not available, base_freq() will
return the maximum frequency, functioning the same as max_freq().

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-21 10:07:57 +02:00
Andrea Righi
e0fb99835d
Merge pull request #525 from sched-ext/bpfland-disable-interactive
scx_bpfland: allow to completely disable interactive classification
2024-08-21 10:02:43 +02:00
Tejun Heo
1da249f063 scx_utils::topology: Always use NR_CPU_IDS and NR_CPUS_POSSIBLE
Always use the LazyLock versions and drop the counterparts from Topology.
2024-08-20 21:57:56 -10:00
Tejun Heo
1ae4655b3c scx_utils::cpumask: Default to displaying in hex
There isn't much to gain by displaying cpumasks in binary. Drop separate
Display implementation just default to 'x' formatting.
2024-08-20 21:50:23 -10:00
Tejun Heo
092f5422d6
Merge pull request #518 from sched-ext/htejun/misc
scx_layered: Add `--run-example` and enable CI testing
2024-08-20 21:42:45 -10:00
Tejun Heo
3ca2f0b6f9 scx_utils/cpumask: Use nr_cpu_ids instead of num_possible_cpus
- Add static NR_CPU_IDS and NR_CPUS_POSSIBLE to topology.

- Fix comment for Topology::nr_cpu_ids(). Was missing a negation.

- cpumaks should be sized by nr_cpus_ids, not num_possible_cpus and the
  number can't change while the system is running. Drop cpumask.nr_cpus and
  use *NR_CPU_IDS everywhere.
2024-08-20 21:25:40 -10:00
Tejun Heo
0cc59a5243 scx_utils: cargo fmt 2024-08-20 21:25:40 -10:00
Tejun Heo
91213de713 Merge branch 'main' into htejun/rusty 2024-08-20 21:13:12 -10:00
Tejun Heo
2d449f3288
Merge pull request #523 from Kawanaao/openrc-logrotate
openrc: Add logrotate support for openrc systems
2024-08-20 21:10:51 -10:00
Tejun Heo
f7c193e528 scx_utils, scx_rusty: Minor updates to version handling
- Update scx_utils/build.rs so that 12 char SHA1 is generated instead of
  full one.

- Add --version to scx_rusty. Use custom one as we don't want to use the
  default cargo version one.
2024-08-20 21:03:05 -10:00
Tejun Heo
8f786be08f scx_rusty: cargo fmt 2024-08-20 21:03:05 -10:00
Tejun Heo
4440567949 scx_rusty: Update Cargo.lock 2024-08-20 21:03:05 -10:00
Andrea Righi
c85315d527 scx_bpfland: allow to completely disable interactive classification
Tasks enqueued with SCX_ENQ_WAKEUP are immediately classified as
interactive. However, if interactive tasks classification is disabled
(via `-c 0`), we should avoid promoting them as interactive.

This is particularly important because, with the nvcsw logic disabled,
tasks can remain classified as interactive indefinitely and they will
never be demoted to regular tasks.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-21 08:45:13 +02:00
Andrea Righi
014dc7b3c3
Merge pull request #522 from sched-ext/bpfland-cpumask
scx_bpfland: use scx_utils::Cpumask
2024-08-21 08:37:43 +02:00
Andrea Righi
a9f5aaa536 scx_bpfland: replace custom CpuMask with scx_utils::Cpumask
Rely on scx_utils::Cpumask instead of re-implementing a custom struct to
parse and manage CPU masks.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-21 07:21:52 +02:00
Andrea Righi
235f19fdf1 cpumask: implement hex string formatter
Allow to format a Cpumask as an hex string, implementing the proper
formatter LowerHex / UpperHex traits.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-21 07:21:22 +02:00
Daniel Hodges
4d1c932619 scx_layered: Fix core selection
Fix a bug introduced in #510 where it assumed core ids are incremental.
This refactors the core ordering for layers to be far more simple and
provide some space for layer core isolation in low utilization.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-20 19:26:53 -07:00
Kawanaao
f35717e970
Create scx.logrotate 2024-08-20 18:02:15 +03:00
Kawanaao
3485adb47f
Add support for openrc logrotate 2024-08-20 17:47:16 +03:00
Andrea Righi
33b6ada98e
Merge pull request #509 from sched-ext/bpfland-topology
scx_bpfland: topology awareness
2024-08-20 14:37:23 +02:00
Daniel Hodges
9f2d548b8f
Merge pull request #520 from hodgesds/merge-fixes
ci: Fix cache directory
2024-08-20 07:33:22 -04:00
Andrea Righi
467d4b5ea4 scx_bpfland: get topology information from scx_utils::Topology
Rely on scx_utils::Topology to get CPU and cache information, instead of
re-implementing custom methods.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-20 10:16:02 +02:00
Andrea Righi
0b2dc6b9fc scx_utils: Add L2 / L3 cache id to CPU
Add the L2 / L3 cache id to the Cpu struct, to quickly determine the
cache nodes associated to each CPU.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-20 10:16:02 +02:00
Tejun Heo
c0fcc9bdeb meson-scripts/test_sched: Enable scx_layered testing
scx_layered now can be run with a single command when `--run-example` is
specified. Update test_sched script to support per-sched arguments and
enable it for scx_layered.
2024-08-19 20:50:10 -10:00
Tejun Heo
c0418250f4 scx_layered: Add --run-example option
So that scx_layered can be run in CI environment in a single command.
2024-08-19 20:50:10 -10:00
Daniel Hodges
e121dd3dd5 ci: Fix cache directory
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-19 20:07:50 -07:00
Daniel Hodges
03944694a9
Merge pull request #519 from hodgesds/veristat-merge-fix
ci: fix merge veristat cache generation
2024-08-19 21:58:00 -04:00