1536 Commits

Author SHA1 Message Date
Andrea Righi
28cb1ec5cb scx_bpfland: enhanced task affinity
Aggressively try to keep tasks running on the same CPU / cache / domain,
to achieve higher performance when the system is not overcommitted.

This is done by giving a second chance in ops.enqueue(), in addition to
ops.select_cpu(), to find an idle CPU close to the previously used CPU.

Moreover, even if the task is dispatched to the global DSQs, always try
to check if there is an idle CPU in the primary domain that can
immediately consume the task.

= Results =

This change seems to provide a minor but consistent performance boost
in the CPU-intensive benchmarks from the CachyOS benchmark suite [1].

Similar results can also be noticed with some WebGL benchmarks [2], when
system usage is close to its maximum capacity.

Test:
 - cachyos-benchmarker

System:
 - AMD Ryzen 7 5800X 8-Core Processor

Metrics:
 - total time: elapsed time of all benchmarks
 - total score: geometric mean of all benchmarks

NOTE: total time is the most relevant metric, since it measures
aggregate performance, while the total score places more emphasis on
performance consistency across all benchmarks.

== Results: summary ==

 +-------------------------+---------------------+---------------------+
 |         Scheduler       |    Total Time       |    Total Score      |
 |                         |    (less = better)  |    (less = better)  |
 +-------------------------+---------------------+---------------------+
 |                 EEVDF   |  624.44 sec         |      123.68         |
 |               bpfland   |  625.34 sec         |      122.21         |
 | bpfland-task-affinity   |  623.67 sec         |      122.27         |
 +-------------------------+---------------------+---------------------+

== Conclusion ==

With this patch applied, bpfland shows both better performance and
better consistency. Although the gains are small (less than 1%), they are
still significant for this type of benchmark and appear consistently
across multiple runs.

[1] https://github.com/CachyOS/cachyos-benchmarker
[2] https://webglsamples.org/aquarium/aquarium.html

Tested-by: Piotr Gorski <piotr.gorski@cachyos.org>
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-28 10:30:54 +02:00
Changwoo Min
d708939e5a
Merge pull request #576 from anh0516/main
scx_lavd: Add power mode clarification to --no-prefer-turbo-core
2024-08-28 14:03:46 +09:00
Avraham Hollander
6c5d85401d
Merge branch 'sched-ext:main' into main 2024-08-27 23:07:54 -04:00
Avraham Hollander
2a3cbeb760 scx_lavd: Add same power mode clarification to --no-prefer-turbo-core 2024-08-27 23:06:31 -04:00
Andrea Righi
1421094f32
Merge pull request #574 from sched-ext/scx-utils-turbo-core-type
scx_bpfland: rely on Topology to classify CPU types
2024-08-28 00:52:44 +02:00
Andrea Righi
a155d5185d scx_bpfland: rely on Topology to classify core types
Rely on scx_utils::Topology to classify Big, Little and Turbo CPUs.

Moreover, support the special keyword "all" with --primary-domain to
include all the CPUs in the system (default).

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-28 00:23:55 +02:00
Andrea Righi
a66ce5ce56 scx_utils::cpumask: support special strings "all" and "none"
Allow creating a Cpumask from the string "all" (enable all the CPUs) or
"none" (disable all CPUs).

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-28 00:15:01 +02:00
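
As a rough illustration of the "all" and "none" keywords from the commit
above, the sketch below parses them into a simplified mask. SimpleMask, its
methods and the hex fallback are assumptions made for this example; this is
not the scx_utils::Cpumask API.

```rust
/// Hypothetical, simplified CPU mask: one bool per CPU.
#[derive(Debug, PartialEq)]
struct SimpleMask {
    bits: Vec<bool>,
}

impl SimpleMask {
    /// Parse "all", "none", or (as an assumed fallback) a hex bitmask
    /// such as "0x5", for a system with `nr_cpus` CPUs.
    fn from_str_with_cpus(s: &str, nr_cpus: usize) -> Result<Self, String> {
        match s {
            "all" => Ok(Self { bits: vec![true; nr_cpus] }),
            "none" => Ok(Self { bits: vec![false; nr_cpus] }),
            _ => {
                let hex = s.trim_start_matches("0x");
                let value = u64::from_str_radix(hex, 16)
                    .map_err(|e| format!("invalid cpumask {:?}: {}", s, e))?;
                // This sketch only covers the first 64 CPUs.
                let bits = (0..nr_cpus)
                    .map(|cpu| cpu < 64 && (value >> cpu) & 1 == 1)
                    .collect();
                Ok(Self { bits })
            }
        }
    }

    /// Number of CPUs enabled in the mask.
    fn weight(&self) -> usize {
        self.bits.iter().filter(|b| **b).count()
    }
}
```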
Andrea Righi
872e653cd2 scx_utils: introduce Turbo core type to Topology
Integrate the logic used by scx_bpfland to detect turbo-boosted cores in
Topology.

Also change the logic to detect Big/Little cores so that it is based on
base_frequency instead of scaling_max_freq; otherwise turbo-boosted
cores in homogeneous systems may be incorrectly classified as Big.

Moreover, introduce the following new methods to Cpu to check for the
core type:
 - is_turbo(): return true if the CPU is Turbo, false otherwise
 - is_big(): return true if the CPU is either Turbo or Big
 - is_little(): return true if the CPU is Little
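
The semantics of the three methods listed above can be sketched as follows.
The CoreType enum and the Cpu struct here are simplified stand-ins defined
only for this example; the actual layout in scx_utils may differ.

```rust
/// Hypothetical core classification used only in this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CoreType {
    Turbo,
    Big,
    Little,
}

/// Simplified stand-in for a CPU entry in the topology.
struct Cpu {
    core_type: CoreType,
}

impl Cpu {
    /// True only if the CPU is Turbo.
    fn is_turbo(&self) -> bool {
        self.core_type == CoreType::Turbo
    }

    /// True if the CPU is either Turbo or Big.
    fn is_big(&self) -> bool {
        matches!(self.core_type, CoreType::Turbo | CoreType::Big)
    }

    /// True only if the CPU is Little.
    fn is_little(&self) -> bool {
        self.core_type == CoreType::Little
    }
}
```

Note that a Turbo CPU also counts as Big, so is_big() and is_little()
partition all CPUs while is_turbo() selects a subset of the Big ones.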

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-28 00:09:08 +02:00
Daniel Hodges
250eacdcab
Merge pull request #570 from hodgesds/topo-updates
scx_utils: Add Big/Little core logic to Topology
2024-08-27 10:30:37 -04:00
Daniel Hodges
1e9fb9ba97 scx_utils: Add Big/Little core logic to Topology
This borrows some of the logic in scx_lavd for figuring out if a core is
a Big/Little core. If this makes sense we can add helper methods
directly on the topology to return Big/Little cores so that each
scheduler doesn't have to reinvent the same logic.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-27 07:01:18 -07:00
Daniel Hodges
41cebb807a
Merge pull request #569 from anh0516/main
scx_layered: Clean up in-code documentation; add commas for consistency
2024-08-27 09:47:29 -04:00
Andrea Righi
6768f9f88c
Merge pull request #572 from sched-ext/bpfland-fix-turbo-domain
scx_bpfland: fix turbo boost domain nullifying primary domain limits
2024-08-27 15:23:12 +02:00
Changwoo Min
e567ee8494
Merge pull request #571 from multics69/lavd-verifier
scx_lavd: make a loop easier to correctly verify
2024-08-27 21:50:09 +09:00
Andrea Righi
e0f49a338a scx_bpfland: fix turbo boost domain nullifying primary domain limits
When creating the turbo boost scheduling domain, we might use a full CPU
mask (selecting all possible CPUs) to indicate "do not prioritize turbo
boost CPUs" or when all CPUs have the same maximum frequency.

This approach works when the primary domain also contains all the CPUs,
as the complete overlap allows the CPU selection logic to ignore the
turbo boost domain and start picking CPUs directly from the primary
domain.

However, if the primary domain doesn't include all CPUs, the two domains
won't fully overlap, which can lead to the turbo boost domain
incorrectly including all CPUs, thereby negating the restrictions set by
the primary scheduling domain.

To resolve this, an empty CPU mask should be used for the turbo boost
domain when turbo boost CPUs aren't prioritized. If the turbo boost
domain is empty, it should be entirely bypassed, and the selection
should proceed directly to the primary domain.
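
The effect of the fix can be modeled in user space as follows. This is a
minimal sketch and not the scx_bpfland BPF code: pick_cpu() and the use of
BTreeSet as a CPU mask are assumptions made for illustration.

```rust
use std::collections::BTreeSet;

/// Pick the first idle CPU, preferring the turbo boost domain.
/// An empty turbo domain means "do not prioritize turbo boost CPUs":
/// it is bypassed and selection goes straight to the primary domain.
fn pick_cpu(
    idle: &[usize],
    turbo: &BTreeSet<usize>,
    primary: &BTreeSet<usize>,
) -> Option<usize> {
    if !turbo.is_empty() {
        if let Some(cpu) = idle.iter().copied().find(|c| turbo.contains(c)) {
            return Some(cpu);
        }
    }
    idle.iter().copied().find(|c| primary.contains(c))
}
```

With a primary domain of {0, 1} and idle CPUs [2, 1], an empty turbo domain
yields CPU 1, while a full turbo mask {0, 1, 2, 3} would yield CPU 2, which
is outside the primary domain. That is the behavior the fix avoids.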

Reported-by: Changwoo Min <changwoo@igalia.com>
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-27 13:36:50 +02:00
Changwoo Min
00430c3ded scx_lavd: make a loop easier to correctly verify
With an unfortunate combination of an old kernel and an old LLVM, the BPF
verifier incorrectly detects an infinite loop. After changing the loop to
use a constant end bound, the old verifier accepts the code.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-27 17:11:20 +09:00
Changwoo Min
09cff560aa
Merge pull request #566 from multics69/lavd-turbo
scx_lavd: prioritize the turbo boost-able cores
2024-08-27 08:47:25 +09:00
Daniel Hodges
83cd26eb9e
Merge pull request #564 from hodgesds/layered-help
scx_layered: Update help for tgid matching
2024-08-26 14:52:53 -04:00
Andrea Righi
35db89e90d
Merge pull request #568 from sched-ext/rustland-core-design-improv
scx_rustland_core: small core design improvements
2024-08-26 20:06:21 +02:00
Tejun Heo
ce439d9193
Merge pull request #562 from sched-ext/htejun/misc
version-tool: Avoid exception by skipping workspace cargo files
2024-08-26 07:49:30 -10:00
Avraham Hollander
7a43801d76 Add quotes for clarity 2024-08-26 13:20:01 -04:00
Avraham Hollander
0b6ebf826e scx_lavd, scx_mitosis, scx_rusty: Add comma for grammatical consistency
with the same change in the other schedulers
2024-08-26 13:06:58 -04:00
Avraham Hollander
07039f1f07 scx_layered: Documentation cleanup 2024-08-26 13:03:52 -04:00
Andrea Righi
0bdbb255bb scx_rustland_core: update README.md with a FIFO example
Include the FIFO example directly in the README.md, instead of linking
scx_rlfifo.

Including the example directly in the README can be more useful and
practical in those cases where internet access is not available or when
we need to distribute a more "standalone" documentation.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-26 17:42:51 +02:00
Andrea Righi
1427d7d347 scx_rlfifo: enhance code design
Refactor the code design to make it more suitable as a template for
implementing advanced scheduling policies.

In particular, create separate loops for task consumption and task
dispatching. This will make the scheduler easier to adapt as a
foundation for implementing more complex scheduling policies.
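
The separation into two loops can be sketched like this. The Task and
FifoScheduler types are hypothetical and exist only for this example; the
real scheduler talks to the BPF component through scx_rustland_core rather
than through plain queues.

```rust
use std::collections::VecDeque;

/// Hypothetical task descriptor used only in this sketch.
#[derive(Debug, Clone, PartialEq)]
struct Task {
    pid: i32,
}

/// Minimal FIFO scheduler with separate consume and dispatch phases.
struct FifoScheduler {
    queue: VecDeque<Task>,
}

impl FifoScheduler {
    fn new() -> Self {
        Self { queue: VecDeque::new() }
    }

    /// Consumption loop: drain all pending tasks into the local queue.
    /// A more complex policy could reorder or classify tasks here.
    fn consume_all(&mut self, incoming: &mut VecDeque<Task>) {
        while let Some(task) = incoming.pop_front() {
            self.queue.push_back(task);
        }
    }

    /// Dispatch loop: emit queued tasks in FIFO order.
    fn dispatch_all(&mut self) -> Vec<Task> {
        let mut dispatched = Vec::new();
        while let Some(task) = self.queue.pop_front() {
            dispatched.push(task);
        }
        dispatched
    }
}
```

Keeping the phases apart means a different policy only needs to change how
tasks are stored between the two loops.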

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-26 16:10:54 +02:00
Daniel Hodges
c45c2de39f scx_layered: Update help for tgid matching
Forgot to add doc for tgid matching

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-26 07:06:21 -07:00
Andrea Righi
820fc5a8b4 scx_rustland_core: allow to propagate vtime to BPF
Introduce a vtime attribute to struct DispatchedTask that can be set by
the user-space scheduler and that will be used by the BPF component to
dispatch the task via scx_bpf_dispatch_vtime().

In this way a user-space scheduler can decide to apply its own internal
task ordering or rely on the BPF vtime priority DSQs (or both).

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-26 15:56:05 +02:00
Andrea Righi
2ee07bb1fb scx_rustland_core: temporarily drop RL_PREEMPT_CPU
Temporarily drop the RL_PREEMPT_CPU flag: we need a better way to
implement preemption in scx_rustland_core and the flag is not very
effective at the moment, so simply drop it for now (it will be re-added
later in a proper way).

This change does not affect any scheduler, since RL_PREEMPT_CPU is
currently unused.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-26 15:54:32 +02:00
Changwoo Min
9807e561f0 scx_lavd: prioritize the turbo boost-able cores
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-26 17:57:33 +09:00
Changwoo Min
cd5b2bf664 scx_lavd: replace nix signal handler with ctrlc
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-26 17:57:33 +09:00
Changwoo Min
e887c56da0 scx_lavd: add "--version" option, which prints the current version
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-26 17:57:33 +09:00
Changwoo Min
1c1cc1030a
Merge pull request #558 from multics69/lavd-revise
scx_lavd: move time slice calculation to ops.enqueue() and ops.select_cpu() and more
2024-08-26 17:55:39 +09:00
Changwoo Min
0f97ca3066 scx_lavd: drop time slice calculation in ops.select_cpu()
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-26 17:55:00 +09:00
Tejun Heo
4a5de74983 version-tool: Avoid exception by skipping workspace cargo files 2024-08-25 22:11:09 -10:00
Changwoo Min
4e3c36ca3f scx_lavd: handle the missing cases in time slice calculation
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-26 11:43:29 +09:00
Changwoo Min
be7d06e280 scx_lavd: make the old BPF verifier happy :-(
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-26 11:43:29 +09:00
Changwoo Min
82f55b95b2 scx_lavd: add a fast path in pick_idle_cpu() when SMT is not activated
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-26 11:43:29 +09:00
Changwoo Min
38779dbe8b scx_lavd: improve pick_idle_cpu()
Now it checks an active cpumask within a previous core's compute domain
before checking the full active CPUs.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-26 11:43:29 +09:00
Changwoo Min
d1d9e97d08 scx_lavd: reduce LAVD_CPDOM_MAX_DIST to 4
The BPF verifier in old kernels gives up analyzing the nested loop in
consume_task(). We make the loop less complex by reducing
LAVD_CPDOM_MAX_DIST from 6 to 4 in order to make the verifier happy.
Note that the theoretical maximum distance is 6 (numa > llc > core type)
but there is no such hardware today, hence reducing it to 4 should be
okay for the next few years, when hopefully the verifier becomes smarter.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-26 11:43:29 +09:00
Changwoo Min
950710990f scx_lavd: move time slice calculation to ops.enqueue() and ops.select_cpu()
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-26 11:43:29 +09:00
Changwoo Min
954b684a70 scx_lavd: update nr_queued_task every system stat update interval
Updating nr_queued_task on every runqueue operation is expensive and
unnecessary. Instead, update it once every system stat update interval
and use a moving average, which is accurate enough.
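
A periodic moving-average update of this kind might look like the sketch
below. The commit does not say which averaging formula scx_lavd uses, so
the equal-weight average of the old value and the new sample, as well as
the QueuedTaskStat name, are assumptions.

```rust
/// Hypothetical holder for the smoothed number of queued tasks.
struct QueuedTaskStat {
    avg: u64,
}

impl QueuedTaskStat {
    /// Called once per stat update interval with the latest sample,
    /// instead of on every runqueue operation.
    /// Assumed formula: equal-weight average of old value and sample.
    fn update(&mut self, sample: u64) {
        self.avg = (self.avg + sample) / 2;
    }
}
```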

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-26 11:43:29 +09:00
Changwoo Min
4f906f1f49 scx_lavd: update README since it supports multi-CCX/NUMA
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-26 11:43:29 +09:00
Changwoo Min
9551657b42 scx_lavd: prefer big cores in the performance mode
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-26 11:43:29 +09:00
Changwoo Min
d4bb35e651 scx_lavd: use itertools::iproduct!() for a nested loop
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-26 11:43:29 +09:00
Changwoo Min
9368c6881d scx_lavd: replace get_task_cpu_id() with scx_bpf_task_cpu()
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-08-26 11:43:29 +09:00
Daniel Hodges
bf67b83561
Merge pull request #549 from hodgesds/stats-path-fix
scx_utils: Add retryable errors
2024-08-25 15:22:46 -04:00
Daniel Hodges
c31f2b63cb
scx_utils: Add retryable errors
Add a set of retryable errors for scx_utils when connecting to the stats
server.
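
The general idea of retrying only on selected errors can be sketched as
follows. Which errors scx_utils actually treats as retryable is not stated
here, so the set chosen in is_retryable() and the retry() helper itself are
assumptions for illustration.

```rust
use std::io::{Error, ErrorKind};

/// Assumed set of retryable errors: the server socket does not exist
/// yet, or nothing is listening on it yet.
fn is_retryable(err: &Error) -> bool {
    matches!(err.kind(), ErrorKind::NotFound | ErrorKind::ConnectionRefused)
}

/// Run `op` up to `max_attempts` times, retrying only on retryable
/// errors. Any other error is returned immediately.
fn retry<T>(
    max_attempts: usize,
    mut op: impl FnMut() -> Result<T, Error>,
) -> Result<T, Error> {
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(e) if is_retryable(&e) && attempt < max_attempts => attempt += 1,
            Err(e) => return Err(e),
        }
    }
}
```

A real client would normally sleep between attempts; the delay is left out
to keep the sketch short.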

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-25 14:54:36 -04:00
Andrea Righi
a469f0f1ce
Merge pull request #561 from sched-ext/bpfland-fix-energy-profile-refresh
scx_bpfland: prevent reading energy profile if not available
2024-08-25 18:31:34 +02:00
Tejun Heo
ca13e13ad6
Merge pull request #559 from sched-ext/htejun/cargo-workspace
build: Use workspace to group rust sub-projects
2024-08-25 06:26:18 -10:00
Andrea Righi
f8acd069f0 scx_bpfland: prevent reading energy profile if not available
Avoid periodically reading the current performance profile from
/sys/devices/system/cpu/cpufreq/policy0/energy_performance_preference if
it's not available (i.e., with older CPUs or kernels without cpufreq).
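
A guarded read of this kind can be sketched as follows. The
read_energy_profile() helper is hypothetical and is not the scx_bpfland
implementation; the caller is assumed to pass the sysfs path mentioned
above.

```rust
use std::fs;
use std::path::Path;

/// Return the current energy profile, or None if the file is missing
/// or unreadable (e.g. older CPUs or kernels without cpufreq).
fn read_energy_profile(path: &Path) -> Option<String> {
    if !path.exists() {
        return None;
    }
    fs::read_to_string(path)
        .ok()
        .map(|s| s.trim().to_string())
}
```

A caller can check the result once at startup and skip the periodic refresh
entirely when it is None.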

This fixes issue #560.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-25 16:53:35 +02:00
Andrea Righi
8853d9a9f2
Merge pull request #548 from sched-ext/rustland-core-refactoring
scx_rustland_core: user-space framework refactoring
2024-08-25 16:39:28 +02:00