Commit Graph

927 Commits

Author SHA1 Message Date
Tejun Heo
970c04b43a compat: Drop support for missing sched_ext_ops.exit_dump_len
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop support for missing sched_ext_ops.exit_dump_len.
The open helper macros now check the existence of the field and abort if
missing.
2024-06-16 06:37:34 -10:00
Tejun Heo
046bdfd5e0 compat: Drop support for missing sched_ext_ops.hotplug_seq
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop support for missing sched_ext_ops.hotplug_seq.
The open helper macros now check the existence of the field and abort if
missing.
2024-06-16 06:34:59 -10:00
Tejun Heo
dde2942125 compat: Drop __COMPAT_scx_bpf_cpuperf_*()
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop __COMPAT_scx_bpf_cpuperf_*(). The open helper
macros now check the existence of scx_bpf_cpuperf_cap() and abort if not.
2024-06-16 06:16:53 -10:00
Tejun Heo
13e8388e1e compat: Drop __COMPAT_HAS_CPUMASKS
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop __COMPAT_HAS_CPUMASKS(). The open helper macros
now check the existence of scx_bpf_nr_cpu_ids() and abort if not.
2024-06-16 06:12:06 -10:00
Tejun Heo
66901e2b44 compat: Drop __COMPAT_scx_bpf_dump()
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop __COMPAT_scx_bpf_dump(). The open helper macros
now check the existence of scx_bpf_dump_bstr() and abort if not.

While at it, reorder the min requirement checks so that newly added ones are
up top to make testing easier.
2024-06-16 06:02:47 -10:00
Tejun Heo
0d8adf2260 compat: Drop __COMPAT_scx_bpf_exit()
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop __COMPAT_scx_bpf_exit(). The open helper macros
now check the existence of scx_bpf_exit_bstr() and abort if not.
2024-06-15 20:36:17 -10:00
Tejun Heo
5b5e5be906 compat: Drop __COMPAT_SCX_KICK_IDLE
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop __COMPAT_SCX_KICK_IDLE. The open helper macros
now check the existence of SCX_KICK_IDLE and abort if not.
2024-06-15 20:24:15 -10:00
Tejun Heo
b730f35e68 scx/common.h: Improve SCX_BUG() macro
There's no guarantee that errno is set or contains relevant information when
SCX_BUG() is invoked. This sometimes leads to "task failed successfully"
messages:

  # ./scx_simple
  ../scheds/c/scx_simple.c:72 [scx panic]: Success
  SCX_OPS_SWITCH_PARTIAL missing, kernel too old?

While not critical, it's not great. Let's update it so that errno is printed
in parentheses when non-zero and match the tag to the macro name so that
what's printed is the following:

  # ./scx_simple
  [SCX_BUG] ../scheds/c/scx_simple.c:72
  SCX_OPS_SWITCH_PARTIAL missing, kernel too old?
2024-06-15 20:17:32 -10:00
Tejun Heo
7c9aedaefe compat: Drop __COMPAT_scx_bpf_switch_all()
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop __COMPAT_scx_bpf_switch_call(). The open helper
macros now check the existence of SCX_OPS_SWITCH_PARTIAL and abort if not.
2024-06-15 20:03:37 -10:00
Tejun Heo
fb2c70de84 scx_utils: compat.rs: Helper macros shouldn't return the calling function
scx_ops_open!() and scx_ops_attach!() could return the calling function
after an error, which can be surprising. Forutnately, as all the current
callers are either unwrapping or returning on error, the surprising behavior
is currently not very noticeable.

Fix it by breaking out of the macro block on errors.
2024-06-15 18:27:39 -10:00
Tejun Heo
dd6255a601
Merge pull request #359 from sched-ext/htejun/cosmetic
common.bpf.h: Cosmetic changes
2024-06-15 06:42:00 -10:00
Andrea Righi
3c0e05f995
Merge pull request #360 from sched-ext/scx-rlfifo-usability
scx_rlfifo: improve code usability for experiments
2024-06-15 18:14:50 +02:00
Andrea Righi
cb20a6f136 scx_rlfifo: dispatch all tasks on the first CPU available
With commit 786ec0c0 ("scx_rlfifo: schedule all tasks in user-space")
all the scheduling decisions are now happening in user-space. This also
bypasses the built-in idle selection logic, delegating the CPU selection
for each task to the user-space scheduler.

The easiest way to distribute tasks across the available CPUs is to
simply allow to dispatch them on the first CPU available.

In this way the scheduler becomes usable in practical scenarios and at
the same time it also maintains its simplicity.

This allows to spread all tasks across all the available CPUs

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-15 16:13:53 +02:00
Andrea Righi
786ec0c04a scx_rlfifo: schedule all tasks in user-space
Disable all the BPF optimization shortcuts by default and force all
tasks to be processed by the user-space scheduler.

Given that the primary goal of this scheduler is to offer a
straightforward and intuitive example for experimental purposes, this
change simplifies the process for individuals looking to experiment,
allowing them to apply changes to user-space code and quickly observe
the effects, without dealing with any in-kernel optimizations.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-15 16:07:39 +02:00
Andrea Righi
59f47d6659 scx_rlfifo: improve code readability
No functional change, just add some comments to better describe the
parameters used when initializing the main BpfScheduler object.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-15 16:05:28 +02:00
Andrea Righi
fafbc90fa5
Merge pull request #345 from sched-ext/rustland-prevent-starvation
scx_rustland: prevent starvation
2024-06-15 10:06:28 +02:00
Tejun Heo
d3b34d1df7 scx_qmap: Rename central_timer to monitor_timer
The name was copied from scx_central.bpf.c and doesn't match what the timer
is used for in scx_qmap.bpf.c.
2024-06-14 16:07:20 -10:00
Tejun Heo
13abb6fd26 scx/common.bpf.h: Reorganize
Currently, the BPF declarations and generic helpers are in the same section.
Let's move the generic helpers down to its own section.
2024-06-14 15:36:00 -10:00
Tejun Heo
d7677e3e5c scx/common.bpf.h: Rename bpf_log2[l]() to u32/64_log2()
The bpf_ prefix is used for BPF API. Rename bpf_log2() to u32_log2() and
bpf_log2l() to u64_log2(). While at it, relocate them below compiler
directive helpers.
2024-06-14 15:22:39 -10:00
Tejun Heo
5a2412c211 scx/common.bpf.h: Minor comment updates 2024-06-14 15:22:29 -10:00
Andrea Righi
88f40bda5c
Merge pull request #358 from sirlucjan/update-readme2
README: Add information about restoring default values
2024-06-14 21:28:45 +02:00
Piotr Gorski
0ace7226f0
README: Add information about restoring default values
Signed-off-by: Piotr Gorski <lucjan.lucjanov@gmail.com>
2024-06-14 21:24:56 +02:00
Andrea Righi
6cc2595922 scx_rustland_core: allow user-space scheduler to preempt other tasks
The same change was previously applied with commit 8820af8d
("scx_rustland: enable user-space scheduler to preempt other tasks").

However, it was reverted in commit 732ba490 ("scx_rustland: avoid using
SCX_ENQ_PREEMPT") with the introduction of the global dynamic time
slice (inversely proportional to the number of running tasks), because
it already provided sufficient context switch opportunities, negating
any advantage of the preemption.

With the introduction of the virtual time slice in commit 6f4cd853
("scx_rustland: introduce virtual time slice"), re-introducing the
ability for the user-space scheduler to preempt other tasks now appears
beneficial again.

As confirmed by experimental results, this change helps prevent
potential audio cracking issues and enhances overall system
responsiveness, resulting in a 4-5% increase in fps performing the
usual benchmark of gaming while recompiling the kernel.

Tested-by: SoulHarsh007 <harsh.peshwani@outlook.com>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-14 20:09:19 +02:00
Andrea Righi
2d2c6083d7 scx_rustland_core: use SCX_DSQ_LOCAL only with kthreads
Tasks dispatched directly using SCX_DSQ_LOCAL may receive excessively
high priority compared to those dispatched by the user-space scheduler.

To avoid this priority disparity, dispatch tasks to the per-CPU DSQs
using direct dispatch and reserve SCX_DSQ_LOCAL for per-CPU kthreads
only.

Tested-by: Tested-by: SoulHarsh007 <harsh.peshwani@outlook.com>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-14 20:09:19 +02:00
Andrea Righi
8c6fe540eb scx_rustland: prevent excessive starvation when system is congested
Keep track of the maximum vruntime among all tasks and flush them if the
difference between the maximum and minimum vruntime exceeds slice_ns.

This helps to prevent excessive starvation, as every task is guaranteed
to be dispatched within the slice_ns time limit.

Tested-by: Tested-by: SoulHarsh007 <harsh.peshwani@outlook.com>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-14 20:09:19 +02:00
Andrea Righi
cec297e147
Merge pull request #357 from Pprighi/scx-service-doc-override
scx.service: document scx override variables
2024-06-14 19:36:17 +02:00
Pietro Righi
8da1b126f5 scx.service: document scx override variables
Add a small section to document how to use SCX_SCHEDULER_OVERRIDE and
SCX_FLAGS_OVERRIDE with the scx systemd service.

Also fix a small typo (namspace -> namespace).

Signed-off-by: Pietro Righi <pietro.righi.email@gmail.com>
2024-06-14 19:31:37 +02:00
Tejun Heo
fdc2e8532b
Merge pull request #356 from Pprighi/scx-service-override
scx.service: allow overriding scx variables
2024-06-14 07:10:40 -10:00
Pietro Righi
66dea6262b scx.service: allow overriding scx variables
Switching the scheduler requires changing SCX_SCHEDULER (and potentially
also SCX_FLAGS) in /etc/default/scx.

This patch allows overriding these settings using systemd environment
variables SCX_SCHEDULER_OVERRIDE and SCX_FLAGS_OVERRIDE, without
changing the default configuration.

Example:

 > grep SCX_SCHEDULER /etc/default/scx
 SCX_SCHEDULER=scx_rusty

 > sudo systemctl status scx
 ...
   Main PID: 8021 (scx_rusty)
 ...

 > sudo systemctl set-environment SCX_SCHEDULER_OVERRIDE=scx_rustland
 > sudo systemctl restart scx
 > sudo systemctl status scx
...
   Main PID: 4021 (scx_rustland)
...

This feature can be useful for quickly testing different schedulers and
settings, without altering the global system configuration.

Signed-off-by: Pietro Righi <pietro.righi.email@gmail.com>
2024-06-14 18:51:11 +02:00
Changwoo Min
3a53162ce7
Merge pull request #355 from multics69/lavd-core-compaction-doc
scx_lavd: add the design of core compaction
2024-06-14 11:55:18 +09:00
Changwoo Min
94a39f419f scx_lavd: add the design of core compaction
The core compaction seems to work great in various hardware. Now it is
time to document its design.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-06-14 11:53:52 +09:00
Changwoo Min
5068d75bf3
Merge pull request #351 from multics69/lavd-power-v2
scx_lavd: improve CPU frequency scaling
2024-06-14 09:29:10 +09:00
Dan Schatzberg
6d7af64943
Merge pull request #346 from sirlucjan/config-update2
scheds: Add scx_mitosis scheduler to /etc/default/scx
2024-06-13 17:48:58 -04:00
Tejun Heo
a3342810c7
Merge pull request #352 from dschatzberg/mitosis
common: Add css iter forward declares
2024-06-13 06:50:06 -10:00
Changwoo Min
1bd2c2206f
Merge pull request #349 from multics69/lavd-suspend-resume
scx_lavd: properly calculate task's runtime after suspend/resume
2024-06-13 07:57:46 +09:00
Dan Schatzberg
114e4b644b common: Add css iter forward declares
These are used in mitosis, but they belong in common code so other
schedulers can do css iteration.

Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
2024-06-12 15:02:48 -07:00
Tejun Heo
08521d4fec
Merge pull request #350 from vimproved/llvm-version-suffix
Support LLVM_VERSION_SUFFIX in clang version parsing regex
2024-06-12 07:27:03 -10:00
Changwoo Min
747bf2a7d7 scx_lavd: add the design of CPU frequency scaling
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-06-13 01:42:19 +09:00
Violet Purcell
2341b67971
Support LLVM_VERSION_SUFFIX in clang version parsing regex
If LLVM is compiled with the LLVM_VERSION_SUFFIX cmake option, then the
version may have an additional suffix, for example "18.1.7+libcxx".
Gentoo for example uses this to fend off ABI issues between libstdc++
and libc++.

Signed-off-by: Violet Purcell <vimproved@inventati.org>
2024-06-12 11:58:27 -04:00
Changwoo Min
2e74b86b4a scx_lavd: logging cpu performance target
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-06-13 00:44:04 +09:00
Changwoo Min
e6348a11e9 scx_lavd: improve frequency scaling logic
The old logic for CPU frequency scaling is that the task's CPU
performance target (i.e., target CPU frequency) is checked every tick
interval and updated immediately. Indeed, it samples and updates a
performance target every tick interval. Ultimately, it fluctuates CPU
frequency every tick interval, resulting in less steady performance.

Now, we take a different strategy. The key idea is to increase the
frequency as soon as possible when a task starts running for quick
adoption to load spikes. However, if necessary, it decreases gradually
every tick interval to avoid frequency fluctuations.

In my testing, it shows more stable performance in many workloads
(games, compilation).

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-06-12 23:40:40 +09:00
Changwoo Min
753f333c09 scx_lavd: refactoring do_update_sys_stat()
Originally, do_update_sys_stat() simply calculated the system-wide CPU
utilization. Over time, it has evolved to collect all kinds of
system-wide, periodic statistics for decision-making, so it has become
bulky. Now, it is time to refactor it for readability. This commit does
not contain functional changes other than refactoring.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-06-12 21:15:25 +09:00
Changwoo Min
9d129f0afa scx_lavd: rename LAVD_CPU_UTIL_INTERVAL_NS to LAVD_SYS_STAT_INTERVAL_NS
The periodic CPU utilization routine does a lot of other work now. So we
rename LAVD_CPU_UTIL_INTERVAL_NS to LAVD_SYS_STAT_INTERVAL_NS.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-06-12 20:06:17 +09:00
Changwoo Min
7046b47b9c scx_lavd: properly calculate task's runtime after suspend/resume
When a device is suspended and resumed, the suspended duration is added
up to a task's runtime if the task was running on the CPU. After the
resume, the task's runtime is incorrectly long and the scheduler starts
to recognize the system is under heavy load. To avoid such problem, the
suspended duration is measured and substracted from the task's runtime.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-06-12 15:58:41 +09:00
Dan Schatzberg
34075829a4
Merge pull request #348 from dschatzberg/mitosis
mitosis: Fix build
2024-06-11 18:41:58 -04:00
Dan Schatzberg
b95cfb0772 mitosis: Fix build
The target wasn't dependent on the previous sched so building all
schedulers ended up not building scx_mitosis which broke the install
script.
2024-06-11 14:33:32 -07:00
Piotr Gorski
bbd3132b8e
scheds: Add scx_mitosis scheduler to /etc/default/scx
Signed-off-by: Piotr Gorski <lucjan.lucjanov@gmail.com>
2024-06-11 23:05:17 +02:00
Dan Schatzberg
9528d4603e
Merge pull request #339 from dschatzberg/mitosis
scheds: Add scx_mitosis scheduler
2024-06-11 16:50:25 -04:00
Dan Schatzberg
3b6e2dee20 scheds: Add scx_mitosis scheduler
scx_mitosis is a dynamic affinity scheduler which assigns cgroups to
Cells and Cells to discrete sets of CPUs. The number of cells is dynamic
as is the CPU assignment. BPF mostly just does vtime scheduling for each
cell, tracks load, and responds to reconfiguration from userspace.
Userspace makes decisions about how to assign cgroups to cells and cells
to cpus.

This is not yet a complete scheduler, much of the userspace logic is a
placeholder as I experiment with better logic. I also want to add richer
scheduling semantics to userspace, e.g. so that cells can do more
"soft-affinity" rather than the strict partitioning implemented
currently.

Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
2024-06-11 10:34:53 -07:00
David Vernet
1dbf874709
Merge pull request #341 from vax-r/rusty_data_races
scx_rusty: Elimate data races possibility for domain min_vruntime
2024-06-11 12:04:40 -05:00