Commit Graph

831 Commits

Author SHA1 Message Date
Anthony Ruhier
361bd77642
Document the needed BPF kernel config 2024-05-30 19:07:44 +02:00
Andrea Righi
18b902f7ab
Merge pull request #320 from sched-ext/rustland-core-libc-musl
scx_rustland_core: fix build error with musl
2024-05-29 03:23:29 +02:00
Tejun Heo
ebae7d5e6a
Merge pull request #312 from sched-ext/htejun/layered-updates
scx_layered: Improve affn_viol handling and implement dump method
2024-05-28 10:22:31 -10:00
Tejun Heo
d3ed4cb5c7 scx_layered: Successfully consuming from HI_FALLBACK_DSQ should terminate dispatching
layered_dispatch() was incorrectly continuing down to the lower-priority
DSQs after successfully consuming from HI_FALLBACK_DSQ, which can lead to
latency issues. Fix it.
2024-05-28 10:20:55 -10:00
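A minimal model of the fix, assuming a dispatch path that walks queues in the priority order given in the "Improve affinity violation handling" commit below (preempting layers, then HI_FALLBACK_DSQ, then the other layers, then LO_FALLBACK_DSQ). The `Dsqs` struct and `dispatch_one` are hypothetical stand-ins, not scx_layered's BPF code; the point is only that a successful consume ends the walk.

```rust
/// Hypothetical model of the dispatch queues, in priority order. Each Vec
/// holds task ids consumed front-to-back; names are illustrative only.
pub struct Dsqs {
    pub preempting: Vec<u32>,  // preempting layers
    pub hi_fallback: Vec<u32>, // stands in for HI_FALLBACK_DSQ
    pub others: Vec<u32>,      // all other layers
    pub lo_fallback: Vec<u32>, // stands in for LO_FALLBACK_DSQ
}

/// Consume from the first non-empty queue and stop there. Returning right
/// after a successful consume is what the fix restores for HI_FALLBACK_DSQ;
/// falling through would keep pulling from lower-priority queues.
pub fn dispatch_one(d: &mut Dsqs) -> Option<u32> {
    for q in [
        &mut d.preempting,
        &mut d.hi_fallback,
        &mut d.others,
        &mut d.lo_fallback,
    ] {
        if !q.is_empty() {
            return Some(q.remove(0));
        }
    }
    None
}
```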
Changwoo Min
4ac5da9717
Merge pull request #321 from sched-ext/revert-318-Memory_barrier
Revert "scx_lavd: Enforce memory barrier in flip_sys_cpu_util"
2024-05-27 12:20:18 +09:00
Changwoo Min
4c0f996ddc
Revert "scx_lavd: Enforce memory barrier in flip_sys_cpu_util" 2024-05-27 12:19:21 +09:00
Andrea Righi
4503c7080a scx_rustland_core: fix build error with musl
As reported in #319, we may get a build failure in the presence of musl,
which requires additional fields in sched_param.

Fix by adding a proper conditional to support both GNU libc and musl
libc.

This fixes #319.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-05-26 22:30:04 +02:00
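One way to express such a conditional in Rust is `#[cfg(target_env = "musl")]`, which selects code at compile time based on the C library. The sketch below uses a hypothetical local `SchedParam` instead of the real `libc::sched_param`, and models musl's extra members as a single placeholder array; it is not the actual scx_rustland_core change.

```rust
/// Hypothetical stand-in for `libc::sched_param`. Under musl the real struct
/// carries extra members; here they are modeled as one placeholder field.
#[derive(Debug)]
pub struct SchedParam {
    pub sched_priority: i32,
    #[cfg(target_env = "musl")]
    pub musl_extra: [i64; 4], // placeholder for musl-only members
}

/// glibc (and other non-musl targets): only the priority needs to be set.
#[cfg(not(target_env = "musl"))]
pub fn make_sched_param(prio: i32) -> SchedParam {
    SchedParam { sched_priority: prio }
}

/// musl: the additional members must be initialized too (zeroed here).
#[cfg(target_env = "musl")]
pub fn make_sched_param(prio: i32) -> SchedParam {
    SchedParam { sched_priority: prio, musl_extra: [0; 4] }
}
```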
Tejun Heo
66e9141e67
Merge pull request #316 from frelon/add-opensuse-install
Add openSUSE installation notes
2024-05-26 08:27:50 -10:00
Changwoo Min
0371ccae40
Merge pull request #318 from vax-r/Memory_barrier
scx_lavd: Enforce memory barrier in flip_sys_cpu_util
2024-05-26 21:00:25 +09:00
I Hsin Cheng
f839106a57 scx_lavd: Enforce memory barrier in flip_sys_cpu_util
Use the GNU built-in __sync_fetch_and_xor() to perform the XOR operation
on the global variable "__sys_cpu_util_idx" to ensure the operation's
visibility.

The built-in function "__sync_fetch_and_xor()" provides both an atomic
operation and a full memory barrier, which is needed by every operation
(especially store operations) on global variables.

Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
2024-05-26 15:27:10 +08:00
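For illustration, the index flip this commit describes (later reverted, see the Revert entries above) maps roughly onto Rust's `AtomicU32::fetch_xor` with `SeqCst` ordering; GCC documents the `__sync_*` built-ins as full barriers in most cases. `SYS_CPU_UTIL_IDX` and `flip_sys_cpu_util` are hypothetical names that mirror the commit's variable and function, and this is a user-space sketch, not the BPF code.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical double-buffer index alternating between 0 and 1, modeled
/// after the global "__sys_cpu_util_idx".
pub static SYS_CPU_UTIL_IDX: AtomicU32 = AtomicU32::new(0);

/// Atomically flip the index (0 <-> 1) and return the previous value.
/// SeqCst is the strongest ordering Rust offers for this read-modify-write.
pub fn flip_sys_cpu_util() -> u32 {
    SYS_CPU_UTIL_IDX.fetch_xor(1, Ordering::SeqCst)
}
```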
Fredrik Lönnegren
888172f432
Add openSUSE installation notes
Adds a section on how to install scx and a sched-ext-patched kernel on
openSUSE Tumbleweed.

Signed-off-by: Fredrik Lönnegren <fredrik.lonnegren@suse.com>
2024-05-24 14:50:26 +02:00
Tejun Heo
c09bc2ac69
Merge pull request #314 from vax-r/backward_compat
scx_central: Provide backward compatibility
2024-05-23 22:05:09 -10:00
I Hsin Cheng
5881c61a5e scx_central: Provide backward compatibility
Newer sched_ext kernel versions set the scheduler to schedule all tasks
within the system by default. However, some users are still running older
kernel versions.

Therefore, call "__COMPAT_scx_bpf_switch_all()" to move all tasks to the
"SCHED_EXT" class so that scx_central schedules all tasks by default on
older kernels as well.
2024-05-24 15:12:34 +08:00
Tejun Heo
99eb56b6b5 scx_layered: Implement layered_dump()
which dumps layer states.
2024-05-23 12:54:17 -10:00
Tejun Heo
a576242b69 scx_layered: Open and grouped layers can handle tasks with custom affinities
The main reason why custom affinities are tricky for scx_layered is that
if we put a task which doesn't allow all CPUs into a layer's DSQ, it may not
get consumed for an indefinite amount of time. However, this is only true
for confined layers. Both open and grouped layers are always consumed from
all CPUs and thus don't have this risk.

Let's allow tasks with custom affinities in open and grouped layers.

- In select_cpu(), if the layer is open, don't consider direct dispatching
  to a local DSQ an affinity violation, even if the target CPU is outside
  the layer's cpumask.

- In enqueue(), separate out the per-cpu kthread special case into its own
  block. Note that this is only applied if the layer is not preempting, as a
  preempting layer has a higher priority than HI_FALLBACK_DSQ anyway.

- Trigger the LO_FALLBACK_DSQ path for other threads only if the layer is
  confined.

- The preemption path now also runs for tasks with a custom affinity in open
  and grouped layers. Update it so that it only considers the CPUs in the
  preempting task's allowed cpumask.

(cherry picked from commit 82d2f887a4608de61ddf5e15643c10e504a88f7b)
2024-05-23 12:54:17 -10:00
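A small sketch of two rules from this commit: custom affinities are acceptable unless the layer is confined, and preemption candidates are limited to the preempting task's allowed CPUs. `LayerKind`, the helper functions, and the u64 bitmap representation of cpumasks (bit i = CPU i, so at most 64 CPUs) are all assumptions for illustration, not scx_layered's types.

```rust
/// Hypothetical layer kinds mirroring the three kinds named above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum LayerKind {
    Confined,
    Open,
    Grouped,
}

/// Custom affinities are only risky in confined layers, whose DSQs are
/// not consumed from every CPU.
pub fn allows_custom_affinity(kind: LayerKind) -> bool {
    kind != LayerKind::Confined
}

/// Restrict preemption candidates to the preempting task's allowed CPUs.
pub fn preempt_candidates(considered: u64, task_allowed: u64) -> u64 {
    considered & task_allowed
}

/// Lowest-numbered candidate CPU, if any.
pub fn first_candidate(mask: u64) -> Option<u32> {
    if mask == 0 {
        None
    } else {
        Some(mask.trailing_zeros())
    }
}
```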
Tejun Heo
1ce23760b5 scx_layered: Improve affinity violation handling
- AFFN_VIOL for per-cpu tasks could be double counted: once in select_cpu()
  and again in enqueue(). Count in select_cpu() only when direct
  dispatching.

- Violating tasks were prioritized over non-violating ones because they were
  queued on SCX_DSQ_GLOBAL, which has priority over all user DSQs. This
  doesn't make sense. Let's introduce two fallback DSQs: HI_FALLBACK_DSQ
  and LO_FALLBACK_DSQ. HI is used for violating kthreads and LO for
  violating user threads. HI is dispatched after preempting layers and LO
  after all other layers. This shouldn't change the behavior much for
  kthreads, while punishing, rather than rewarding, violating user threads.

(cherry picked from commit 67f69645667ba8a155cae9a9b7e90c055d39e23c)
2024-05-23 12:54:17 -10:00
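The routing rule described above can be sketched as a pure function. `Target` and `route` are hypothetical names; the real decision happens in scx_layered's BPF enqueue path and involves more state than these two flags.

```rust
/// Hypothetical enqueue targets.
#[derive(Debug, PartialEq, Eq)]
pub enum Target {
    LayerDsq,   // the task's own layer DSQ
    HiFallback, // stands in for HI_FALLBACK_DSQ
    LoFallback, // stands in for LO_FALLBACK_DSQ
}

/// Affinity-violating kthreads go to the HI fallback queue, violating user
/// threads to the LO fallback queue, everything else to its layer.
pub fn route(violates_affinity: bool, is_kthread: bool) -> Target {
    match (violates_affinity, is_kthread) {
        (false, _) => Target::LayerDsq,
        (true, true) => Target::HiFallback,
        (true, false) => Target::LoFallback,
    }
}
```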
Tejun Heo
7d4243a59d
Merge pull request #311 from jordalgo/c-scheds-fedora
Update INSTALL.md with fedora c scheds info
2024-05-23 09:06:55 -10:00
Jordan Rome
fcf872067a Update INSTALL.md with fedora c scheds info
Also add links to the Fedora RPMs.
2024-05-23 07:51:39 -07:00
Andrea Righi
1bdc8bd37d
Merge pull request #310 from RinHizakura/fix_scx
Reduce MAX_ENQUEUED_TASKS to fit percpu allocator
2024-05-23 16:46:46 +02:00
Yiwei Lin
8c2236770d Reduce MAX_ENQUEUED_TASKS to fit percpu allocator
The value of ops->dispatch_max_batch leads to a per-CPU allocation that is
too large for the percpu allocator, which fails requests larger than
PCPU_MIN_UNIT_SIZE. Reduce MAX_ENQUEUED_TASKS to fix this.
2024-05-23 22:33:02 +08:00
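The constraint is simple arithmetic: the per-CPU buffer (batch size times entry size) must not exceed PCPU_MIN_UNIT_SIZE, which Linux defines as `PFN_ALIGN(32 << 10)`, i.e. 32 KiB with 4 KiB pages. The 32-byte entry size used in the checks below is a hypothetical value, and any per-CPU header is ignored; neither the old nor the new MAX_ENQUEUED_TASKS value is implied.

```rust
/// PCPU_MIN_UNIT_SIZE assuming 4 KiB pages: 32 KiB.
pub const PCPU_MIN_UNIT_SIZE: usize = 32 << 10;

/// Does a buffer of `batch` entries of `entry_size` bytes fit in a single
/// percpu allocation? Overflow is treated as "does not fit".
pub fn fits_percpu(batch: usize, entry_size: usize) -> bool {
    batch
        .checked_mul(entry_size)
        .map_or(false, |bytes| bytes <= PCPU_MIN_UNIT_SIZE)
}

/// Largest batch that fits; `entry_size` must be non-zero.
pub fn max_batch(entry_size: usize) -> usize {
    PCPU_MIN_UNIT_SIZE / entry_size
}
```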
Andrea Righi
3e2e581094
Merge pull request #307 from sched-ext/rustland-improve-audio
scx_rustland: improve audio workload and performance predictability
2024-05-23 07:05:05 +02:00
Andrea Righi
4791d862f5 scx_rustland_core: second chance CPU migration
Implement a second-chance migration in select_cpu(): after a task has
been dispatched directly, do not try to migrate it immediately to a
different CPU; instead, force it to stay on prev_cpu for another round.

This seems quite effective on certain architectures (such as on a system
with 11th Gen Intel(R) Core(TM) i7-1195G7 @ 2.90GHz), and it can provide
noticeable benefits with gaming or WebGL applications (such as
https://webglsamples.org/aquarium/aquarium.html) under regular workload
conditions (around +5% fps).

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-05-22 21:38:34 +02:00
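A minimal sketch of the second-chance rule, assuming a per-task flag that is set when the task is dispatched directly. `TaskCtx`, `dispatched_directly`, and this `select_cpu` signature are hypothetical; `idle_pick` stands for whatever CPU the normal selection logic would return.

```rust
/// Hypothetical per-task state.
#[derive(Default)]
pub struct TaskCtx {
    /// Set when the task was last dispatched directly.
    pub dispatched_directly: bool,
}

/// After a direct dispatch, keep the task on prev_cpu for one more round
/// (consuming the second chance); otherwise use the idle-selection result.
pub fn select_cpu(ctx: &mut TaskCtx, prev_cpu: u32, idle_pick: u32) -> u32 {
    if ctx.dispatched_directly {
        ctx.dispatched_directly = false;
        return prev_cpu;
    }
    idle_pick
}
```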
Andrea Righi
23b0bb5ff5 scx_rustland: dispatch interactive tasks on any CPU
Dispatch non-interactive tasks on the CPU selected by the built-in idle
selection logic and allow interactive tasks to be dispatched on any CPU.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-05-22 12:12:55 +02:00
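The split can be expressed as a tiny decision function. `dispatch_cpu` is hypothetical: `Some(cpu)` means the task is pinned to the CPU chosen by the built-in idle selection, while `None` means it may run on any CPU (for example, via a shared queue).

```rust
/// Hypothetical dispatch decision: non-interactive tasks go to the CPU
/// picked by the built-in idle selection, interactive ones to any CPU.
pub fn dispatch_cpu(interactive: bool, selected_cpu: u32) -> Option<u32> {
    if interactive {
        None
    } else {
        Some(selected_cpu)
    }
}
```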
Andrea Righi
6c129e9b4b scx_rustland_core: consume from the shared DSQ before local DSQ
The shared DSQ is typically used to prioritize tasks and dispatch them
on the first CPU available, so consume from the shared DSQ before the
local CPU DSQ.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-05-22 12:12:55 +02:00
Andrea Righi
3be3b91c29 scx_rustland: assign effective time slice to all tasks
Instead of always assigning the maximum time slice to interactive tasks,
use the same dynamic time slice for every task.

This seems to prevent potential audio crackling when the system is
overcommitted.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-05-22 12:12:55 +02:00
Andrea Righi
1fecc8dafa
Merge pull request #306 from sched-ext/rustland-fix-fifo-mode-pr
scx_rustland: fix fifo mode support
2024-05-22 10:04:13 +02:00
Andrea Righi
cca84479f8 scx_rustland: ignore built-in selection logic with --full-user
The option --full-user is provided to delegate *all* scheduling
decisions to the user-space scheduler with no exception, including the
idle selection logic.

Therefore, make this option incompatible with --builtin-idle and
completely bypass the built-in idle selection logic when running in
full-user mode.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-05-22 09:02:02 +02:00
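A sketch of the option check, assuming the two flags have already been parsed into booleans. `Opts`, `validate`, and `skip_builtin_idle` are hypothetical names, not scx_rustland's actual argument-handling code.

```rust
/// Hypothetical subset of the command-line options discussed above.
pub struct Opts {
    pub full_user: bool,
    pub builtin_idle: bool,
}

/// Reject the incompatible combination up front.
pub fn validate(o: &Opts) -> Result<(), String> {
    if o.full_user && o.builtin_idle {
        return Err("--full-user and --builtin-idle are mutually exclusive".to_string());
    }
    Ok(())
}

/// In full-user mode, the built-in idle selection is bypassed entirely.
pub fn skip_builtin_idle(o: &Opts) -> bool {
    o.full_user
}
```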
Andrea Righi
9e4bea4a1c scx_rustland_core: switch to FIFO when system is underutilized
Provide a knob in scx_rustland_core to automatically turn the scheduler
into a simple FIFO when the system is underutilized.

This choice is based on the assumption that, when the system is
underutilized (fewer tasks running than available CPUs), the best
scheduling policy is FIFO.

With this option enabled, the scheduler starts in FIFO mode. If most of
the CPUs are busy (nr_running >= num_cpus - 1), the scheduler
immediately exits FIFO mode and starts to apply the logic
implemented by the user-space component. The scheduler can then switch
back to FIFO if there are no tasks waiting to be scheduled (evaluated
using a moving average).

This option can be enabled/disabled by the user-space scheduler using
the fifo_sched parameter in BpfScheduler: if set, the BPF component
periodically checks system utilization and switches back and forth
between FIFO mode and the regular mode based on that.

This improves the performance of workloads that use only a small
fraction of the available CPUs in the system, while still maintaining
the same level of performance for interactive tasks when the system is
overcommitted.

In certain video games, such as Baldur's Gate 3 or Counter-Strike 2,
running in "normal" system conditions, we can experience a boost in fps
of approximately 4-8% with this change applied.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-05-22 09:02:02 +02:00
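The switching rule can be modeled as a small state machine: leave FIFO as soon as `nr_running >= num_cpus - 1`, and return to FIFO once a moving average of waiting tasks drops to roughly zero. The exponential moving average, its weight `alpha`, and the "below 1.0" threshold are assumptions for illustration; `FifoSwitch` is not scx_rustland_core's code.

```rust
/// Hypothetical model of the FIFO-mode switch.
pub struct FifoSwitch {
    pub fifo: bool,
    avg_waiting: f64, // exponential moving average of waiting tasks
    alpha: f64,       // weight of the newest sample (assumed, 0..=1)
}

impl FifoSwitch {
    /// The scheduler starts in FIFO mode.
    pub fn new(alpha: f64) -> Self {
        FifoSwitch { fifo: true, avg_waiting: 0.0, alpha }
    }

    /// Called periodically with the current counters.
    pub fn update(&mut self, nr_running: u32, num_cpus: u32, nr_waiting: u32) {
        self.avg_waiting =
            self.alpha * nr_waiting as f64 + (1.0 - self.alpha) * self.avg_waiting;
        // nr_running >= num_cpus - 1, written to avoid unsigned underflow.
        if nr_running + 1 >= num_cpus {
            self.fifo = false; // most CPUs busy: leave FIFO immediately
        } else if self.avg_waiting < 1.0 {
            self.fifo = true; // (almost) nothing waiting: back to FIFO
        }
    }
}
```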
Andrea Righi
912bde6a52 scx_rustland_core: simplify CPU selection logic
Simplify the CPU idle selection logic by relying on the built-in logic.

If something can be improved in this logic, it should be done in the
backend by changing the default idle selection logic; rustland doesn't
need to do anything special here for now.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-05-22 09:02:02 +02:00
Andrea Righi
0d75c80587 Revert "Merge pull request #305 from sched-ext/rustland-fifo-mode"
This merge included additional commits that were supposed to be included
in a separate pull request and have nothing to do with the fifo-mode
changes.

Therefore, revert the whole pull request and create a separate one with
the correct list of commits required to implement this feature.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-05-22 09:00:25 +02:00
Andrea Righi
e79ab40b9c
Merge pull request #305 from sched-ext/rustland-fifo-mode
scx_rustland: introduce fifo mode
2024-05-22 00:06:31 +02:00
Andrea Righi
f27e67dfb8 scx_rustland_core: second chance CPU migration
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-05-21 18:08:58 +02:00
Andrea Righi
f38d91bf29 scx_rustland: dispatch interactive tasks on any CPU
Dispatch non-interactive tasks on the CPU selected by the built-in idle
selection logic and allow interactive tasks to be dispatched on any CPU.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-05-21 18:08:43 +02:00
Andrea Righi
778ee1406f scx_rustland_core: consume from the shared DSQ before local DSQ
The shared DSQ is typically used to prioritize tasks and dispatch them
on the first CPU available, so consume from the shared DSQ before the
local CPU DSQ.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-05-21 18:08:19 +02:00
Andrea Righi
6901ddb150 scx_rustland: assign effective time slice to all tasks
Instead of always assigning the maximum time slice to interactive tasks,
use the same dynamic time slice for every task.

This seems to prevent potential audio crackling when the system is
overcommitted.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-05-21 18:08:12 +02:00
Andrea Righi
d25675ff44 scx_rustland_core: switch to FIFO when system is underutilized
Provide a knob in scx_rustland_core to automatically turn the scheduler
into a simple FIFO when the system is underutilized.

This choice is based on the assumption that, when the system is
underutilized (fewer tasks running than available CPUs), the best
scheduling policy is FIFO.

With this option enabled, the scheduler starts in FIFO mode. If most of
the CPUs are busy (nr_running >= num_cpus - 1), the scheduler
immediately exits FIFO mode and starts to apply the logic
implemented by the user-space component. The scheduler can then switch
back to FIFO if there are no tasks waiting to be scheduled (evaluated
using a moving average).

This option can be enabled/disabled by the user-space scheduler using
the fifo_sched parameter in BpfScheduler: if set, the BPF component
periodically checks system utilization and switches back and forth
between FIFO mode and the regular mode based on that.

This improves the performance of workloads that use only a small
fraction of the available CPUs in the system, while still maintaining
the same level of performance for interactive tasks when the system is
overcommitted.

In certain video games, such as Baldur's Gate 3 or Counter-Strike 2,
running in "normal" system conditions, we can experience a boost in fps
of approximately 4-8% with this change applied.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-05-21 17:39:11 +02:00
Andrea Righi
3ad634c293 scx_rustland_core: simplify CPU selection logic
Simplify the CPU idle selection logic by relying on the built-in logic.

If something can be improved in this logic, it should be done in the
backend by changing the default idle selection logic; rustland doesn't
need to do anything special here for now.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-05-21 17:08:06 +02:00
Tejun Heo
33e7864713
Merge pull request #304 from vax-r/Comment_error
scx_flatcg: Correct content error in comment
2024-05-20 20:11:15 -10:00
I Hsin Cheng
e605b067c6 scx_flatcg: Correct content error in comment
A's share in the hierarchy should be 100/(200+100); moreover,
200/(200+100) doesn't equal 1/3. Correct the mistake by changing "200" to "100".
2024-05-21 13:27:26 +08:00
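The corrected arithmetic, as a one-level sketch: a cgroup's share among its siblings is its weight divided by the sum of all sibling weights, so A with weight 100 next to a sibling with weight 200 gets 100/300 = 1/3, while 200/300 is 2/3. `share` is a hypothetical helper, not scx_flatcg's code.

```rust
/// Share of `weights[idx]` among all sibling weights (including itself).
/// Panics if `idx` is out of range; returns NaN if all weights are zero.
pub fn share(weights: &[u32], idx: usize) -> f64 {
    let total: u32 = weights.iter().sum();
    weights[idx] as f64 / total as f64
}
```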
Andrea Righi
a835ab0402
Merge pull request #299 from sched-ext/rustland-cleanups
scx_rustland: cleanups
2024-05-20 18:50:30 +02:00
Tejun Heo
eeb2151804
Merge pull request #300 from sirlucjan/pacman-hooks-openrc
Add pacman hooks for openrc
2024-05-20 06:47:25 -10:00
Tejun Heo
c4bed3602f
Merge pull request #301 from sirlucjan/libbpf-werror
meson: fix a build error for libbpf at release build
2024-05-20 06:46:15 -10:00
Tejun Heo
0181df54b5
Merge pull request #303 from sched-ext/simple_comment
simple: Add comment explaining use of SHARED_DSQ
2024-05-20 06:45:13 -10:00
Tejun Heo
fcc7c37678
Merge pull request #298 from sched-ext/simple-switch-all
scx_simple: re-add __COMPAT_scx_bpf_switch_all()
2024-05-20 06:44:14 -10:00
David Vernet
aba128a65b
Merge pull request #302 from sched-ext/werror_nondefault
Build: Make -Werror non-default
2024-05-20 09:00:31 -05:00
David Vernet
0dda4badd5
simple: Add comment explaining use of SHARED_DSQ
scx_simple is a basic scheduler that does either basic vtime or global
FIFO scheduling. At first glance, it may be confusing why we create a
separate DSQ rather than just using SCX_DSQ_GLOBAL. Let's add a comment
explaining the reason for this, so that users reading scx_simple as an
example scheduler don't get confused.

Signed-off-by: David Vernet <void@manifault.com>
2024-05-20 08:48:31 -05:00
David Vernet
4b052c1489
Build: Make -Werror non-default
It seems that libbpf and bpftool have some places where -Werror triggers
when compiled with gcc. Let's make it non-default. Users can always
enable it locally if they want to by invoking `meson setup` with the
--werror flag.

Signed-off-by: David Vernet <void@manifault.com>
2024-05-20 08:30:30 -05:00
Piotr Gorski
2bad1dd58e
meson: fix a build error for libbpf at release build
Fix error:
INFO: autodetecting backend as ninja
INFO: calculating backend command to run: /usr/bin/ninja -C /home/lucjan/Pobrane/scx-scheds-git/src/scx/build
ninja: Entering directory `/home/lucjan/Pobrane/scx-scheds-git/src/scx/build'
[2/31] Generating bpftool_target with a custom command
FAILED: bpftool_dummy.c.__PHONY__
/bin/bash /home/lucjan/Pobrane/scx-scheds-git/src/scx/meson-scripts/build_bpftool /usr/bin/jq /usr/bin/make /home/lucjan/Pobrane/scx-scheds-git/src/scx/build/bpftool/src 4
libbpf.c: In function ‘bpf_object__gen_loader’:
libbpf.c: In function ‘bpf_object__gen_loader’:
libbpf.c:8994:28: error: ‘calloc’ sizes specified with ‘sizeof’ in the earlier argument and not in the later argument [-Werror=calloc-transposed-args]
 8994 |         gen = calloc(sizeof(*gen), 1);
      |                            ^
libbpf.c:8994:28: error: ‘calloc’ sizes specified with ‘sizeof’ in the earlier argument and not in the later argument [-Werror=calloc-transposed-args]
 8994 |         gen = calloc(sizeof(*gen), 1);
      |                            ^
libbpf.c:8994:28: note: earlier argument should specify number of elements, later size of each element
libbpf.c:8994:28: note: earlier argument should specify number of elements, later size of each element
At top level:
cc1: note: unrecognized command-line option ‘-Wno-unknown-warning-option’ may have been intended to silence earlier diagnostics
cc1: all warnings being treated as errors
make[1]: *** [Makefile:134: /home/lucjan/Pobrane/scx-scheds-git/src/scx/build/bpftool/src/bootstrap/libbpf/staticobjs/libbpf.o] Error 1
make: *** [Makefile:52: /home/lucjan/Pobrane/scx-scheds-git/src/scx/build/bpftool/src/bootstrap/libbpf/libbpf.a] Error 2
make: *** Waiting for unfinished jobs....
At top level:
cc1: note: unrecognized command-line option ‘-Wno-unknown-warning-option’ may have been intended to silence earlier diagnostics
cc1: all warnings being treated as errors
make[1]: *** [Makefile:134: /home/lucjan/Pobrane/scx-scheds-git/src/scx/build/bpftool/src/libbpf/staticobjs/libbpf.o] Error 1
make: *** [Makefile:44: /home/lucjan/Pobrane/scx-scheds-git/src/scx/build/bpftool/src/libbpf/libbpf.a] Error 2
ninja: build stopped: subcommand failed.

Signed-off-by: Piotr Gorski <lucjan.lucjanov@gmail.com>
2024-05-20 10:03:48 +02:00
Piotr Gorski
951a3d66bc
Add pacman hooks for openrc
Signed-off-by: Piotr Gorski <lucjan.lucjanov@gmail.com>
2024-05-20 08:32:05 +02:00
Andrea Righi
9a2cc6be50 scx_rustland: report nr_running metric to stdout
Report the number of running tasks to stdout. This value also represents
the number of active CPUs that are currently executing a task.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-05-20 05:20:46 +02:00