Newer sched_ext kernel versions sets the scheduler to schedule all tasks
within the system by default. However, some users are using the old
versions of kernel.
Therefore we call "__COMPAT_scx_bpf_switch_all()" to move all tasks to
"SCHED_EXT" class so scx_central would schedule all tasks by default in
older kernels.
The main reason why custom affinities are tricky for scx_layered is because
if we put a task which doesn't allow all CPUs into a layer's DSQ, it may not
get consumed for an indefinite amount of time. However, this is only true
for confined layers. Both open and grouped layers always consumed from all
CPUs and thus don't have this risk.
Let's allow tasks with custom affinities in open and grouped layers.
- In select_cpu(), don't consider direct dispatching to a local DSQ as
affinity violation even if the target CPU is outside the layer's cpumask
if the layer is open.
- In enqueue(), separate out per-cpu kthread special case into its own
block. Note that this is only applied if the layer is not preempting as a
preempting layer has a higher priority than HI_FALLBACK_DSQ anyway.
- Trigger the LO_FALLBACK_DSQ path for other threads only if the layer is
confined.
- The preemption path now also runs for tasks with a custom affinity in open
and grouped layers. Update it so that it only considers the CPUs in the
preempting task's allowed cpumask.
(cherry picked from commit 82d2f887a4608de61ddf5e15643c10e504a88f7b)
- AFFN_VIOL for per-cpu tasks could be double counted. Once in select_cpu()
and again in enqueue(). Count in select_cpu() only when direct
dispatching.
- Violating tasks were prioritized over non-violating ones because they were
queued on SCX_DSQ_GLOBAL which has priority over all user DSQs. This
doesn't make sense. Let's introduce two fallback DSQs - HI_FALLBACK_DSQ
and LO_FALLBACK_DSQ. HI is used for violating kthreads and LO for
violating user threads. HI is dispatched after preempting layers and LO
after all other layers. This shouldn't change the behavior too much for
kthreads while punshing, rather than rewarding, violating user threads.
(cherry picked from commit 67f69645667ba8a155cae9a9b7e90c055d39e23c)
The setting of ops->dispatch_max_batch leads to a too large allocated
size for percpu allocator, and it will be unhappy if we want a size
larger than PCPU_MIN_UNIT_SIZE. Reduce MAX_ENQUEUED_TASKS for fix.
Implement a second-chance migration in select_cpu(): after a task has
been dispatched directly do not try to migrate it immediately on a
different CPU, but force it to stay on prev_cpu for another round.
This seems quite effective on certain architectures (such as on a system
with 11th Gen Intel(R) Core(TM) i7-1195G7 @ 2.90GHz), and it can provide
noticeable benefits with gaming or WebGL applications (such as
https://webglsamples.org/aquarium/aquarium.html) under regular workload
conditions (around +5% fps).
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Dispatch non-interactive tasks on the CPU selected by the built-in idle
selection logic and allow interactive tasks to be dispatched on any CPU.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
The shared DSQ is typically used to prioritize tasks and dispatch them
on the first CPU available, so consume from the shared DSQ before the
local CPU DSQ.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Do not always assign the maximum time slice to interactive tasks, but
use the same value of the dynamic time slice for everyone.
This seems to prevent potential audio cracking when the system is over
commissioned.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
The option --full-user is provided to delegate *all* scheduling
decisions to the user-space scheduler with no exception, including the
idle selection logic.
Therefore, make this option incompatible with --builtin-idle and
completely bypass the built-in idle selection logic when running in
full-user mode.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Provide a knob in scx_rustland_core to automatically turn the scheduler
into a simple FIFO when the system is underutilized.
This choice is based on the assumption that, in the case of system
underutilization (less tasks running than the amount of available CPUs),
the best scheduling policy is FIFO.
With this option enabled the scheduler starts in FIFO mode. If most of
the CPUs are busy (nr_running >= num_cpus - 1), the scheduler
immediately exits from FIFO mode and starts to apply the logic
implemented by the user-space component. Then the scheduler can switch
back to FIFO if there are no tasks waiting to be scheduled (evaluated
using a moving average).
This option can be enabled/disabled by the user-space scheduler using
the fifo_sched parameter in BpfScheduler: if set, the BPF component will
periodically check for system utilization and switch back and forth to
FIFO mode based on that.
This allows to improve performance of workloads that are using a small
amount of the available CPUs in the system, while still maintaining the
same good level of performance for interactive tasks when the system is
over commissioned.
In certain video games, such as Baldur's Gate 3 or Counter-Strike 2,
running in "normal" system conditions, we can experience a boost in fps
of approximately 4-8% with this change applied.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Simplify the CPU idle selection logic relying on the built-in logic.
If something can be improved in this logic it should be done in the
backend, changing the default idle selection logic, rustland doesn't
need to do anything special here for now.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
This merge included additional commits that were supposed to be included
in a separate pull request and have nothing to do with the fifo-mode
changes.
Therefore, revert the whole pull request and create a separate one with
the correct list of commits required to implement this feature.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Dispatch non-interactive tasks on the CPU selected by the built-in idle
selection logic and allow interactive tasks to be dispatched on any CPU.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
The shared DSQ is typically used to prioritize tasks and dispatch them
on the first CPU available, so consume from the shared DSQ before the
local CPU DSQ.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Do not always assign the maximum time slice to interactive tasks, but
use the same value of the dynamic time slice for everyone.
This seems to prevent potential audio cracking when the system is over
commissioned.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Provide a knob in scx_rustland_core to automatically turn the scheduler
into a simple FIFO when the system is underutilized.
This choice is based on the assumption that, in the case of system
underutilization (less tasks running than the amount of available CPUs),
the best scheduling policy is FIFO.
With this option enabled the scheduler starts in FIFO mode. If most of
the CPUs are busy (nr_running >= num_cpus - 1), the scheduler
immediately exits from FIFO mode and starts to apply the logic
implemented by the user-space component. Then the scheduler can switch
back to FIFO if there are no tasks waiting to be scheduled (evaluated
using a moving average).
This option can be enabled/disabled by the user-space scheduler using
the fifo_sched parameter in BpfScheduler: if set, the BPF component will
periodically check for system utilization and switch back and forth to
FIFO mode based on that.
This allows to improve performance of workloads that are using a small
amount of the available CPUs in the system, while still maintaining the
same good level of performance for interactive tasks when the system is
over commissioned.
In certain video games, such as Baldur's Gate 3 or Counter-Strike 2,
running in "normal" system conditions, we can experience a boost in fps
of approximately 4-8% with this change applied.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Simplify the CPU idle selection logic relying on the built-in logic.
If something can be improved in this logic it should be done in the
backend, changing the default idle selection logic, rustland doesn't
need to do anything special here for now.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
scx_simple is a basic scheduler that does either basic vtime, or global
FIFO, scheduling. At first glance, it may be confusing why we create a
separate DSQ rather than just using SCX_DSQ_GLOBAL. Let's add a comment
explaining the reason for this, so that users that are going over
scx_simple as an example scheduler don't get confused.
Signed-off-by: David Vernet <void@manifault.com>
It seems that libbpf and bpftool have some places where -Werror triggers
when compiled with gcc. Let's make it non-default. Users can always
enable it locally if they want to by invoking `meson setup` with the
--werror flag.
Signed-off-by: David Vernet <void@manifault.com>
Fix error:
INFO: autodetecting backend as ninja
INFO: calculating backend command to run: /usr/bin/ninja -C /home/lucjan/Pobrane/scx-scheds-git/src/scx/build
ninja: Entering directory `/home/lucjan/Pobrane/scx-scheds-git/src/scx/build'
[2/31] Generating bpftool_target with a custom command
FAILED: bpftool_dummy.c.__PHONY__
/bin/bash /home/lucjan/Pobrane/scx-scheds-git/src/scx/meson-scripts/build_bpftool /usr/bin/jq /usr/bin/make /home/lucjan/Pobrane/scx-scheds-git/src/scx/build/bpftool/src 4
libbpf.c: In function ‘bpf_object__gen_loader’:
libbpf.c: In function ‘bpf_object__gen_loader’:
libbpf.c:8994:28: error: ‘calloc’ sizes specified with ‘sizeof’ in the earlier argument and not in the later argument [-Werror=calloc-transposed-args]
8994 | gen = calloc(sizeof(*gen), 1);
| ^
libbpf.c:8994:28: error: ‘calloc’ sizes specified with ‘sizeof’ in the earlier argument and not in the later argument [-Werror=calloc-transposed-args]
8994 | gen = calloc(sizeof(*gen), 1);
| ^
libbpf.c:8994:28: note: earlier argument should specify number of elements, later size of each element
libbpf.c:8994:28: note: earlier argument should specify number of elements, later size of each element
At top level:
cc1: note: unrecognized command-line option ‘-Wno-unknown-warning-option’ may have been intended to silence earlier diagnostics
cc1: all warnings being treated as errors
make[1]: *** [Makefile:134: /home/lucjan/Pobrane/scx-scheds-git/src/scx/build/bpftool/src/bootstrap/libbpf/staticobjs/libbpf.o] Error 1
make: *** [Makefile:52: /home/lucjan/Pobrane/scx-scheds-git/src/scx/build/bpftool/src/bootstrap/libbpf/libbpf.a] Error 2
make: *** Waiting for unfinished jobs....
At top level:
cc1: note: unrecognized command-line option ‘-Wno-unknown-warning-option’ may have been intended to silence earlier diagnostics
cc1: all warnings being treated as errors
make[1]: *** [Makefile:134: /home/lucjan/Pobrane/scx-scheds-git/src/scx/build/bpftool/src/libbpf/staticobjs/libbpf.o] Error 1
make: *** [Makefile:44: /home/lucjan/Pobrane/scx-scheds-git/src/scx/build/bpftool/src/libbpf/libbpf.a] Error 2
ninja: build stopped: subcommand failed.
Signed-off-by: Piotr Gorski <lucjan.lucjanov@gmail.com>
Report the amount of running tasks to stdout. This value also represents
the amount of active CPUs that are currently executing a task.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Although newer kernels default to switching-all, some users might still
be using the scheduler with older kernels.
Therefore, ensure all tasks are moved to the SCHED_EXT class by calling
__COMPAT_scx_bpf_switch_all() during init, so that scx_simple can still
operate on these older kernels as well.
Fixes: cf66e58 ("Sync from kernel (670bdab6073)")
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Add '-Wno-maybe-uninitialized' option since gcc spits a warning (as an
error) at release build. Following is the error message without the
option.
❯ meson setup build -Dbuildtype=release --prefix ~
❯ meson compile -C build --jobs=$(nproc)
INFO: autodetecting backend as ninja
INFO: calculating backend command to run: /usr/bin/ninja -C /home/changwoo/ws-multics69/dev/scx-github/build -j 16
ninja: Entering directory `/home/changwoo/ws-multics69/dev/scx-github/build'
[3/40] Generating libbpf with a custom command
FAILED: cc_cflags_probe.c.__PHONY__
/home/changwoo/ws-multics69/dev/scx-github/meson-scripts/build_libbpf /usr/bin/jq /usr/bin/make /home/changwoo/ws-multics69/dev/scx-github/build/libbpf/src 16
In function ‘elf_close’,
inlined from ‘elf_close’ at elf.c:53:6,
inlined from ‘elf_find_func_offset_from_file’ at elf.c:384:2:
elf.c:57:9: error: ‘elf_fd.elf’ may be used uninitialized [-Werror=maybe-uninitialized]
57 | elf_end(elf_fd->elf);
| ^~~~~~~~~~~~~~~~~~~~
elf.c: In function ‘elf_find_func_offset_from_file’:
elf.c:377:23: note: ‘elf_fd.elf’ was declared here
377 | struct elf_fd elf_fd;
| ^~~~~~
In function ‘elf_close’,
inlined from ‘elf_close’ at elf.c:53:6,
inlined from ‘elf_find_func_offset_from_file’ at elf.c:384:2:
elf.c:58:9: error: ‘elf_fd.fd’ may be used uninitialized [-Werror=maybe-uninitialized]
58 | close(elf_fd->fd);
| ^~~~~~~~~~~~~~~~~
elf.c: In function ‘elf_find_func_offset_from_file’:
elf.c:377:23: note: ‘elf_fd.fd’ was declared here
377 | struct elf_fd elf_fd;
| ^~~~~~
At top level:
cc1: note: unrecognized command-line option ‘-Wno-unknown-warning-option’ may have been intended to silence earlier diagnostics
cc1: all warnings being treated as errors
make: *** [Makefile:133: staticobjs/elf.o] Error 1
make: *** Waiting for unfinished jobs....
libbpf.c: In function ‘bpf_map__init_kern_struct_ops’:
libbpf.c:1107:18: error: ‘mod_btf’ may be used uninitialized [-Werror=maybe-uninitialized]
1107 | kern_btf = mod_btf ? mod_btf->btf : obj->btf_vmlinux;
| ~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
libbpf.c:1092:28: note: ‘mod_btf’ was declared here
1092 | struct module_btf *mod_btf;
| ^~~~~~~
In function ‘find_struct_ops_kern_types’,
inlined from ‘bpf_map__init_kern_struct_ops’ at libbpf.c:1100:8:
libbpf.c:980:21: error: ‘btf’ may be used uninitialized [-Werror=maybe-uninitialized]
980 | kern_type = btf__type_by_id(btf, kern_type_id);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
libbpf.c: In function ‘bpf_map__init_kern_struct_ops’:
libbpf.c:965:21: note: ‘btf’ was declared here
965 | struct btf *btf;
| ^~~
At top level:
cc1: note: unrecognized command-line option ‘-Wno-unknown-warning-option’ may have been intended to silence earlier diagnostics
cc1: all warnings being treated as errors
make: *** [Makefile:133: staticobjs/libbpf.o] Error 1
[4/40] Generating bpftool_target with a custom command
ninja: build stopped: subcommand failed.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
The dynamic slice boost is not used anymore in the code, so there is no
reason to keep evaluating it.
Moreover, using it instead of the static slice boost seems to make
things worse, so let's just get rid of it.
Fixes: 0b3c399 ("scx_rustland: introduce dynamic slice boost")
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
scx_rustland has a function called get_cpu_owner() in BPF which
currently has no callers. There's nothing wrong with the function, but
it causes a warning due to an unused function. Let's just annotate it
with __maybe_unused to tell the compiler that it's not a problem.
Signed-off-by: David Vernet <void@manifault.com>
When building with warnings enabled, a few obvious bugs are pointed out:
- We're not correctly calculating waker frequency
- We're not taking the min of avg_run_raw compared to max latency
- We're missing an element from sched_prio_to_weight
Fix these. With these changes, interactivity is seemingly improved. We
go from ~12 sec / turn -> 11 seconds / turn in the Civ 6 AI benchmark
with a 4 x nproc CPU hogging workload in the background. It's clear,
however, that we really need preemption.
Signed-off-by: David Vernet <void@manifault.com>