Commit Graph

299 Commits

Author SHA1 Message Date
Andrea Righi
48bba8e4f6 scx_userland: survive to dispatch failures
If the scheduler fails to dispatch a task we immediately give up,
exiting with an error like the following:

 Failed to dispatch task 251 in 1
 EXIT: BPF scheduler unregistered

This scenario can be simulated by dramatically decreasing the value of
MAX_ENQUEUED_TASKS.

We can make the scheduler a little more robust simply by re-adding the
task that cannot be dispatched to vruntime_head and stop dispatching
additional tasks in the same batch.

This can give the scheduler enough room, under such a "dispatch
overload" condition, to catch up and resume normal execution without
crashing.

Moreover, introduce nr_vruntime_failed to report failed dispatch events
in the scheduler's statistics.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2023-12-13 22:19:36 +01:00
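The recovery strategy described in the commit above can be sketched in plain C as follows. This is only an illustration under stated assumptions: `struct task`, `struct queue`, `try_dispatch()` and `dispatch_batch()` are hypothetical stand-ins, not the scx_userland code. Only the idea (keep the failed task queued, stop the current batch, count the event in `nr_vruntime_failed`) comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical task record; a real scheduler keeps more state. */
struct task {
	int pid;
	uint64_t vruntime;
};

/* Hypothetical queue: tasks[head..tail) are pending, ordered by vruntime. */
struct queue {
	struct task *tasks;
	size_t head;
	size_t tail;
};

/* Counter named after the statistic mentioned in the commit message. */
uint64_t nr_vruntime_failed;

/* Hypothetical dispatch hook: fails once the budget is exhausted. */
bool try_dispatch(const struct task *t, size_t *budget)
{
	(void)t;
	if (*budget == 0)
		return false;
	(*budget)--;
	return true;
}

/*
 * Dispatch up to `batch` tasks. On the first failure the task stays at the
 * head of the queue (the equivalent of re-adding it) and the batch stops.
 * Returns the number of tasks dispatched.
 */
size_t dispatch_batch(struct queue *q, size_t batch, size_t *budget)
{
	size_t done = 0;

	assert(q->head <= q->tail);
	while (done < batch && q->head < q->tail) {
		const struct task *t = &q->tasks[q->head];

		if (!try_dispatch(t, budget)) {
			/* Do not exit: record the failure and retry later. */
			nr_vruntime_failed++;
			break;
		}
		q->head++;
		done++;
	}
	return done;
}
```

A later call to `dispatch_batch()` picks up the same task again, so a transient overload costs one shortened batch instead of terminating the scheduler.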
David Vernet
a68885f92f
Merge pull request #29 from arighi/scx-userland-pid-max
scx_userland: allocate tasks array based on kernel.pid_max
2023-12-13 10:57:03 -06:00
Andrea Righi
1e9e6778bc scx_userland: allocate tasks array based on kernel.pid_max
Currently the array of enqueued tasks is statically allocated to a fixed
size of USERLAND_MAX_TASKS to avoid potential deadlocks that could be
introduced by performing dynamic allocations in the enqueue path.

However, this also adds a limit on the maximum pid that the scheduler
can handle, since the pid is used as the index to access the array.

In fact, it is quite easy to trigger the following failure on an average
desktop system (making this scheduler pretty much unusable in such a
scenario):

 $ sudo scx_userland
 ...
 Failed to enqueue task 33258: No such file or directory
 EXIT: BPF scheduler unregistered

Prevent this by using sysctl's kernel.pid_max as the size of the tasks
array (and still allocate it all at once during initialization).

The downside of this change is that scx_userland may require additional
memory to start, and on small systems it could even trigger OOMs. For
this reason, add an explicit message to the command help that suggests
reducing kernel.pid_max in case of OOM conditions.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2023-12-13 17:33:10 +01:00
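As a rough illustration of the approach in the commit above, the following C sketch reads kernel.pid_max through procfs and allocates a pid-indexed array once, up front. The names `struct enqueued_task`, `parse_pid_max()`, `read_pid_max()` and `alloc_tasks()` are assumptions made for this example and do not come from scx_userland.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-task slot, indexed directly by pid. */
struct enqueued_task {
	unsigned long long vruntime;
};

/* Parse the textual value of kernel.pid_max; returns -1 on bad input. */
long parse_pid_max(const char *text)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(text, &end, 10);
	if (errno != 0 || end == text || val <= 0)
		return -1;
	return val;
}

/* Read kernel.pid_max from procfs; returns -1 if it is unavailable. */
long read_pid_max(void)
{
	char buf[32];
	long val = -1;
	FILE *f = fopen("/proc/sys/kernel/pid_max", "r");

	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		val = parse_pid_max(buf);
	fclose(f);
	return val;
}

/*
 * Allocate the whole array during initialization, so that nothing needs
 * to be allocated later on the enqueue path. Returns NULL on failure.
 */
struct enqueued_task *alloc_tasks(long pid_max)
{
	assert(pid_max > 0);
	return calloc((size_t)pid_max, sizeof(struct enqueued_task));
}
```

The memory cost grows linearly with kernel.pid_max, which is the trade-off the commit message points out for small systems.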
Tejun Heo
ca05e28dde
Merge pull request #28 from sched-ext/htejun
Bump versions and add LICENSE symlinks for scx_layered and scx_rusty
2023-12-12 11:22:13 -10:00
Tejun Heo
8a07bcc31b Bump versions and add LICENSE symlinks for scx_layered and scx_rusty 2023-12-12 11:21:08 -10:00
Tejun Heo
35fdfdffb0
Merge pull request #27 from davide125/license
rust: clarify license and include text
2023-12-12 11:14:50 -10:00
Davide Cavalca
21e468a491 rust: clarify license and include text 2023-12-12 13:02:13 -08:00
Tejun Heo
fbb0164454
Merge pull request #26 from kkdwivedi/central-fix-nr-slots
scx_central: Break dispatch_to_cpu loop when running out of buffer slots
2023-12-11 21:57:30 -10:00
Kumar Kartikeya Dwivedi
c4c994c9ce
scx_central: Break dispatch_to_cpu loop when running out of buffer slots
For the case where many tasks being popped from the central queue cannot
be dispatched to the local DSQ of the target CPU, we will keep bouncing
them to the fallback DSQ and continue the dispatch_to_cpu loop until we
find one which can be dispatched to the local DSQ of the target CPU.

In a contrived case, it might be so that all tasks pin themselves to
CPUs != target CPU, and due to their affinity cannot be dispatched to
that CPU's local DSQ. If all of them are filling up the central queue,
then we will keep looping in the dispatch_to_cpu loop and eventually run
out of slots for the dispatch buffer. The nr_mismatched counter will
quickly rise and sched-ext will notice the error and unload the BPF
scheduler.

To remedy this, ensure that we break the dispatch_to_cpu loop when we
can no longer perform a dispatch operation. The outer loop in
central_dispatch for the central CPU should ensure the loop breaks when
we run out of these slots and schedule a self-IPI to the central core,
and allow sched-ext to consume the dispatch buffer before restarting the
dispatch loop again.

A basic way to reproduce this scenario is to do:
taskset -c 0 perf bench sched messaging

The error in the kernel will be:
sched_ext: BPF scheduler "central" errored, disabling
sched_ext: runtime error (dispatch buffer overflow)
bpf_prog_6a473147db3cec67_dispatch_to_cpu+0xc2/0x19a
bpf_prog_c9e51ba75372a829_central_dispatch+0x103/0x1a5

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
2023-12-12 07:50:46 +00:00
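The loop-breaking idea from the commit above can be modelled outside of BPF with a small C simulation. Everything here is hypothetical (the `struct task` layout, `struct dispatch_state`, and this simplified `dispatch_to_cpu()` signature); the sketch assumes that every pop consumes one dispatch-buffer slot and only shows the early exit when no slot is left.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task: only the CPU it is pinned to matters here. */
struct task {
	int allowed_cpu;
};

/* Hypothetical state for one pass through the dispatch path. */
struct dispatch_state {
	size_t slots_left;    /* free dispatch-buffer slots */
	size_t nr_mismatched; /* tasks bounced to the fallback queue */
};

/*
 * Pop tasks from queue[*pos..len) until one can run on target_cpu.
 * Every pop consumes one slot, so the loop stops as soon as no slot is
 * left instead of overflowing the buffer.
 * Returns true if a task was dispatched to target_cpu.
 */
bool dispatch_to_cpu(const struct task *queue, size_t len, size_t *pos,
		     int target_cpu, struct dispatch_state *st)
{
	assert(pos != NULL && st != NULL);
	while (*pos < len) {
		const struct task *t;

		/* The fix: bail out before the buffer overflows. */
		if (st->slots_left == 0)
			break;

		t = &queue[(*pos)++];
		st->slots_left--;
		if (t->allowed_cpu == target_cpu)
			return true;
		st->nr_mismatched++;
	}
	return false;
}
```

In this model the caller refills `slots_left` (standing in for the buffer being consumed) and calls the function again, which mirrors the restart described in the commit message.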
Tejun Heo
737aa810b1
Merge pull request #25 from jordalgo/readme-ubuntu
Update README for ubuntu instructions
2023-12-11 16:05:25 -10:00
Jordan Rome
df48390d73 Update README for ubuntu instructions 2023-12-11 17:42:08 -08:00
Tejun Heo
872f1d1f1e
Merge pull request #24 from sched-ext/htejun
README: Add ubuntu instructions
2023-12-11 11:04:59 -10:00
Tejun Heo
7ebb102d5a README: David's review 2023-12-11 11:04:40 -10:00
Tejun Heo
feb9018cbe README: Add ubuntu instructions 2023-12-11 10:24:10 -10:00
Tejun Heo
8ea5850967
Merge pull request #23 from arighi/fix-s390-build
build: add Debian arch name mapping for s390
2023-12-10 03:08:56 -10:00
Andrea Righi
1742daa4b5 build: add Debian arch name mapping for s390
Add the proper mapping from Debian architecture "s390x" to the kernel
architecture name "s390".

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2023-12-10 12:42:15 +01:00
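The mapping added by the commit above lives in the build scripts. Purely as an illustration, the same lookup could be written in C as below; `debian_to_kernel_arch()` is a hypothetical helper, and only the s390x to s390 pair is taken from the commit message.

```c
#include <assert.h>
#include <string.h>

/*
 * Map a Debian architecture name to a kernel architecture name.
 * Only "s390x" -> "s390" comes from the commit; any other name is
 * returned unchanged in this sketch.
 */
const char *debian_to_kernel_arch(const char *deb_arch)
{
	assert(deb_arch != NULL);
	if (strcmp(deb_arch, "s390x") == 0)
		return "s390";
	return deb_arch;
}
```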
Tejun Heo
c9d0cc640a
Merge pull request #22 from arighi/enable-rust-build-option
build: introduce enable_rust build option
2023-12-09 15:59:19 -10:00
Tejun Heo
abbb6a0276
Merge pull request #20 from arighi/scx-rusty-fix
scx_rusty: fix "subtract with overflow" error
2023-12-09 15:58:11 -10:00
David Vernet
ab1d894fd2
Merge pull request #21 from arighi/misc-fixes 2023-12-09 09:33:09 -06:00
Andrea Righi
6343bcf360 build: introduce enable_rust build option
Introduce an option to enable/disable the build of all the Rust
sub-projects.

This can be useful to build scx on those systems where Rust is not
fully supported (e.g., armhf).

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2023-12-09 15:05:23 +01:00
Andrea Righi
0637b6a0b5 scx_nest: use proper format string for u64 types
This prevents some warnings when building scx_nest on 32-bit
architectures.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2023-12-09 14:49:50 +01:00
Andrea Righi
adc01140aa scx_qmap: use proper format string for u64 types
This prevents some warnings when building scx_qmap on 32-bit
architectures.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2023-12-09 14:49:44 +01:00
Andrea Righi
4df979ccb7 scx_pair: use proper format string for u64 types
This prevents some warnings when building scx_pair on 32-bit
architectures.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2023-12-09 14:49:38 +01:00
Andrea Righi
14e70fd134 scx_flatcg: use proper data size for hweight_gen
We should explicitly use u64 for hweight_gen to prevent the following
build failures on 32-bit architectures:

scheds/kernel-examples/scx_flatcg.p/scx_flatcg.bpf.skel.h: In function ‘scx_flatcg__assert’:
scheds/kernel-examples/scx_flatcg.p/scx_flatcg.bpf.skel.h:3523:9: error: static assertion failed: "unexpected size of \'hweight_gen\'"
 3523 |         _Static_assert(sizeof(s->data->hweight_gen) == 8, "unexpected size of 'hweight_gen'");

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2023-12-09 14:49:30 +01:00
Andrea Righi
00c5d2dfb7 scx_qmap: use proper data size for scheduler stats
We should explicitly use u64 for scheduler statistics to prevent the
following build failures on 32-bit architectures:

scheds/kernel-examples/scx_qmap.p/scx_qmap.bpf.skel.h: In function ‘scx_qmap__assert’:
scheds/kernel-examples/scx_qmap.p/scx_qmap.bpf.skel.h:2560:9: error: static assertion failed: "unexpected size of \'nr_enqueued\'"
 2560 |         _Static_assert(sizeof(s->bss->nr_enqueued) == 8, "unexpected size of 'nr_enqueued'");
      |         ^~~~~~~~~~~~~~
scheds/kernel-examples/scx_qmap.p/scx_qmap.bpf.skel.h:2561:9: error: static assertion failed: "unexpected size of \'nr_dispatched\'"
 2561 |         _Static_assert(sizeof(s->bss->nr_dispatched) == 8, "unexpected size of 'nr_dispatched'");
      |         ^~~~~~~~~~~~~~
scheds/kernel-examples/scx_qmap.p/scx_qmap.bpf.skel.h:2562:9: error: static assertion failed: "unexpected size of \'nr_reenqueued\'"
 2562 |         _Static_assert(sizeof(s->bss->nr_reenqueued) == 8, "unexpected size of 'nr_reenqueued'");
      |         ^~~~~~~~~~~~~~
scheds/kernel-examples/scx_qmap.p/scx_qmap.bpf.skel.h:2563:9: error: static assertion failed: "unexpected size of \'nr_dequeued\'"
 2563 |         _Static_assert(sizeof(s->bss->nr_dequeued) == 8, "unexpected size of 'nr_dequeued'");
      |         ^~~~~~~~~~~~~~
scheds/kernel-examples/scx_qmap.p/scx_qmap.bpf.skel.h:2564:9: error: static assertion failed: "unexpected size of \'nr_core_sched_execed\'"
 2564 |         _Static_assert(sizeof(s->bss->nr_core_sched_execed) == 8, "unexpected size of 'nr_core_sched_execed'");
      |         ^~~~~~~~~~~~~~

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2023-12-09 14:49:25 +01:00
Andrea Righi
4c65e71c48 scx_central: use proper format string for u64
When printing scheduler statistics we use %lu to print u64 values. That
works well on 64-bit architectures, but on 32-bit architectures we get
errors like the following:

  106 |                 printf("total   :%10lu    local:%10lu   queued:%10lu  lost:%10lu\n",
      |                                  ~~~~^
      |                                      |
      |                                      long unsigned int
      |                                  %10llu
  107 |                        skel->bss->nr_total,
      |                        ~~~~~~~~~~~~~~~~~~~
      |                                 |
      |                                 u64 {aka long long unsigned int}

Fix this by using the proper format %llu.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2023-12-09 14:49:20 +01:00
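Two portable ways to print a 64-bit counter in standard C are sketched below. The helper names `format_total()` and `format_total_pri()` are made up for this example, and the code uses `uint64_t` from `<stdint.h>` rather than the `u64` typedef seen in the compiler output, which is why the first variant adds an explicit cast.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Variant 1: cast to unsigned long long and use %llu. The cast makes the
 * argument match the format on every architecture.
 */
int format_total(char *buf, size_t len, uint64_t nr_total)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "total   :%10llu",
			(unsigned long long)nr_total);
}

/*
 * Variant 2: use the PRIu64 macro from <inttypes.h>, which expands to
 * the correct conversion specifier for uint64_t on the target.
 */
int format_total_pri(char *buf, size_t len, uint64_t nr_total)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "total   :%10" PRIu64, nr_total);
}
```

Both variants produce the same text; the first one matches the `%llu` fix chosen in the commit.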
Andrea Righi
e396f1e467 scx_userland: get rid of strings.h include
Use compiler's built-in stack initialization instead of memset().

In this way we can get rid of the string.h include and make
cross-compilation easier in certain small environments (e.g., arm).

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2023-12-09 14:49:14 +01:00
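The technique named in the commit above, replacing `memset()` with an initializer, looks roughly like this in C. `struct task_msg` and `make_msg()` are hypothetical names used only for this sketch.

```c
#include <assert.h>

/* Hypothetical message structure. */
struct task_msg {
	int pid;
	unsigned long long vruntime;
	char comm[16];
};

/*
 * Build a message without memset(): members that are not named in the
 * initializer are set to zero by the compiler.
 */
struct task_msg make_msg(int pid)
{
	struct task_msg msg = { .pid = pid };

	assert(msg.vruntime == 0 && msg.comm[0] == '\0');
	return msg;
}
```

One caveat of this style is that padding bytes between members may not be cleared the way `memset()` would clear them, which can matter when a structure is copied out as raw bytes.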
Andrea Righi
c5d1bc3577 scx_rusty: fix "subtract with overflow" error
It seems that under certain conditions, the difference between the
current and the previous procfs::CpuStat values may become negative,
triggering the following crash/trace:

thread 'main' panicked at /build/rustc-VvCkKl/rustc-1.73.0+dfsg0ubuntu1/library/core/src/ops/arith.rs:217:1:
attempt to subtract with overflow
stack backtrace:
...
  19:     0x590d8481909e - scx_rusty::calc_util::h46f2af9c512c2ecd
                               at /home/arighi/src/scx/scheds/rust-user/scx_rusty/src/main.rs:217:31
  20:     0x590d8481c794 - scx_rusty::Tuner::step::h2e51076f043a8593
                               at /home/arighi/src/scx/scheds/rust-user/scx_rusty/src/main.rs:444:38
  21:     0x590d84828270 - scx_rusty::Scheduler::run::hb5483f1e585f52fe
                               at /home/arighi/src/scx/scheds/rust-user/scx_rusty/src/main.rs:1198:17
  22:     0x590d848289e9 - scx_rusty::main::h9ba8c62ad33aeee1
...

Prevent this by introducing a sub_or_zero() helper function that returns
zero if the difference is negative.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2023-12-09 14:47:35 +01:00
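The helper introduced by the commit above is written in Rust, where the standard `u64::saturating_sub` method provides the same behaviour. As a language-neutral sketch, a C analogue of `sub_or_zero()` could look like the following; note that in C an unsigned subtraction would wrap around silently instead of panicking, so the guard is just as important there.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return curr - prev, or 0 if the counter appears to have gone
 * backwards (curr < prev).
 */
uint64_t sub_or_zero(uint64_t curr, uint64_t prev)
{
	uint64_t diff = curr > prev ? curr - prev : 0;

	/* The result can never exceed the current value. */
	assert(diff <= curr);
	return diff;
}
```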
Tejun Heo
330e104eae
Merge pull request #18 from sched-ext/num_schedulings
scx_nest: Reset schedulings when a task is dispatched
2023-12-08 09:18:59 -10:00
David Vernet
c953ee47a6
scx_nest: Reset schedulings when a task is dispatched
In scx_nest, we currently count the number of times that a core is
scheduled for compaction before we eventually just eagerly compact the
core. The idea is that the core could thrash between being scheduled and
then "de-scheduled" for compaction if there are a couple of tasks that
are bouncing between cores in the primary nest often enough to kick them
out of being compacted.

We're currently resetting schedulings when a core is eagerly compacted,
but to be precise we should probably also reset the count when a core
consumes a task from the fallback DSQ, as this indicates that the system
is overcommitted and that we likely won't benefit from compacting the
primary nest.

Signed-off-by: David Vernet <void@manifault.com>
2023-12-08 13:16:40 -06:00
Tejun Heo
0b3ce4dfef
Merge pull request #17 from sched-ext/htejun
Support offline compilation
2023-12-08 08:46:04 -10:00
Tejun Heo
9e12238d64 Support offline compilation 2023-12-08 08:45:44 -10:00
Tejun Heo
82c5c09514
Merge pull request #16 from jordalgo/readme-meson-min
Update readme with alt meson install instructions
2023-12-08 06:40:16 -10:00
Jordan Rome
cad301238a Update readme with alt meson install instructions 2023-12-08 04:50:42 -08:00
Tejun Heo
d131502582
Merge pull request #15 from sched-ext/htejun
Bump overall version to 0.1.1
2023-12-07 13:12:35 -10:00
Tejun Heo
d8c2321831 Bump overall version to 0.1.1
- scx_nest included.
- scx_rusty bug fix.
- Ubuntu build fixes.
2023-12-07 13:11:28 -10:00
Tejun Heo
bdf722893b
Merge pull request #14 from sched-ext/htejun
scx_utils: Bump version to 0.3.2
2023-12-07 13:10:56 -10:00
Tejun Heo
3acdef8f2b scx_utils: Bump version to 0.3.2
- Build fix for ubuntu clang.
2023-12-07 13:08:21 -10:00
Tejun Heo
b0bf1639fa
Merge pull request #13 from arighi/fix-clang-version
scx_utils::BpfBuilder: properly detect clang version in Ubuntu
2023-12-07 12:28:46 -10:00
Andrea Righi
7ceccf6516 scx_utils::BpfBuilder: properly detect clang version in Ubuntu
Apply the same logic as commit 00cd15a ("build: properly detect clang
version in Ubuntu") to scx_utils as well.

This allows scx_utils to build properly on Ubuntu.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2023-12-07 23:26:21 +01:00
Tejun Heo
11c6a809b2
Merge pull request #12 from sched-ext/scx_nest
scx_nest: Add scx_nest scheduler
2023-12-07 12:19:40 -10:00
Tejun Heo
cd6d45d0e1
Merge pull request #9 from sched-ext/rusty
rusty: Fix calc_util() in rusty
2023-12-07 12:17:27 -10:00
David Vernet
ca21842908
scx_nest: Add scx_nest scheduler
The scx_nest scheduler seems to be behaving well. Let's merge it to the
scx repo so that CachyOS can package and use it more easily.

Signed-off-by: David Vernet <void@manifault.com>
2023-12-07 13:28:09 -06:00
David Vernet
5751f1c2a3
Merge pull request #11 from arighi/fix-clang-version-regexp
build: properly detect clang version in Ubuntu
2023-12-07 12:32:38 -06:00
Andrea Righi
00cd15a3ae build: properly detect clang version in Ubuntu
Some distros may add their own prefix to the version string of clang, for
example in Ubuntu:

 $ clang --version
 Ubuntu clang version 17.0.5 (1ubuntu1)
 ...

That triggers the following meson error during the setup phase:

 meson.build:25:44: ERROR: String '' cannot be converted to int

Change the regexp used to evaluate the clang version to avoid this
build failure.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2023-12-07 19:24:12 +01:00
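The actual fix in the commit above changes a regular expression in the build scripts. The C sketch below illustrates the same idea with a plain substring search instead: look for the marker "clang version " anywhere in the line, so that a distro prefix such as "Ubuntu " does not break parsing. `clang_major_version()` is a hypothetical helper written for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Extract the major version from the first line of `clang --version`.
 * Returns -1 if the line does not contain a recognizable version.
 */
long clang_major_version(const char *line)
{
	static const char marker[] = "clang version ";
	const char *p;
	char *end;
	long major;

	assert(line != NULL);
	/* Search anywhere in the line, not only at its start. */
	p = strstr(line, marker);
	if (!p)
		return -1;
	p += sizeof(marker) - 1;
	major = strtol(p, &end, 10);
	if (end == p || major < 0)
		return -1;
	return major;
}
```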
David Vernet
b53e8251a1
rusty: Fix calc_util() in rusty
We were assigning curr to prev stats, and vice versa, in calc_util().
This was causing the following crash on debug builds:

[void@maniforge scheds]$ sudo RUST_BACKTRACE=1 scx_rusty
00:00:56 [INFO] CPUs: online/possible = 32/32
00:00:56 [INFO] DOM[00] cpumask 0000000000FF00FF (16 cpus)
00:00:56 [INFO] DOM[01] cpumask 00000000FF00FF00 (16 cpus)
00:00:56 [INFO] Rusty Scheduler Attached
thread 'main' panicked at /rustc/475c71da0710fd1d40c046f9cee04b733b5b2b51/library/core/src/ops/arith.rs:217:1:
attempt to subtract with overflow
stack backtrace:
   0: rust_begin_unwind
             at /rustc/475c71da0710fd1d40c046f9cee04b733b5b2b51/library/std/src/panicking.rs:597:5
   1: core::panicking::panic_fmt
             at /rustc/475c71da0710fd1d40c046f9cee04b733b5b2b51/library/core/src/panicking.rs:72:14
   2: core::panicking::panic
             at /rustc/475c71da0710fd1d40c046f9cee04b733b5b2b51/library/core/src/panicking.rs:127:5
   3: <u64 as core::ops::arith::Sub>::sub
             at /rustc/475c71da0710fd1d40c046f9cee04b733b5b2b51/library/core/src/ops/arith.rs:217:1
   4: <&u64 as core::ops::arith::Sub<&u64>>::sub
             at /rustc/475c71da0710fd1d40c046f9cee04b733b5b2b51/library/core/src/internal_macros.rs:55:17
   5: scx_rusty::calc_util
             at ./rust-user/scx_rusty/src/main.rs:216:29
   6: scx_rusty::Tuner::step
             at ./rust-user/scx_rusty/src/main.rs:444:38
   7: scx_rusty::Scheduler::run
             at ./rust-user/scx_rusty/src/main.rs:1198:17
   8: scx_rusty::main
             at ./rust-user/scx_rusty/src/main.rs:1261:5
   9: core::ops::function::FnOnce::call_once
             at /rustc/475c71da0710fd1d40c046f9cee04b733b5b2b51/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Flip them to avoid the crash. Rusty now runs fine.

Signed-off-by: David Vernet <void@manifault.com>
2023-12-06 18:25:27 -06:00
David Vernet
e38937b501
Merge pull request #8 from sched-ext/readme
Readme
2023-12-06 17:21:20 -06:00
David Vernet
d9ece9fe87
README: Add link to join scx slack channel
Add a link to join the scx slack channel.

Signed-off-by: David Vernet <void@manifault.com>
2023-12-06 16:55:04 -06:00
David Vernet
eba9155a7f
README: Add scheds/ README's
There's a fairly comprehensive README in the kernel's tools/sched_ext
directory which describes each of the example schedulers. Let's pull it
into this repository, and split it into the various subdirectories
containing the kernel-examples/ schedulers, and the rust-user/
schedulers.

Signed-off-by: David Vernet <void@manifault.com>
2023-12-06 16:55:02 -06:00
David Vernet
086c05eaf5
README: Update a bunch of formatting and grammar
The root README files have some grammatical mistakes, and/or need to be
updated to not be in the context of being sent as a patch set. Update
them.

Signed-off-by: David Vernet <void@manifault.com>
2023-12-06 16:53:21 -06:00