Commit Graph

49 Commits

Author SHA1 Message Date
Andrea Righi
addae505df ci: enable lockdep in the virtme-ng kernel config
Set CONFIG_PROVE_LOCKING=y in the default virtme-ng config, to enable
lockdep and additional lock debugging features, which can help catch
lock-related kernel issues in advance.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-10-26 13:57:37 +02:00
Pat Somaru
302b3ac6c0
add retries to kernel clone step

Odds are that when we most care about cloning a fresh upstream
(i.e. some fix was just pushed), things are going to be flakiest (i.e.
no caches anywhere), so retry up to 10 times, with ~3 minutes total
between tries, when cloning the kernel source.
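
A retry loop of this kind could be sketched in shell roughly as follows.
This is a minimal illustration, not the workflow's actual step: the
`retry` helper, its argument order, and the 20-second delay used in the
usage note are assumptions.

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between failed attempts. Returns 0 on the first success, 1 if every
# attempt fails.
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Do not sleep after the final failed attempt.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}
```

For example, `retry 10 20 git clone --depth 1 "$KERNEL_REPO" linux` would
make up to 10 attempts separated by 9 pauses of 20 seconds (about 3
minutes in total); `KERNEL_REPO` is a placeholder.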
2024-10-23 19:58:21 -04:00
Andrea Righi
d33e1b02b4 ci: enable SCHED_MC in the virtme-ng kernel config
Enabling SCHED_MC in the kernel used for testing allows us to
potentially run more complex tests, simulating different CPU topologies
and have access to such topology data through the in-kernel scheduler's
information.

This can be useful as we add more topology awareness logic to the
sched_ext core (e.g., in the built-in idle CPU selection policy).

Therefore add this option to the default .config (and also fix a missing
newline at the end of the file).

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-10-23 11:18:02 +02:00
Pat Somaru
d944a39a7f
remove apt fast from ci setup
remove apt fast from ci setup to reduce non-core dependencies
2024-10-17 13:08:03 -04:00
Daniel Hodges
71d63010af scx_layered: Refactor layer iteration
Remove DSQ iter algos.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-10-14 13:13:53 -07:00
Jake Hillion
52c279a469 layered: make default value for disable_topology dynamic
Disable topology currently defaults to `false` (topology enabled...). Change
this so that topology is enabled by default on hardware that may benefit from
it (multiple NUMA nodes or LLCs) and disabled on hardware that does not benefit
from it.

This is a slightly noisy change as we have to move ownership of the newly
mutable layer specs into the `Scheduler` object (previously they were a
borrow). We don't have a `Topology` object to make the default decision from
until `Scheduler::init`, and I think this is because of the possibility of hot
plugs. We therefore have to clone the `Vec<LayerSpec>` each time as it is
potentially mutable.
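
The defaulting rule described above boils down to a three-way decision.
The sketch below expresses it in shell purely for illustration (the
scheduler itself is written in Rust); the function name and its
arguments are hypothetical, and the NUMA node and LLC counts are passed
in by the caller rather than detected.

```shell
# Hypothetical sketch of the defaulting rule; not scx_layered's code.
# $1: number of NUMA nodes
# $2: number of LLCs
# $3: explicit --disable-topology value ("true"/"false"), empty if unset
# Prints the effective disable_topology value.
resolve_disable_topology() {
    nr_nodes=$1
    nr_llcs=$2
    explicit=${3:-}
    if [ -n "$explicit" ]; then
        # An explicit setting always wins over the hardware-based default.
        echo "$explicit"
    elif [ "$nr_nodes" -gt 1 ] || [ "$nr_llcs" -gt 1 ]; then
        # Multiple NUMA nodes or LLCs: topology awareness may help.
        echo false
    else
        # Single node and single LLC: default to disabled.
        echo true
    fi
}
```

With these assumptions, `resolve_disable_topology 1 4` corresponds to
the single NUMA multi-LLC case and `resolve_disable_topology 1 1` to the
single NUMA single LLC case.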

Test plan:
- CI. Updated to be explicit about topology in both cases.

Single NUMA multi-LLC machine:
```
$ scx_layered --run-example
...
13:34:01 [INFO] Topology awareness not specified, selecting enabled based on
hardware
...
$ scx_layered --run-example --disable-topology=true
...
13:33:41 [INFO] Disabling topology awareness
...
$ scx_layered --run-example -t
...
13:33:15 [INFO] Disabling topology awareness
...
$ scx_layered --run-example --disable-topology=false
# none of the above messages present
```

Single NUMA single LLC machine:
```
$ scx_layered --run-example
15:33:10 [INFO] Topology awareness not specified, selecting disabled based on
hardware
```
2024-10-11 17:09:07 +01:00
Pat Somaru
5e4a7ac655
setup matrix job to run key paths of layered through verifier/stress test 2024-10-09 21:09:41 -04:00
Pat Somaru
9d463a85e3
have bpf builds use separate cache
have bpf kernel builds use separate cache
(to avoid cache mixups when cherry pick)
2024-10-03 08:20:53 -04:00
Pat Somaru
a715d20d92
update urls to point to bpf-next 2024-10-02 22:20:56 -04:00
Pat Somaru
5182682baf
remove lint step from bpf workflow 2024-10-02 21:27:45 -04:00
Pat Somaru
7e95e58e20
enable periodic build/test against bpf-next
add a ci job to build and test against bpf-next
every 6 hours.
2024-10-02 21:20:14 -04:00
Pat Somaru
49a1e8fb2a
fix artifact names to work with mergequeue naming 2024-09-26 14:33:38 -04:00
Jake Hillion
b89f3fe70b
ci: enable on merge_group action (#698)
Enabling the `caching-build` workflow on `merge_group` actions. This is required to enable merge queues which should make merging lots of PRs while getting full CI testing possible. Currently `lint` is the only required check but we will add more in future.
2024-09-26 17:40:47 +01:00
Pat Somaru
e6e91f8755
do not cache fast jobs dependencies 2024-09-26 11:27:08 -04:00
likewhatevs
bf68679d35
Setup "debugging" and misc cleanup (#695)
* Fix a couple of misc errors in build scripts.
* Tweak scripts/kconfigs to make bpftrace work.
* Update how CI caching works to make builds faster (6 minute turnaround
  time)
* Update CI config to generate per-scheduler debug archives w/ guest
  dmesg/scheduler stdout, guest stdout, bpftrace script output,
  veristat output.

* Update build scripts to accept the following:
** VNG RW -- write to host filesystem (better caching, logging).
* For stress tests in particular (via ini config):
** QEMU Opts -- to facilitate reproducing bugs (i.e. high core count).
** bpftrace scripts -- specify bpftrace scripts to run during stress
tests.
2024-09-26 11:11:10 -04:00
likewhatevs
d8246fdcd6
clean up ci/make ci nicer (#682)
* make ci nicer

Replace build scheds and merged with caching build, and rename
caching build to build-and-test.

This should make the CI reports on PRs be nice and specific
(i.e. at a glance, know what passes and what fails).

It also keeps PR CI jobs up to date (as folks edit things) and
has them all use one config/24.04 etc.

* prevent untar permission errors from causing cache misses
2024-09-24 16:36:56 -04:00
Pat Somaru
878eaf042e
add 'continue on error' to ci jobs
I think I see PRs being harder to write because all parts of a CI job
are cancelled when one fails.

I think I am also starting to see that we have enough largely disjoint
moving pieces that there will often be one that is failing stress
tests at any time.

Make CI run all stress tests always to address this.
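
The intended effect (every test runs, and failures are reported at the
end) can be illustrated with a small shell sketch. This is only an
analogy for the CI setting named in the commit title, not the workflow
change itself; `run_all` and the commands passed to it are hypothetical.

```shell
# Hypothetical helper: run every command given as an argument, even after
# a failure, and return non-zero at the end if any of them failed.
run_all() {
    failed=0
    for cmd in "$@"; do
        if sh -c "$cmd"; then
            echo "PASS: $cmd"
        else
            echo "FAIL: $cmd"
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}
```

Calling `run_all "true" "false" "true"` prints a PASS or FAIL line for
each of the three commands and then returns a failure status, instead of
stopping at the first failing command.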
2024-09-24 09:17:55 -04:00
Jake Hillion
8ca45cfa37
lint: enable cargo fmt (#643)
Use `cargo fmt` with a specific nightly branch in the CI to enforce formatting. Globally format these files while the diff is still small so we can stay on top of it.

Test plan:
- CI lint check passes.
2024-09-11 10:03:20 +01:00
patso
28319f3205
migrate ci vm to ubuntu 24.04
2024-09-10 09:53:40 -04:00
patso
2e46f71780
generate docs for scx and kernel
generate docs for scx and kernel and push to gh page

this "adds" kernel scheduler and bpf docs to
the generated scx rust docs.
2024-09-09 15:37:59 -04:00
patso
120211d731
split build and test jobs
split build and test jobs to reduce ci turnaround time
and make it clear what is failing when something fails.

also add virtiofsd to deps to make test compilation faster
(most test time is compilation) and remove all force 9ps.
2024-09-08 02:54:24 -04:00
Andrea Righi
8ff57ac915 ci: fix vng command to set the right amount of CPUs
The right option to set the number of CPUs in vng is `-p`, not `-c`.
Change the command to use the long format `--cpu` and `--memory` for
better clarity.

This fixes issue #625.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-09-07 09:47:13 +02:00
patso
eaa636dfcc
set vng cpus to 8 for rust tests
set vng cpus to 8 for rust tests (for stability in testing),
update relevant doc test.
2024-09-06 10:07:37 -04:00
patso
082bccb557
fix/enable rust tests, make build faster
This commit fixes rust tests and configures ci to
run them on commit. It also sets up CI to run those
in a timely manner by caching dependencies and splitting jobs.
2024-09-06 06:18:11 -04:00
Andrea Righi
b9d4ddebc9 ci: allow parallel builds with meson
Now that we have a sane build environment with meson we don't need to
force sequential builds anymore (--jobs=1) and we can just run regular
builds without having to worry about excessive parallelization.

See commit 43950c6 ("build: Use workspace to group rust sub-projects").

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-09-04 08:57:24 +02:00
Daniel Hodges
a3ae127194 ci: Remove veristat diff workflow
The veristat diff action isn't working properly so remove it for now.
Instead, run veristat (non diff mode) as part of the build-scheds
workflow. This should still provide some useful data for BPF program
related stats.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-29 05:50:06 -07:00
Daniel Hodges
e121dd3dd5 ci: Fix cache directory
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-19 20:07:50 -07:00
Daniel Hodges
40bb003555 ci: fix merge veristat cache generation
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-19 18:30:32 -07:00
Daniel Hodges
1ff5e4fbed ci: fix veristat for PRs
Make sure veristat is available for CI for PRs.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-19 16:42:31 -07:00
Daniel Hodges
7c27f8067d ci: Fix veristat pull request workflow
See the [failure](https://github.com/sched-ext/scx/actions/runs/10389671253), which needs to have an action defined.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-19 12:08:59 -07:00
Daniel Hodges
f68bc82582 meson: Add github action to run veristat
Add a github action to run
[`veristat`](https://github.com/libbpf/veristat) during PRs and merges.
This uses an [action cache](https://github.com/actions/cache) to store
the veristat values when a PR is merged.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-14 07:36:16 -07:00
Andrea Righi
f72e7567bf ci: include kernel config chunk
The kernel from kernel.org lacks the necessary .config chunk required
by the build via virtme-ng.

To fix this, include the kernel .config chunk directly in the scx
repository and use it from here.

Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
2024-07-15 07:10:07 +02:00
Tejun Heo
759be0e406 ci: Use latest upstream kernel and exclude scx_flatcg and scx_pair from testing 2024-07-14 13:27:51 -10:00
Daniel Hodges
8dd8f3f5a6 Add stress-ng to scheduler tests
This change adds stress-ng as a load test for schedulers when running in
CI. It will run stress-ng while schedulers are being tested with a
reasonable amount of work. At the end of the run the stress-ng metrics
are collected for later analysis. However, since these tests may be
running in a VM the results may not be super robust.
2024-06-10 19:48:42 -07:00
Andrea Righi
0d26219fad ci: enable kvm support in the github workflow
Enable kvm acceleration and qemu microvm to speed up CI tests inside
virtme-ng.

Also adjust the regex to catch potential errors excluding a false
positive triggered by the new configuration.

Link: https://github.blog/changelog/2023-02-23-hardware-accelerated-android-virtualization-on-actions-windows-and-linux-larger-hosted-runners/
Link: https://github.blog/2024-01-17-github-hosted-runners-double-the-power-for-open-source/
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-05-03 23:20:10 +02:00
David Vernet
e1d6b4d270
ci: Also include gcc-multilib to fix CI
We're failing in CI with:

/usr/include/string.h:26:10: fatal error: 'bits/libc-header-start.h'
file not found
   26 | #include <bits/libc-header-start.h>

This header is apparently shipped with the gcc-multilib Ubuntu package,
according to a Stack Overflow post. Let's see if this fixes CI.

Signed-off-by: David Vernet <void@manifault.com>
2024-04-30 10:11:31 -05:00
Andrea Righi
a20b0669bc ci: mitigate build overload
During the build meson attempts to distribute the workload of multiple
sub-projects across all available CPUs and parallelize each build within
those projects, resulting in an NxN task generation.

This process could potentially overload the CI systems, leading to
potential failures (see for example issue #202).

To mitigate this, always use --jobs=1 during the CI run, which
serializes the build of sub-projects and restricts the level of
parallelization to N.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-04-04 12:29:27 +02:00
Andrea Righi
3166338a74 ci: update list of build dependencies
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-04-04 12:28:57 +02:00
Andrea Righi
39e154f4e9 ci: make sure to use the proper kernel headers
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-04-04 12:28:57 +02:00
Jordan Rome
ffc7b7dc4a Fetch and build bpftool by default
This pairs with the new default behavior to fetch and build libbpf
and is mostly being used so we can use the latest bpftool and libbpf.
2024-03-11 10:00:01 -07:00
Jordan Rome
1769dece7d Remove libbpf as a submodule
Instead clone the libbpf repo at a specific hash during setup.
This is to fix an issue whereby submodules are not included
in the tarball and therefore won't be updated/fetched during
setup after unzipping the tarball.
2024-03-07 18:31:09 -08:00
David Vernet
fc671aca49
ci: Only do CI runs for pull requests
We're basically always running two CI jobs: one for a remote push, and
another for when a PR is opened. These are essentially measuring the
same thing, so let's save CI bandwidth and just do a PR run. This will
hopefully make things a bit less noisy as well.

Signed-off-by: David Vernet <void@manifault.com>
2024-02-04 16:12:46 -06:00
Andrea Righi
a8a1944a5d ci: print the latest commit of the checked out sched-ext kernel
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-01-25 21:20:50 +01:00
Andrea Righi
c730e0558f ci: test the schedulers with the latest sched-ext kernel
Instead of downloading a precompiled sched-ext enabled kernel from the
Ubuntu ppa, fetch the latest kernel directly from the sched-ext git
repository and recompile it on-the-fly using virtme-ng.

This allows us to get rid of the Ubuntu ppa dependency, takes potential
Ubuntu-specific patches out of the equation, and ensures that all the
schedulers are tested with the most up-to-date sched-ext kernel (which
should also help to detect potential kernel-related issues in advance).

The downside is that the CI runs will take a bit longer now, because we
are recompiling the kernel from scratch. However, the kernel built with
virtme-ng is relatively quick to compile and includes all the sched-ext
features required for testing.

It's worth noting that this method aligns with the current sched-ext
kernel CI, where we test only the in-kernel schedulers (as intended).

This change allows us to extend the test coverage, using the same kernel
to also test the schedulers included in this repository.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-01-18 20:51:59 +01:00
Andrea Righi
1c92458c4b ci: temporarily switch to ppa:arighi/sched-ext-unstable
Temporarily switch to the unstable sched-ext ppa, so that we can resume
testing with the new kernel API.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-01-09 22:40:52 +01:00
Andrea Righi
bcbce040b6 scheds: c: improve build portability
Improve build portability by including asm-generic/errno.h, instead of
linux/errno.h.

The difference between these two headers can be summarized as follows:

  - asm-generic/errno.h contains generic error code definitions that are
    intended to be common across different architectures,

  - linux/errno.h includes architecture-specific error codes and
    provides additional (or overrides) error code definitions based on
    the specific architecture where the code is compiled.

Considering the architecture-independent nature of scx, the advantages
of being able to use architecture-specific error codes are marginal or
negligible (and we should probably discourage using them).

Moving towards asm-generic/errno.h, however, allows the removal of
cross-compilation dependencies (such as the gcc-multilib package in
Debian/Ubuntu) and improves the code portability across various
architectures and distributions.

This also allows us to remove a symlink hack from the github workflow.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-01-02 17:39:46 +01:00
Andrea Righi
05f5c69747 ci: use virtme-ng to test the schedulers
Use virtme-ng to run the schedulers after they're built; virtme-ng
allows us to pick an arbitrary sched-ext enabled kernel and run it while
virtualizing the entire user-space root filesystem, so we can basically
execute the recompiled schedulers inside such a kernel.

This should allow us to catch potential run-time issues in advance (both
in the kernel and the schedulers).

The sched-ext kernel is taken from the Ubuntu ppa (ppa:arighi/sched-ext)
at the moment, since it is the easiest / fastest way to get a
precompiled sched-ext kernel to run inside the Ubuntu 22.04 testing
environment.

The schedulers are tested using the new meson target "test_sched", the
specific actions are defined in meson-scripts/test_sched.

By default each test has a timeout of 30 sec after virtme-ng completes
the boot (that should be enough to initialize the scheduler and run it
for some seconds), while the total lifetime of the virtme-ng guest is
set to 60 sec; after this time the guest will be killed (this allows us
to catch potential kernel crashes / hangs).
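
A hard lifetime limit of this kind can be pictured with the coreutils
`timeout` command, which kills a command once a deadline expires and
then exits with status 124. The sketch below is a minimal illustration
assuming GNU coreutils; `run_with_deadline` is a hypothetical helper and
is not the contents of meson-scripts/test_sched.

```shell
# Hypothetical helper: run a command with a hard deadline (in seconds) and
# print how it ended: "ok", "timed out", or "failed (status N)".
run_with_deadline() {
    deadline=$1
    shift
    if timeout "$deadline" "$@"; then
        status=0
    else
        status=$?
    fi
    case "$status" in
        0)   echo "ok" ;;
        # coreutils timeout reports an expired deadline as status 124.
        124) echo "timed out" ;;
        *)   echo "failed (status $status)" ;;
    esac
}
```

For instance, `run_with_deadline 60 some_guest_command` would report
"timed out" if the (placeholder) command were still running after 60
seconds, which is how a hung guest would show up.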

If a single scheduler fails the test, the entire "test_sched" action
will be interrupted and the overall test result will be considered a
failure.

At the moment scx_layered is excluded from the tests, because it
requires a special configuration (we should probably pre-generate a
default config in the workflow actions and change the scheduler to use
the default config if it's executed without any argument).

Moreover, scx_flatcg is also temporarily excluded from the tests,
because of these known issues:
 - https://github.com/sched-ext/scx/issues/49
 - https://github.com/sched-ext/sched_ext/pull/101

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2023-12-29 15:54:10 +01:00
David Vernet
1bf04d0972
ci: Run CI job on Ubuntu 22.04
Andrea pointed out that we can and should be using Ubuntu 22.04.
Unfortunately it still doesn't ship some of the deps we need like
clang-17, but it does at least ship virtme-ng, so it's good for us to
use this so that we can actually test running the schedulers in a
virtme-ng VM when it supports being run in docker.

Also, update the job to run on pushes, and not just when a PR is opened.

Suggested-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: David Vernet <void@manifault.com>
2023-12-20 15:29:49 -06:00
David Vernet
4523b10e45
scx: Add CI action that builds schedulers for PRs
When Ubuntu ships with sched_ext, we can also maybe test loading the
schedulers (not sure if the runners can run as root though). For now, we
should at least have a CI job that lets us verify that the schedulers
can _build_. To that end, this patch adds a basic CI action that builds
the schedulers.

As is, this is a bit brittle in that we're having to manually download
and install a few dependencies. I don't see a better way for now without
hosting our own runners with our own containers, but that's a bigger
investment. For now, hopefully this will get us _some_ coverage.

Signed-off-by: David Vernet <void@manifault.com>
2023-12-18 21:12:50 -06:00