JakeHillion/scx

mirror of https://github.com/JakeHillion/scx.git synced 2024-12-04 06:47:11 +00:00

Author	SHA1	Message	Date
Andrea Righi	addae505df	ci: enable lockdep in the virtme-ng kernel config Set CONFIG_PROVE_LOCKING=y in the default virtme-ng config, to enable lockdep and additional lock debugging features, which can help catch lock-related kernel issues in advance. Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-10-26 13:57:37 +02:00
Pat Somaru	302b3ac6c0	add retries to kernel clone step add retries to kernel clone step odds are that like, when we most care about cloning a fresh upstream (i.e. some fix was just pushed), things are going to be flakiest (i.e. no caches anywhere), so retry 10 times ~3 minutes total between tries when trying to clone kernel source.	2024-10-23 19:58:21 -04:00
Andrea Righi	d33e1b02b4	ci: enable SCHED_MC in the virtme-ng kernel config Enabling SCHED_MC in the kernel used for testing allows us to potentially run more complex tests, simulating different CPU topologies and have access to such topology data through the in-kernel scheduler's information. This can be useful as we add more topology awareness logic to the sched_ext core (e.g., in the built-in idle CPU selection policy). Therefore add this option to the default .config (and also fix a missing newline at the end of the file). Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-10-23 11:18:02 +02:00
Pat Somaru	d944a39a7f	remove apt fast from ci setup remove apt fast from ci setup to reduce non-core dependencies	2024-10-17 13:08:03 -04:00
Daniel Hodges	71d63010af	scx_layered: Refactor layer iteration Remove DSQ iter algos. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-14 13:13:53 -07:00
Jake Hillion	52c279a469	layered: make default value for disable_topology dynamic Disable topology currently defaults to `false` (topology enabled...). Change this so that topology is enabled by default on hardware that may benefit from it (multiple NUMA nodes or LLCs) and disabled on hardware that does not benefit from it. This is a slightly noisy change as we have to move ownership of the newly mutable layer specs into the `Scheduler` object (previously they were a borrow). We don't have a `Topology` object to make the default decision from until `Scheduler::init`, and I think this is because of the possibility of hot plugs. We therefore have to clone the `Vec<LayerSpec>` each time as it is potentially mutable. Test plan: - CI. Updated to be explicit about topology in both cases. Single NUMA multi-LLC machine: ``` $ scx_layered --run-example ... 13:34:01 [INFO] Topology awareness not specified, selecting enabled based on hardware ... $ scx_layered --run-example --disable-topology=true ... 13:33:41 [INFO] Disabling topology awareness ... $ scx_layered --run-example -t ... 13:33:15 [INFO] Disabling topology awareness ... $ scx_layered --run-example --disable-topology=false # none of the above messages present ``` Single NUMA single LLC machine: ``` $ scx_layered --run-example 15:33:10 [INFO] Topology awareness not specified, selecting disabled based on hardware ```	2024-10-11 17:09:07 +01:00
Pat Somaru	5e4a7ac655	setup matrix job to run key paths of layered through verifier/stress test	2024-10-09 21:09:41 -04:00
Pat Somaru	9d463a85e3	have bpf builds use separate cache have bpf kernel builds use separate cache (to avoid cache mixups when cherry pick)	2024-10-03 08:20:53 -04:00
Pat Somaru	a715d20d92	update urls to point to bpf-next	2024-10-02 22:20:56 -04:00
Pat Somaru	5182682baf	remove lint step from bpf workflow	2024-10-02 21:27:45 -04:00
Pat Somaru	7e95e58e20	enable periodic build/test against bpf-next add a ci job to build and test against bpf-next every 6 hours.	2024-10-02 21:20:14 -04:00
Pat Somaru	49a1e8fb2a	fix artifact names to work with mergequeue naming	2024-09-26 14:33:38 -04:00
Jake Hillion	b89f3fe70b	ci: enable on merge_group action (#698) Enabling the `caching-build` workflow on `merge_group` actions. This is required to enable merge queues which should make merging lots of PRs while getting full CI testing possible. Currently `lint` is the only required check but we will add more in future.	2024-09-26 17:40:47 +01:00
Pat Somaru	e6e91f8755	do not cache fast jobs dependencies	2024-09-26 11:27:08 -04:00
likewhatevs	bf68679d35	Setup "debugging" and misc cleanup (#695 ) * Fix a couple of misc errors in build scripts. * Tweak scripts/kconfigs to make bpftrace work. * Update how CI caching works to make builds faster (6 minute turnaround time) * Update CI config to generate per-scheduler debug archives w/ guest dmesg/scheduler stdout, guest stdout, bpftrace script output, veristat output. * Update build scripts to accept the following: ** VNG RW -- write to host filesystem (better caching, logging). * For stress tests in particular (via ini config): QEMU Opts -- to facilitate reproducing bugs (i.e. high core count). bpftrace scripts -- specify bpftrace scripts to run during stress tests.	2024-09-26 11:11:10 -04:00
likewhatevs	d8246fdcd6	clean up ci/make ci nicer (#682 ) * make ci nicer Replace build scheds and merged with caching build, and rename caching build to build-and-test. This should make the CI reports on PRs be nice and specific (i.e. at a glance, know what passes and what fails). It also keeps PR CI jobs up to date (as folks edit things) and has them all use one config/24.04 etc. * prevent untar permission errors from causing cache misses	2024-09-24 16:36:56 -04:00
Pat Somaru	878eaf042e	add 'continue on error' to ci jobs I think I see PRs being harder to write because all parts of a CI job are cancelled when one fails. I think I am also starting to see that we have enough largely disjoint moving pieces that there will often be one that is failing stress tests at any time. Make CI run all stress tests always to address this.	2024-09-24 09:17:55 -04:00
Jake Hillion	8ca45cfa37	lint: enable cargo fmt (#643) Use `cargo fmt` with a specific nightly branch in the CI to enforce formatting. Globally format these files while the diff is still small so we can stay on top of it. Test plan: - CI lint check passes.	2024-09-11 10:03:20 +01:00
patso	28319f3205	migrate ci vm to ubuntu 24.04 migrate ci vm to ubuntu 24.04	2024-09-10 09:53:40 -04:00
patso	2e46f71780	generate docs for scx and kernel generate docs for scx and kernel and push to gh page this "adds" kernel scheduler and bpf docs to the generated scx rust docs.	2024-09-09 15:37:59 -04:00
patso	120211d731	split build and test jobs split build and test jobs to reduce ci turnaround time and make it clear what is failing when something fails. also add virtiofsd to deps to make test compilation faster (most test time is compilation) and remove all force 9ps.	2024-09-08 02:54:24 -04:00
Andrea Righi	8ff57ac915	ci: fix vng command to set the right amount of CPUs The right options to set the amount of CPUs in vng is `-p`, not `-c`. Change the command to use the long format `--cpu` and `--memory` for better clarity. This fixes issue #625. Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-09-07 09:47:13 +02:00
patso	eaa636dfcc	set vng cpus to 8 for rust tests set vng cpus to 8 for rust tests (for stability in testing), update relevant doc test.	2024-09-06 10:07:37 -04:00
patso	082bccb557	fix/enable rust tests, make build faster This commit fixes rust tests and configures ci to run them on commit. It also sets up CI to run those in a timely manner by caching dependencies and splitting jobs.	2024-09-06 06:18:11 -04:00
Andrea Righi	b9d4ddebc9	ci: allow parallel builds with meson Now that we have a sane build environment with meson we don't need to force sequential builds anymore (--jobs=1) and we can just run regular builds without having to worry about excessive parallelization. See commit `43950c6` ("build: Use workspace to group rust sub-projects"). Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-09-04 08:57:24 +02:00
Daniel Hodges	a3ae127194	ci: Remove veristat diff workflow The veristat diff action isn't working properly so remove it for now. Instead, run veristat (non diff mode) as part of the build-scheds workflow. This should still provide some useful data for BPF program related stats. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-08-29 05:50:06 -07:00
Daniel Hodges	e121dd3dd5	ci: Fix cache directory Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-08-19 20:07:50 -07:00
Daniel Hodges	40bb003555	ci: fix merge veristat cache generation Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-08-19 18:30:32 -07:00
Daniel Hodges	1ff5e4fbed	ci: fix veristat for PRs Make sure veristat is available for CI for PRs. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-08-19 16:42:31 -07:00
Daniel Hodges	7c27f8067d	ci: Fix veristat pull request workflow See the [failure](https://github.com/sched-ext/scx/actions/runs/10389671253), which needs to have an action defined. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-08-19 12:08:59 -07:00
Daniel Hodges	f68bc82582	meson: Add github action to run veristat Add a github action to run [`veristat`](https://github.com/libbpf/veristat) during PRs and merges. This uses a [action cache](https://github.com/actions/cache) to store the veristat values when a PR is merged. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-08-14 07:36:16 -07:00
Andrea Righi	f72e7567bf	ci: include kernel config chunk The kernel from kernel.org lacks the necessary .config chunk required by the build via virtme-ng. To fix this, include the kernel .config chunk directly in the scx repository and use it from here. Signed-off-by: Andrea Righi <righi.andrea@gmail.com>	2024-07-15 07:10:07 +02:00
Tejun Heo	759be0e406	ci: Use latest upstream kernel and exclude scx_flatcg and scx_pair from testing	2024-07-14 13:27:51 -10:00
Daniel Hodges	8dd8f3f5a6	Add stress-ng to scheduler tests This change adds stress-ng as a load test for schedulers when running in CI. It will run stress-ng while schedulers are being tested with a reasonable amount of work. At the end of the run the stress-ng metrics are collected for later analysis. However, since these results may be running in a VM they may not be super robust.	2024-06-10 19:48:42 -07:00
Andrea Righi	0d26219fad	ci: enable kvm support in the github workflow Enable kvm acceleration and qemu microvm to speed up CI tests inside virtme-ng. Also adjust the regex to catch potential errors excluding a false positive triggered by the new configuration. Link: https://github.blog/changelog/2023-02-23-hardware-accelerated-android-virtualization-on-actions-windows-and-linux-larger-hosted-runners/ Link: https://github.blog/2024-01-17-github-hosted-runners-double-the-power-for-open-source/ Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-03 23:20:10 +02:00
David Vernet	e1d6b4d270	ci: Also include gcc-multilib to fix CI We're failing in CI with: /usr/include/string.h:26:10: fatal error: 'bits/libc-header-start.h' file not found 26 \| #include <bits/libc-header-start.h> This header is apparently shipped with the gcc-multilib Ubuntu package, according to a Stack Overflow post. Let's see if this fixes CI. Signed-off-by: David Vernet <void@manifault.com>	2024-04-30 10:11:31 -05:00
Andrea Righi	a20b0669bc	ci: mitigate build overload During the build meson attempts to distribute the workload of multiple sub-projects across all available CPUs and parallelize each build within those projects, resulting in an NxN task generation. This process could potentially overload the CI systems, leading to potential failures (see for example issue #202). To mitigate this, always use --jobs=1 during the CI run, which serializes the build of sub-projects and restricts the level of parallelization to N. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-04-04 12:29:27 +02:00
Andrea Righi	3166338a74	ci: update list of build dependencies Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-04-04 12:28:57 +02:00
Andrea Righi	39e154f4e9	ci: make sure to use the proper kernel headers Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-04-04 12:28:57 +02:00
Jordan Rome	ffc7b7dc4a	Fetch and build bpftool by default This pairs with the new default behavior to fetch and build libbpf and is mostly being used so we can use the latest bpftool and libbpf.	2024-03-11 10:00:01 -07:00
Jordan Rome	1769dece7d	Remove libbpf as a submodule Instead clone the libbpf repo at a specific hash during setup. This is to fix an issue whereby submodules are not included in the tarball and therefore won't be updated/fetched during setup after unzipping the tarball.	2024-03-07 18:31:09 -08:00
David Vernet	fc671aca49	ci: Only do CI runs for pull requests We're basically always running two CI jobs: one for a remote push, and another for when a PR is opened. These are essentially measuring the same thing, so let's save CI bandwidth and just do a PR run. This will hopefully make things a bit less noisy as well. Signed-off-by: David Vernet <void@manifault.com>	2024-02-04 16:12:46 -06:00
Andrea Righi	a8a1944a5d	ci: print the latest commit of the checked out sched-ext kernel Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-01-25 21:20:50 +01:00
Andrea Righi	c730e0558f	ci: test the schedulers with the latest sched-ext kernel Instead of downloading a precompiled sched-ext enabled kernel from the Ubuntu ppa, fetch the latest kernel directly from the sched-ext git repository and recompile it on-the-fly using virtme-ng. This allows to get rid of the Ubuntu ppa dependency, take out from the equation potential Ubuntu-specific patches, and ensures testing all the schedulers with the most up-to-date sched-ext kernel (that should also help to detect potential kernel-related issues in advance). The downside is that the CI runs will take a bit longer now, because we are recompiling the kernel from scratch. However, the kernel built with virtme-ng is relatively quick to compile and includes all the sched-ext features required for testing. It's worth noting that this method aligns with the current sched-ext kernel CI, where we test only the in-kernel schedulers (as intended). This change allows to extend the test coverage, using the same kernel to test also the schedulers included in this repository. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-01-18 20:51:59 +01:00
Andrea Righi	1c92458c4b	ci: temporarily switch to ppa:arighi/sched-ext-unstable Temporarily switch to the unstable sched-ext ppa, so that we can resume testing with the new kernel API. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-01-09 22:40:52 +01:00
Andrea Righi	bcbce040b6	scheds: c: improve build portability Improve build portability by including asm-generic/errno.h, instead of linux/errno.h. The difference between these two headers can be summarized as following: - asm-generic/errno.h contains generic error code definitions that are intended to be common across different architectures, - linux/errno.h includes architecture-specific error codes and provides additional (or overrides) error code definitions based on the specific architecture where the code is compiled. Considering the architecture-independent nature of scx, the advantages of being able to use architecture-specific error codes are marginal or negligible (and we should probably discourage using them). Moving towards asm-generic/errno.h, however, allows the removal of cross-compilation dependencies (such as the gcc-multilib package in Debian/Ubuntu) and improves the code portability across various architectures and distributions. This also allows to remove a symlink hack from the github workflow. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-01-02 17:39:46 +01:00
Andrea Righi	05f5c69747	ci: use virtme-ng to test the schedulers Use virtme-ng to run the schedulers after they're built; virtme-ng allows to pick an arbitrary sched-ext enabled kernel and run it virtualizing the entire user-space root filesystem, so we can basically execute the recompiled schedulers inside such kernel. This should allow to catch potential run-time issue in advance (both in the kernel and the schedulers). The sched-ext kernel is taken from the Ubuntu ppa (ppa:arighi/sched-ext) at the moment, since it is the easiest / fastest way to get a precompiled sched-ext kernel to run inside the Ubuntu 22.04 testing environment. The schedulers are tested using the new meson target "test_sched", the specific actions are defined in meson-scripts/test_sched. By default each test has a timeout of 30 sec, after the virtme-ng completes the boot (that should be enough to initialize the scheduler and run the scheduler for some seconds), while the total lifetime of the virtme-ng guest is set to 60 sec, after this time the guest will be killed (this allows to catch potential kernel crashes / hangs). If a single scheduler fails the test, the entire "test_sched" action will be interrupted and the overall test result will be considered a failure. At the moment scx_layered is excluded from the tests, because it requires a special configuration (we should probably pre-generate a default config in the workflow actions and change the scheduler to use the default config if it's executed without any argument). Moreover, scx_flatcg is also temporarily excluded from the tests, because of these known issues: - https://github.com/sched-ext/scx/issues/49 - https://github.com/sched-ext/sched_ext/pull/101 Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2023-12-29 15:54:10 +01:00
David Vernet	1bf04d0972	ci: Run CI job on Ubuntu 22.04 Andrea pointed out that we can and should be using Ubuntu 22.04. Unfortunately it still doesn't ship some of the deps we need like clang-17, but it does at least ship virtme-ng, so it's good for us to use this so that we can actually test running the schedulers in a virtme-ng VM when it supports being run in docker. Also, update the job to run on pushes, and not just when a PR is opened Suggested-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: David Vernet <void@manifault.com>	2023-12-20 15:29:49 -06:00
David Vernet	4523b10e45	scx: Add CI action that builds schedulers for PRs When Ubuntu ships with sched_ext, we can also maybe test loading the schedulers (not sure if the runners can run as root though). For now, we should at least have a CI job that lets us verify that the schedulers can _build_. To that end, this patch adds a basic CI action that builds the schedulers. As is, this is a bit brittle in that we're having to manually download and install a few dependencies. I don't see a better way for now without hosting our own runners with our own containers, but that's a bigger investment. For now, hopefully this will get us _some_ coverage. Signed-off-by: David Vernet <void@manifault.com>	2023-12-18 21:12:50 -06:00

49 Commits