scx-upstream

mirror of https://github.com/sched-ext/scx.git synced 2024-11-25 20:20:23 +00:00

Author	SHA1	Message	Date
Andrea Righi	48bba8e4f6	scx_userland: survive to dispatch failures If the scheduler fails to dispatch a task we immediately give up, exiting with an error like the following: Failed to dispatch task 251 in 1 EXIT: BPF scheduler unregistered This scenario can be simulated by dramatically decreasing the value of MAX_ENQUEUED_TASKS. We can make the scheduler a little more robust simply by re-adding the task that cannot be dispatched to vruntime_head and not dispatching any additional tasks in the same batch. This gives the scheduler enough room, under such a "dispatch overload" condition, to catch up and resume normal execution without crashing. Moreover, introduce nr_vruntime_failed to report failed dispatch events in the scheduler's statistics. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2023-12-13 22:19:36 +01:00
David Vernet	a68885f92f	Merge pull request #29 from arighi/scx-userland-pid-max scx_userland: allocate tasks array based on kernel.pid_max	2023-12-13 10:57:03 -06:00
Andrea Righi	1e9e6778bc	scx_userland: allocate tasks array based on kernel.pid_max Currently the array of enqueued tasks is statically allocated to a fixed size of USERLAND_MAX_TASKS to avoid potential deadlocks that could be introduced by performing dynamic allocations in the enqueue path. However, this also adds a limit on the maximum pid that the scheduler can handle, since the pid is used as the index to access the array. In fact, it is quite easy to trigger the following failure on an average desktop system (making this scheduler pretty much unusable in such scenario): $ sudo scx_userland ... Failed to enqueue task 33258: No such file or directory EXIT: BPF scheduler unregistered Prevent this by using sysctl's kernel.pid_max as the size of the tasks array (and still allocate it all at once during initialization). The downside of this change is that scx_userland may require additional memory to start and in small systems it could even trigger OOMs. For this reason add an explicit message to the command help, suggesting to reduce kernel.pid_max in case of OOM conditions. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2023-12-13 17:33:10 +01:00
Tejun Heo	ca05e28dde	Merge pull request #28 from sched-ext/htejun Bump versions and add LICENSE symlinks for scx_layered and scx_rusty	2023-12-12 11:22:13 -10:00
Tejun Heo	8a07bcc31b	Bump versions and add LICENSE symlinks for scx_layered and scx_rusty	2023-12-12 11:21:08 -10:00
Tejun Heo	35fdfdffb0	Merge pull request #27 from davide125/license rust: clarify license and include text	2023-12-12 11:14:50 -10:00
Davide Cavalca	21e468a491	rust: clarify license and include text	2023-12-12 13:02:13 -08:00
Tejun Heo	fbb0164454	Merge pull request #26 from kkdwivedi/central-fix-nr-slots scx_central: Break dispatch_to_cpu loop when running out of buffer slots	2023-12-11 21:57:30 -10:00
Kumar Kartikeya Dwivedi	c4c994c9ce	scx_central: Break dispatch_to_cpu loop when running out of buffer slots For the case where many tasks being popped from the central queue cannot be dispatched to the local DSQ of the target CPU, we will keep bouncing them to the fallback DSQ and continue the dispatch_to_cpu loop until we find one which can be dispatched to the local DSQ of the target CPU. In a contrived case, it might be so that all tasks pin themselves to CPUs != target CPU, and due to their affinity cannot be dispatched to that CPU's local DSQ. If all of them are filling up the central queue, then we will keep looping in the dispatch_to_cpu loop and eventually run out of slots for the dispatch buffer. The nr_mismatched counter will quickly rise and sched-ext will notice the error and unload the BPF scheduler. To remedy this, ensure that we break the dispatch_to_cpu loop when we can no longer perform a dispatch operation. The outer loop in central_dispatch for the central CPU should ensure the loop breaks when we run out of these slots and schedule a self-IPI to the central core, and allow sched-ext to consume the dispatch buffer before restarting the dispatch loop again. A basic way to reproduce this scenario is to do: taskset -c 0 perf bench sched messaging The error in the kernel will be: sched_ext: BPF scheduler "central" errored, disabling sched_ext: runtime error (dispatch buffer overflow) bpf_prog_6a473147db3cec67_dispatch_to_cpu+0xc2/0x19a bpf_prog_c9e51ba75372a829_central_dispatch+0x103/0x1a5 Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>	2023-12-12 07:50:46 +00:00
Tejun Heo	737aa810b1	Merge pull request #25 from jordalgo/readme-ubuntu Update README for ubuntu instructions	2023-12-11 16:05:25 -10:00
Jordan Rome	df48390d73	Update README for ubuntu instructions	2023-12-11 17:42:08 -08:00
Tejun Heo	872f1d1f1e	Merge pull request #24 from sched-ext/htejun README: Add ubuntu instructions	2023-12-11 11:04:59 -10:00
Tejun Heo	7ebb102d5a	README: David's review	2023-12-11 11:04:40 -10:00
Tejun Heo	feb9018cbe	README: Add ubuntu instructions	2023-12-11 10:24:10 -10:00
Tejun Heo	8ea5850967	Merge pull request #23 from arighi/fix-s390-build build: add Debian arch name mapping for s390	2023-12-10 03:08:56 -10:00
Andrea Righi	1742daa4b5	build: add Debian arch name mapping for s390 Add the proper mapping from Debian architecture "s390x" to the kernel architecture name "s390". Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2023-12-10 12:42:15 +01:00
Tejun Heo	c9d0cc640a	Merge pull request #22 from arighi/enable-rust-build-option build: introduce enable_rust build option	2023-12-09 15:59:19 -10:00
Tejun Heo	abbb6a0276	Merge pull request #20 from arighi/scx-rusty-fix scx_rusty: fix "subtract with overflow" error	2023-12-09 15:58:11 -10:00
David Vernet	ab1d894fd2	Merge pull request #21 from arighi/misc-fixes	2023-12-09 09:33:09 -06:00
Andrea Righi	6343bcf360	build: introduce enable_rust build option Introduce an option to enable/disable the build of all the Rust sub-projects. This can be useful to build scx on those systems where Rust is not fully supported (e.g., armhf). Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2023-12-09 15:05:23 +01:00
Andrea Righi	0637b6a0b5	scx_nest: use proper format string for u64 types This prevents some warnings when building scx_nest on 32-bit architectures. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2023-12-09 14:49:50 +01:00
Andrea Righi	adc01140aa	scx_qmap: use proper format string for u64 types This prevents some warnings when building scx_qmap on 32-bit architectures. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2023-12-09 14:49:44 +01:00
Andrea Righi	4df979ccb7	scx_pair: use proper format string for u64 types This prevents some warnings when building scx_pair on 32-bit architectures. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2023-12-09 14:49:38 +01:00
Andrea Righi	14e70fd134	scx_flatcg: use proper data size for hweight_gen We should explicitly use u64 for hweight_gen to prevent the following build failures on 32-bit architectures: scheds/kernel-examples/scx_flatcg.p/scx_flatcg.bpf.skel.h: In function ‘scx_flatcg__assert’: scheds/kernel-examples/scx_flatcg.p/scx_flatcg.bpf.skel.h:3523:9: error: static assertion failed: "unexpected size of \'hweight_gen\'" 3523 | _Static_assert(sizeof(s->data->hweight_gen) == 8, "unexpected size of 'hweight_gen'"); Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2023-12-09 14:49:30 +01:00
Andrea Righi	00c5d2dfb7	scx_qmap: use proper data size for scheduler stats We should explicitly use u64 for scheduler statistics to prevent the following build failures on 32-bit architectures: scheds/kernel-examples/scx_qmap.p/scx_qmap.bpf.skel.h: In function ‘scx_qmap__assert’: scheds/kernel-examples/scx_qmap.p/scx_qmap.bpf.skel.h:2560:9: error: static assertion failed: "unexpected size of \'nr_enqueued\'" 2560 | _Static_assert(sizeof(s->bss->nr_enqueued) == 8, "unexpected size of 'nr_enqueued'"); | ^~~~~~~~~~~~~~ scheds/kernel-examples/scx_qmap.p/scx_qmap.bpf.skel.h:2561:9: error: static assertion failed: "unexpected size of \'nr_dispatched\'" 2561 | _Static_assert(sizeof(s->bss->nr_dispatched) == 8, "unexpected size of 'nr_dispatched'"); | ^~~~~~~~~~~~~~ scheds/kernel-examples/scx_qmap.p/scx_qmap.bpf.skel.h:2562:9: error: static assertion failed: "unexpected size of \'nr_reenqueued\'" 2562 | _Static_assert(sizeof(s->bss->nr_reenqueued) == 8, "unexpected size of 'nr_reenqueued'"); | ^~~~~~~~~~~~~~ scheds/kernel-examples/scx_qmap.p/scx_qmap.bpf.skel.h:2563:9: error: static assertion failed: "unexpected size of \'nr_dequeued\'" 2563 | _Static_assert(sizeof(s->bss->nr_dequeued) == 8, "unexpected size of 'nr_dequeued'"); | ^~~~~~~~~~~~~~ scheds/kernel-examples/scx_qmap.p/scx_qmap.bpf.skel.h:2564:9: error: static assertion failed: "unexpected size of \'nr_core_sched_execed\'" 2564 | _Static_assert(sizeof(s->bss->nr_core_sched_execed) == 8, "unexpected size of 'nr_core_sched_execed'"); | ^~~~~~~~~~~~~~ Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2023-12-09 14:49:25 +01:00
Andrea Righi	4c65e71c48	scx_central: use proper format string for u64 When printing scheduler statistics we use %lu to print u64 values, that works well on 64-bit architectures, but on 32-bit architectures we get errors like the following: 106 | printf("total :%10lu local:%10lu queued:%10lu lost:%10lu\n", | ~~~~^ | | | long unsigned int | %10llu 107 | skel->bss->nr_total, | ~~~~~~~~~~~~~~~~~~~ | | | u64 {aka long long unsigned int} Fix this by using the proper format %llu. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2023-12-09 14:49:20 +01:00
Andrea Righi	e396f1e467	scx_userland: get rid of strings.h include Use the compiler's built-in stack initialization instead of memset(). This way we can get rid of the string.h include and make cross-compilation easier in certain small environments (e.g., arm). Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2023-12-09 14:49:14 +01:00
Andrea Righi	c5d1bc3577	scx_rusty: fix "subtract with overflow" error It seems that under certain conditions, the difference between the current and the previous procfs::CpuStat values may become negative, triggering the following crash/trace: thread 'main' panicked at /build/rustc-VvCkKl/rustc-1.73.0+dfsg0ubuntu1/library/core/src/ops/arith.rs:217:1: attempt to subtract with overflow stack backtrace: ... 19: 0x590d8481909e - scx_rusty::calc_util::h46f2af9c512c2ecd at /home/arighi/src/scx/scheds/rust-user/scx_rusty/src/main.rs:217:31 20: 0x590d8481c794 - scx_rusty::Tuner::step::h2e51076f043a8593 at /home/arighi/src/scx/scheds/rust-user/scx_rusty/src/main.rs:444:38 21: 0x590d84828270 - scx_rusty::Scheduler::run::hb5483f1e585f52fe at /home/arighi/src/scx/scheds/rust-user/scx_rusty/src/main.rs:1198:17 22: 0x590d848289e9 - scx_rusty::main::h9ba8c62ad33aeee1 ... Prevent this by introducing a sub_or_zero() helper function that returns zero if the difference is negative. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2023-12-09 14:47:35 +01:00
Tejun Heo	330e104eae	Merge pull request #18 from sched-ext/num_schedulings scx_nest: Reset schedulings when a task is dispatched	2023-12-08 09:18:59 -10:00
David Vernet	c953ee47a6	scx_nest: Reset schedulings when a task is dispatched In scx_nest, we currently count the number of times that a core is scheduled for compaction before we eventually just eagerly compact the core. The idea is that the core could thrash between being scheduled and then "de-scheduled" for compaction if there are a couple of tasks that are bouncing between cores in the primary nest often enough to kick them out of being compacted. We're currently resetting schedulings when a core is eagerly compacted, but to be precise we should probably also reset the count when a core consumes a task from the fallback DSQ, as this indicates that the system is overcommitted and that we likely won't benefit from compacting the primary nest. Signed-off-by: David Vernet <void@manifault.com>	2023-12-08 13:16:40 -06:00
Tejun Heo	0b3ce4dfef	Merge pull request #17 from sched-ext/htejun Support offline compilation	2023-12-08 08:46:04 -10:00
Tejun Heo	9e12238d64	Support offline compilation	2023-12-08 08:45:44 -10:00
Tejun Heo	82c5c09514	Merge pull request #16 from jordalgo/readme-meson-min Update readme with alt meson install instructions	2023-12-08 06:40:16 -10:00
Jordan Rome	cad301238a	Update readme with alt meson install instructions	2023-12-08 04:50:42 -08:00
Tejun Heo	d131502582	Merge pull request #15 from sched-ext/htejun Bump overall version to 0.1.1	2023-12-07 13:12:35 -10:00
Tejun Heo	d8c2321831	Bump overall version to 0.1.1 - scx_nest included. - scx_rusty bug fix. - Ubuntu build fixes.	2023-12-07 13:11:28 -10:00
Tejun Heo	bdf722893b	Merge pull request #14 from sched-ext/htejun scx_utils: Bump version to 0.3.2	2023-12-07 13:10:56 -10:00
Tejun Heo	3acdef8f2b	scx_utils: Bump version to 0.3.2 - Build fix for ubuntu clang.	2023-12-07 13:08:21 -10:00
Tejun Heo	b0bf1639fa	Merge pull request #13 from arighi/fix-clang-version scx_utils::BpfBuilder: properly detect clang version in Ubuntu	2023-12-07 12:28:46 -10:00
Andrea Righi	7ceccf6516	scx_utils::BpfBuilder: properly detect clang version in Ubuntu Apply the same logic as commit `00cd15a` ("build: properly detect clang version in Ubuntu") in scx_utils as well. This allows scx_utils to build properly on Ubuntu. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2023-12-07 23:26:21 +01:00
Tejun Heo	11c6a809b2	Merge pull request #12 from sched-ext/scx_nest scx_nest: Add scx_nest scheduler	2023-12-07 12:19:40 -10:00
Tejun Heo	cd6d45d0e1	Merge pull request #9 from sched-ext/rusty rusty: Fix calc_util() in rusty	2023-12-07 12:17:27 -10:00
David Vernet	ca21842908	scx_nest: Add scx_nest scheduler The scx_nest scheduler seems to be behaving well. Let's merge it to the scx repo so that CachyOS can package and use it more easily. Signed-off-by: David Vernet <void@manifault.com>	2023-12-07 13:28:09 -06:00
David Vernet	5751f1c2a3	Merge pull request #11 from arighi/fix-clang-version-regexp build: properly detect clang version in Ubuntu	2023-12-07 12:32:38 -06:00
Andrea Righi	00cd15a3ae	build: properly detect clang version in Ubuntu Some distros may add their own prefix to the version string of clang, for example in Ubuntu: $ clang --version Ubuntu clang version 17.0.5 (1ubuntu1) ... That triggers the following meson error during the setup phase: meson.build:25:44: ERROR: String '' cannot be converted to int Change the regexp used to evaluate the clang version to avoid this build failure. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2023-12-07 19:24:12 +01:00
David Vernet	b53e8251a1	rusty: Fix calc_util() in rusty We were assigning curr to prev stats, and vice versa, in calc_util(). This was causing the following crash on debug builds: [void@maniforge scheds]$ sudo RUST_BACKTRACE=1 scx_rusty 00:00:56 [INFO] CPUs: online/possible = 32/32 00:00:56 [INFO] DOM[00] cpumask 0000000000FF00FF (16 cpus) 00:00:56 [INFO] DOM[01] cpumask 00000000FF00FF00 (16 cpus) 00:00:56 [INFO] Rusty Scheduler Attached thread 'main' panicked at /rustc/475c71da0710fd1d40c046f9cee04b733b5b2b51/library/core/src/ops/arith.rs:217:1: attempt to subtract with overflow stack backtrace: 0: rust_begin_unwind at /rustc/475c71da0710fd1d40c046f9cee04b733b5b2b51/library/std/src/panicking.rs:597:5 1: core::panicking::panic_fmt at /rustc/475c71da0710fd1d40c046f9cee04b733b5b2b51/library/core/src/panicking.rs:72:14 2: core::panicking::panic at /rustc/475c71da0710fd1d40c046f9cee04b733b5b2b51/library/core/src/panicking.rs:127:5 3: <u64 as core::ops::arith::Sub>::sub at /rustc/475c71da0710fd1d40c046f9cee04b733b5b2b51/library/core/src/ops/arith.rs:217:1 4: <&u64 as core::ops::arith::Sub<&u64>>::sub at /rustc/475c71da0710fd1d40c046f9cee04b733b5b2b51/library/core/src/internal_macros.rs:55:17 5: scx_rusty::calc_util at ./rust-user/scx_rusty/src/main.rs:216:29 6: scx_rusty::Tuner::step at ./rust-user/scx_rusty/src/main.rs:444:38 7: scx_rusty::Scheduler::run at ./rust-user/scx_rusty/src/main.rs:1198:17 8: scx_rusty::main at ./rust-user/scx_rusty/src/main.rs:1261:5 9: core::ops::function::FnOnce::call_once at /rustc/475c71da0710fd1d40c046f9cee04b733b5b2b51/library/core/src/ops/function.rs:250:5 note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace. Flip them to avoid the crash. Rusty now runs fine. Signed-off-by: David Vernet <void@manifault.com>	2023-12-06 18:25:27 -06:00
David Vernet	e38937b501	Merge pull request #8 from sched-ext/readme Readme	2023-12-06 17:21:20 -06:00
David Vernet	d9ece9fe87	README: Add link to join scx slack channel Add a link to join the scx slack channel. Signed-off-by: David Vernet <void@manifault.com>	2023-12-06 16:55:04 -06:00
David Vernet	eba9155a7f	README: Add scheds/ README's There's a fairly comprehensive README in the kernel's tools/sched_ext directory which describes each of the example schedulers. Let's pull it into this repository, and split it into the various subdirectories containing the kernel-examples/ schedulers, and the rust-user/ schedulers. Signed-off-by: David Vernet <void@manifault.com>	2023-12-06 16:55:02 -06:00
David Vernet	086c05eaf5	README: Update a bunch of formatting and grammar The root README files have some grammatical mistakes, and/or need to be updated to not be in the context of being sent as a patch set. Update them. Signed-off-by: David Vernet <void@manifault.com>	2023-12-06 16:53:21 -06:00
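
Commit c5d1bc3577 above describes a sub_or_zero() helper that returns zero when the difference between the current and previous procfs::CpuStat values would be negative. A minimal sketch of that idea follows. It is not the code from the commit: the signature is an assumption, and u64::saturating_sub is used here as one plausible way to get the described behavior.

```rust
/// Hypothetical sketch of a `sub_or_zero()`-style helper.
///
/// Plain `curr - prev` on `u64` panics in debug builds with
/// "attempt to subtract with overflow" when `prev > curr`.
/// `saturating_sub` clamps the result to zero instead.
pub fn sub_or_zero(curr: u64, prev: u64) -> u64 {
    curr.saturating_sub(prev)
}
```

With this sketch, sub_or_zero(10, 4) yields 6, and sub_or_zero(4, 10) yields 0 rather than panicking.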


299 Commits