scx-upstream

mirror of https://github.com/sched-ext/scx.git synced 2024-12-13 03:57:19 +00:00

Author	SHA1	Message	Date
Andrea Righi	be681c731a	scx_rustland_core: pass nvcsw, slice and dsq_vtime to user-space Provide additional task metrics to user-space schedulers via QueuedTask: - nvcsw: total amount of voluntary context switches - slice: task time slice "budget" (from p->scx.slice) - dsq_vtime: current task vtime (from p->scx.dsq_vtime) In this way user-space schedulers can quickly access these metrics to implement better scheduling policy. Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-10-15 23:11:43 +02:00
Andrea Righi	1bbae64dc7	scx_rustland_core: update CPU idle selection logic Re-align idle selection logic with some of the latest improvements done in scx_bpfland. Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-10-15 23:11:42 +02:00
Tejun Heo	4841df8138	Merge pull request #793 from minosfuture/vmlinux_per_arch Use per-arch vmlinux.h	2024-10-15 19:52:42 +00:00
Dan Schatzberg	902f41adf0	Merge pull request #799 from dschatzberg/mitosis_dispatch_no_wakeup scx_mitosis: handle enqueue() on !wakeup	2024-10-15 13:46:07 +00:00
Daniel Hodges	e017692697	Merge pull request #801 from hodgesds/layered-iter-fix scx_layered: Remove layer iteration	2024-10-14 23:39:48 +00:00
Daniel Hodges	71d63010af	scx_layered: Refactor layer iteration Remove DSQ iter algos. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-14 13:13:53 -07:00
Dan Schatzberg	a17f16e4b9	scx_mitosis: handle enqueue() on !wakeup If we're not on the wakeup path, we may see enqueue() invoked without select_cpu() which will require an idle cpu lookup. In order to fix this, we refactor the idle_cpu lookup in select_cpu so it can be invoked from enqueue(). Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>	2024-10-14 10:13:07 -07:00
Daniel Hodges	7bfbc71012	Merge pull request #798 from sched-ext/hodgesds-perfetto-docs Update developer guide with Perfetto info	2024-10-14 15:09:15 +00:00
Daniel Hodges	43615107f9	Merge pull request #797 from hodgesds/layered-llc-integration scx_layered: Add LLC integration test	2024-10-14 15:06:25 +00:00
Daniel Hodges	c76b87f62f	Update developer guide with Perfetto info Update the developer guide with info on how to run the helper script to generate Perfetto compatible traces.	2024-10-14 10:51:03 -04:00
Daniel Hodges	912d6e01c1	scx_layered: Add LLC integration test Add an integration test for testing that the `llcs` field on the layer config works properly. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-14 07:27:29 -07:00
Daniel Hodges	ed18e43612	Merge pull request #795 from hodgesds/bpftrace-tests scx_layered: Add topology integration test	2024-10-14 12:54:54 +00:00
Andrea Righi	5408cd3f78	Merge pull request #796 from sched-ext/sched-ftrace-script scripts: Convert sched ftrace helper scripts to python	2024-10-14 12:39:57 +00:00
Andrea Righi	8b7f9cde0f	scripts: Convert sched ftrace helper scripts to python Merge the sched_switch ftrace helper scripts into a single python script that prints the result to stdout. In this way it's possible to generate a perfetto-compatible trace running: $ sudo ./scripts/sched_ftrace.py > sched.ftrace Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-10-14 08:44:14 +02:00
Daniel Hodges	e456c83536	scx_layered: Add topology integration test Add a bpftrace script that does a topology aware test. The test script runs a bpftrace script that asserts that stress-ng processes are scheduled on NUMA node 0 only. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-13 20:23:11 -07:00
Ming Yang	f7cdf08754	scx_mitosis: Fix static assertion of scx_bpf_task_cgroup failing __weak check it failed the static assertion in macro bpf_ksym_exists. Signed-off-by: Ming Yang <minos.future@gmail.com>	2024-10-13 07:57:12 -07:00
Ming Yang	7067092555	Add vmlinux.h for multiple arch Following the change of using per-arch vmlinux.h. Add it for the remaining archs. Signed-off-by: Ming Yang <minos.future@gmail.com>	2024-10-13 07:57:12 -07:00
Ming Yang	a23f3566e3	Use per-arch vmlinux.h vmlinux.h is not compatible across archs. Handle this compatibility issue by * Add arch info into vmlinux.h real file name * Link vmlinux.h to the target-arch real file at build time * Use target-arch real file for scx_utils bindgen. Also refactored clang related logic into a new clang_info mod, which is shared by bpf_builder.rs and builder.rs. Signed-off-by: Ming Yang <minos.future@gmail.com>	2024-10-13 07:57:12 -07:00
Daniel Hodges	03f078ac74	Merge pull request #792 from hodgesds/ftrace-perfetto scripts: Add ftrace perfetto helper scripts	2024-10-13 13:03:47 +00:00
Daniel Hodges	cc3fede8e0	scripts: Add ftrace helper scripts Add a set of ftrace helper scripts for making perfetto compatible ftrace scheduler profiles. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-13 09:00:07 -04:00
Changwoo Min	ba9d75a6ab	Merge pull request #791 from multics69/lavd-sched-sample scx_lavd: misc updates	2024-10-13 08:52:08 +00:00
Changwoo Min	c1f4051a14	scx_lavd: fix int overflow in calculating avg_lat_cri u32 is not big enough to hold the sum of lat_cri in a period, so sum_lat_cri (u32) overflowed, resulting in an incorrect avg_lat_cri. Change the type from u32 to u64, avoiding the integer overflow. Note that {sum/avg}_lat_cri is only for debugging, so it is irrelevant in making scheduling decisions. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-13 00:58:36 +09:00
Changwoo Min	6c9bbe66dc	scx_lavd: remove unnecessary downscaling in deadline calculation The downscaling is not necessary in calculating a task's virtual deadline because the virtual deadline represents only the relative order in task scheduling. Hence downscaling only introduces inaccuracy caused by truncation. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-13 00:41:23 +09:00
Changwoo Min	dce67296bf	Merge pull request #790 from multics69/lavd-sched-sample scx_lavd: do not inspect scx_lavd process itself	2024-10-12 14:18:59 +00:00
Daniel Hodges	a846fb3be6	Merge pull request #789 from hodgesds/vtime-dist Add bpftrace script to print vtime distributions across DSQs	2024-10-12 12:46:59 +00:00
Daniel Hodges	f59b73b97c	scripts: Add vtime distribution script Add bpftrace script to print the distribution of vtime across DSQs. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-12 08:40:16 -04:00
Changwoo Min	6ddc3f0a2b	scx_lavd: do not inspect scx_lavd process itself Printing the task status of scx_lavd is not useful, so filter it out. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-12 17:21:08 +09:00
Andrea Righi	7f3b0cb739	Merge pull request #780 from sched-ext/bpfland-remove-pcpu-dsq scx_bpfland: drop per-cpu DSQs	2024-10-12 06:25:35 +00:00
Andrea Righi	09fee68a6b	Merge pull request #785 from sched-ext/rustland-core-fix-mm-kprobe scx_rustland_core: use handle_mm_fault kprobe	2024-10-12 06:20:15 +00:00
Andrea Righi	197dee93f4	scx_bpfland: get rid of per-CPU DSQs Using per-CPU DSQs seems to introduce more issues than benefits (potential stalls, etc.). Therefore, let's get rid of the per-CPU DSQs and use SCX_DSQ_LOCAL for tasks directly dispatched to specific CPUs. This change seems to also improve performance on 6.12 and it makes the scheduler a lot more stable and consistent. The issues will be investigated separately, providing a separate stress test scheduler, designed to stress test per-CPU DSQs. Tested-by: Piotr Gorski <piotrgorski@cachyos.org> Tested-by: Eric Naim <dnaim@cachyos.org> Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-10-12 08:15:51 +02:00
Andrea Righi	198f22656c	scx_bpfland: clarify error code returned by pick_idle_cpu() Return more meaningful error codes from pick_idle_cpu(). No functional change, just improved code readability. Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-10-12 08:08:48 +02:00
Andrea Righi	ceb4f1755f	scx_bpfland: always refill task timeslice in ops.dispatch() When a task exhausts its timeslice and no other tasks are ready to run, we automatically refill its timeslice, but only if the current CPU is a fully idle SMT core. If we don’t handle the refill, the sched_ext core will default to refilling using SCX_SLICE_DFL, which may not be optimal. To ensure better control over the task’s timeslice, always refill it when no other tasks are available to run. Fixes: `6e24fcc` ("scx_bpfland: keep tasks running on full-idle SMT cores") Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-10-12 08:08:48 +02:00
Andrea Righi	54d704ceda	scx_bpfland: pick a random idle CPU when prev_cpu is not valid Pick any random idle CPU when the previous CPU isn't valid anymore according to the task's cpumask. Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-10-12 08:08:48 +02:00
Changwoo Min	836cf9faa4	Merge pull request #779 from multics69/lavd-futex-v2 scx_lavd: mitigate the lock holder preemption problem	2024-10-12 02:42:33 +00:00
Daniel Hodges	a950f96353	Merge pull request #787 from hodgesds/layered-clean scx_layered: Cleanup non topology path	2024-10-11 18:26:23 +00:00
Daniel Hodges	a08a76ccd6	scx_layered: Cleanup non topology path More cleanup in the non topology path to remove copy/pasta declarations. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-10-11 10:18:34 -07:00
Jake Hillion	eb59085e61	Merge pull request #781 from JakeHillion/pr781 layered: move configuration into library component	2024-10-11 16:39:23 +00:00
Jake Hillion	caff46e864	Merge pull request #786 from JakeHillion/pr786 layered: make default value for disable_topology dynamic	2024-10-11 16:25:45 +00:00
Jake Hillion	52c279a469	layered: make default value for disable_topology dynamic Disable topology currently defaults to `false` (topology enabled...). Change this so that topology is enabled by default on hardware that may benefit from it (multiple NUMA nodes or LLCs) and disabled on hardware that does not benefit from it. This is a slightly noisy change as we have to move ownership of the newly mutable layer specs into the `Scheduler` object (previously they were a borrow). We don't have a `Topology` object to make the default decision from until `Scheduler::init`, and I think this is because of the possibility of hot plugs. We therefore have to clone the `Vec<LayerSpec>` each time as it is potentially mutable. Test plan: - CI. Updated to be explicit about topology in both cases. Single NUMA multi-LLC machine: ``` $ scx_layered --run-example ... 13:34:01 [INFO] Topology awareness not specified, selecting enabled based on hardware ... $ scx_layered --run-example --disable-topology=true ... 13:33:41 [INFO] Disabling topology awareness ... $ scx_layered --run-example -t ... 13:33:15 [INFO] Disabling topology awareness ... $ scx_layered --run-example --disable-topology=false # none of the above messages present ``` Single NUMA single LLC machine: ``` $ scx_layered --run-example 15:33:10 [INFO] Topology awareness not specified, selecting disabled based on hardware ```	2024-10-11 17:09:07 +01:00
Jake Hillion	143a55cda1	layered: move configuration into library component Move the LayerConfig and its children from `main.rs` into `lib.rs`. This allows other tooling, such as config managers or test executors, to modify layered configs programmatically. The end goal is to move everything in `layered` except for the argument parsing into a `run_layered` function, but I haven't done it in this diff because it's a larger change. This is a common pattern in Rust projects to do as little as possible in `main.rs` for extensibility. The only change here, other than publicity and where things are located, is the signature of `CpuPool::alloc_cpus`. It previously relied on `&Layer`, and this changes it to the two elements of `Layer` it uses. This allows `Layer` to stay confined to `main.rs` (for now) to prevent scope creep in this PR. This may be inconvenient in the short term for WIPs and anyone doing non-Cargo builds (cough me), but having things split into more files should make rebases/merges easier in the long run. Test plan: - `cargo build --release` - CI.	2024-10-11 15:55:29 +01:00
Andrea Righi	b3c5a23693	scx_rustland_core: use handle_mm_fault kprobe The symbol __handle_mm_fault isn't available anymore in 6.12, let's rely on handle_mm_fault that is available both on 6.12 and older kernels. Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-10-11 15:39:34 +02:00
Changwoo Min	648c95be9e	scx_lavd: fix incorrect task comparison for preemption Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-11 21:53:24 +09:00
likewhatevs	b88f567e25	Merge pull request #782 from likewhatevs/lsp-nice-util layered -- make lsp work nice on util include file	2024-10-11 12:30:19 +00:00
Pat Somaru	2b309dbbb4	make lsp work nice on util include	2024-10-11 08:06:29 -04:00
Pat Somaru	7627e1cc42	scx_layered: fix lsp etc on util.bpf.c	2024-10-11 08:02:23 -04:00
Changwoo Min	5b4b255cbb	scx_lavd: do not preempt while holding a lock When a task holds a lock, it should not yield its time slice or it should not be preempted out. In this way, we can mitigate harmful preemption of lock holders and reduce the total preemption counts. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-11 18:49:09 +09:00
Changwoo Min	bd17589a6e	scx_lavd: boost latency criticality when a task holds a lock When a lock holder exhausts its time slice, it will be re-enqueued to a DSQ waiting for scheduling while holding a lock. In this case, prioritize its latency criticality proportionally, so a lock holder would not be stuck in a DSQ for a long time, improving system-wide progress. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-11 18:48:56 +09:00
Changwoo Min	77b8e65571	scx_lavd: tracing all blocking locks and futexes Trace the acquisition and release of blocking locks for kernel and futexes for user-space. This is necessary to boost a lock holder task in terms of latency and time slice. We do not boost shared lock holders (e.g., read lock in rw_semaphore) since the kernel already prioritizes the readers over writers. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-11 17:03:48 +09:00
Ryan Wilson	b6f09edc00	Merge pull request #777 from ryantimwilson/reverse-weight-dsq-layered [layered] Implement reverse weight DSQ algorithm	2024-10-10 20:18:14 +00:00
Ryan Wilson	8c8250b1e2	[layered] Implement reverse weight DSQ algorithm	2024-10-10 12:53:25 -07:00

1972 Commits