JakeHillion/scx

mirror of https://github.com/JakeHillion/scx.git synced 2024-11-25 11:00:24 +00:00

Author	SHA1	Message	Date
Daniel Hodges	8f89fc6876	Merge pull request #956 from hodgesds/layered-optimize-v1 scx_layered: Remove high fallback dsq budget check	2024-11-21 20:38:22 +00:00
Daniel Hodges	37abbf0db7	scx_layered: Remove high fallback dsq budget check Remove check if high fallback DSQ has the highest budget and aggressively consume from fallback DSQs. This is a performance optimization that yields a small improvement in performance when running synthetic load tests. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-11-21 12:12:44 -08:00
Jake Hillion	7d3bdf758a	Merge pull request #955 from JakeHillion/pr955 rust-toolchain: init file	2024-11-21 19:39:33 +00:00
Jake Hillion	9b319c7e2d	rust-toolchain: init file Test plan: - CI - On a machine with rustup this works.	2024-11-21 19:27:03 +00:00
Jake Hillion	cdb299ee1a	Merge pull request #953 from JakeHillion/pr953 cargo fmt to match ci	2024-11-21 18:33:36 +00:00
Jake Hillion	7056be9328	cargo fmt to match ci	2024-11-21 18:29:04 +00:00
Changwoo Min	52b189a1ca	Merge pull request #952 from multics69/lavd-cleanup scx_lavd: Optimize the layout of struct task_ctx	2024-11-21 16:43:05 +09:00
Changwoo Min	803a7306f1	scx_lavd: Optimize the layout of struct task_ctx Reduce the size of struct task_ctx from 3 cache lines to 2 cache lines by dropping unnecessary fields and optimizing the struct layout. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-11-21 16:34:23 +09:00
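The layout change described in the commit above can be illustrated with a minimal sketch. The structs and field names here are hypothetical and are not the real scx_lavd `task_ctx`; the sketch only shows how grouping fields by alignment removes padding, and how a byte size maps to a cache-line count under an assumed 64-byte line.

```rust
use std::mem::size_of;

// Hypothetical layout: small fields interleaved with 8-byte fields,
// so the compiler inserts padding after each u8 under repr(C).
#[repr(C)]
pub struct Padded {
    pub flag_a: u8,
    pub runtime: u64,
    pub flag_b: u8,
    pub deadline: u64,
}

// Same hypothetical fields, ordered largest-first so the small fields share
// one trailing slot and only tail padding remains.
#[repr(C)]
pub struct Packed {
    pub runtime: u64,
    pub deadline: u64,
    pub flag_a: u8,
    pub flag_b: u8,
}

// Assumed cache-line size; 64 bytes is common but not universal.
pub const CACHE_LINE: usize = 64;

// Number of cache lines needed to hold `bytes` bytes (rounded up).
pub fn cache_lines(bytes: usize) -> usize {
    (bytes + CACHE_LINE - 1) / CACHE_LINE
}

// Sizes of the two hypothetical layouts, in bytes: (padded, packed).
pub fn sizes() -> (usize, usize) {
    (size_of::<Padded>(), size_of::<Packed>())
}
```

Under these assumptions, `cache_lines(192)` is 3 and `cache_lines(128)` is 2, which matches the "3 cache lines to 2" reduction in the commit message.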
Daniel Hodges	52e463792b	Merge pull request #950 from hodgesds/layered-tick-timer scx_layered: Use PROG_RUN for cpumask updates	2024-11-20 23:01:34 +00:00
Daniel Hodges	a86e62aa21	scx_layered: Use PROG_RUN to update cpumasks Use BPF PROG_RUN from userspace to update cpumasks rather than relying on scheduler ticks. This should be a lower overhead approach in that an extra BPF program does not need to be called on every CPU during tick. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-11-20 14:32:31 -08:00
Jake Hillion	dabb4aff41	scx_utils/scx_layered: bump to 1.0.7 `9bdccdd` Merge pull request #943 from JakeHillion/pr943 `b4b1879` Merge pull request #947 from sirlucjan/scx_loader_update `f2384fe` scx_loader: update docs `198f079` Merge pull request #930 from hodgesds/topo-irq `41781fe` scx_layered: Add netdev IRQ balancing node support `e30e5d8` scx_utils: Add netdev support `8c09ae2` Merge pull request #942 from CachyOS/feat/loader-add-flash `c258199` replace goto with unrolled loop in antistall_set `d5d4f46` scx_loader: add scx_flash as supported scheduler `489ce8a` Merge pull request #939 from sched-ext/htejun/layered-updates `dbcd233` scx_layered: Work around verification failure in antistall_set() on old kernels `61f378c` Merge pull request #931 from multics69/lavd-osu `88c7d47` Merge pull request #934 from sched-ext/htejun/layered-updates `aec9e86` Merge branch 'main' into htejun/layered-updates `10bf25a` topology, scx_layered: Make --disable-topology handling more consistent `ff0e9c6` Merge pull request #933 from hodgesds/layered-verifier-nested `1869dd8` scx_layered: Fix verifier issues on older kernels `68e1741` scx_layered: Use cached cpu_ctx->hi_fallback_dsq_id and cpu_ctx->cached_idx `827af0b` scx_layered: Fix dsq_id indexing bugs `f2c9e7f` scx_layered: Don't use tctx->last_cpu when picking target llc `519a27f` Merge pull request #932 from sched-ext/htejun/layered-updates `ce30010` scx_layered: Don't limit antistall execution to layered_cpumask `77eec19` Merge pull request #929 from sched-ext/htejun/layered-updates `65b49f8` Merge pull request #928 from purplewall1206/patch-1 `8e6e3de` Merge branch 'main' into patch-1 `a7fcda8` Merge pull request #924 from sched-ext/scx-fair `5b4b6df` Merge branch 'main' into scx-fair `3292be7` scx_lavd: Factor the task's runtime more aggressively in a deadline calculation `56e0dae` scx_layered: Fix linter disagreement `93a0bc9` scx_layered: Fix consume_preempting() when --local-llc-iteration `51d4945` scx_layered: Don't 
call scx_bpf_cpuperf_set() unnecessarily `678b101` scheds: introduce scx_flash `c7faf70` fix compile errors `75dd81e` scx_layered: Improve topology aware select_cpu() `2b52d17` scx_layered: Encapsulate per-task layered cpumask caching `1293ae2` scx_layered: Stat output format update `66223bf` Merge pull request #926 from JakeHillion/pr926 `d35d527` layered: split out common parts of LayerKind `9016416` Merge pull request #925 from hodgesds/layered-lol `1afb7d5` scx_layered: Fix formatting `79125ef` Merge pull request #919 from hodgesds/layered-dispatch-local `3a3a7d7` Merge branch 'main' into layered-dispatch-local `db46e27` Merge pull request #923 from hodgesds/layered-dsq-preempt-fix `4fc0509` scx_layered: Add flag to control llc iteration on dispatch `0096c06` scx_layered: Fix cost accounting for dsqs `72f21db` Merge pull request #922 from hodgesds/layered-cost-dump-fixes `7631049` Merge pull request #921 from hodgesds/layered-formatting-fix `f7009f7` scx_layered: Fix dump format `ff15f25` scx_layered: Fix formatting `6733168` Merge pull request #918 from hodgesds/layered-slice-helper `775d09a` scx_layered: Consume from local LLCs for dispatch `4fb05d9` Merge pull request #920 from hodgesds/layered-consume-fix `b2505e7` Merge branch 'main' into layered-consume-fix `1ed387d` scx_layered: Fix error in dispatch consumption `cad3413` scx_layered: Add helper for layer slice duration `835f0d0` Merge pull request #890 from likewhatevs/layered-dsq-timer `89f4aa1` scx_layered: add antistall `38512bf` Merge pull request #916 from sched-ext/htejun/scx_layered-verifier-workaround `bb91ad0` scx_layered: Work around older kernels choking on function calls from sleepable progs `5280206` Merge pull request #915 from LohithCV/lavd_doc_err `a2e119a` scx_lavd: docs: fix typos `007fed0` Merge pull request #913 from hodgesds/layered-fallback-dump `3b47782` scx_layered: Add fallback costs to dump `73926d6` Merge pull request #912 from hodgesds/layered-mask-cleanup `5ae1b84` Merge 
pull request #908 from JakeHillion/pr908 `ee4fd3d` scx_layered: Cleanup cpumask `9a282e0` Merge pull request #911 from hodgesds/layered-idle-smt-cleanup `637fc3f` scx_layered: Use layer idle_smt option `f71a9d0` Merge pull request #910 from hodgesds/layered-cost-verifier-fix `7db2ef2` scx_layered: Fix verifier issue on older kernels `ba54808` layered/topo: lift layer specific checks out of per-LLC loop `218cbea` Merge pull request #907 from sched-ext/scx-loader-update-bpfland-options `191cc7f` scx_loader: tune scx_bpfland default options `416de68` Merge pull request #904 from multics69/lavd-drop-padding `56357a7` Merge pull request #903 from multics69/lavd-issue-897 `d7e1f69` Merge pull request #906 from hodgesds/layered-verifier-fix `3cc849f` scx_layered: Fix verifier issue when tracing `d6ba3b7` Merge pull request #896 from hodgesds/layered-dsq-cost `487baa4` scx_layered: Add fallback DSQ cost accounting `debe991` Merge pull request #905 from likewhatevs/kconfig-cache-update `27a506f` add CONFIG_IKCONFIG to ci Kconfig and bump cache ver `22cb9e9` scx_lavd: drop padding in cpdom_cpumask, which was a workaround `e9ba2d5` scx_lavd: update cur_logical_clk atomically `d0111b3` Merge pull request #900 from likewhatevs/enable-iconfig-proc `b962ea8` Merge pull request #894 from etsal/core_enums `5e35a12` remove stray print `7d44511` fix missing/extraneous newline `4288040` Merge branch 'main' of https://github.com/sched-ext/scx into core_enums `de5f2f9` regenerate autogen Rust file `f088540` fix linting error in autogenerated code `2f174db` use the enum singleton in the userspace scheduler components `1cabed9` Autogenerate enums and BPF enum setters for Rust schedulers `d500c50` add autogenerated enum definitions for Rust schedulers `fc6ad5c` add CONFIG_IKHEADERS_PROC to ci kconfig `479d515` Merge branch 'main' into core_enums `23f302c` add SCX_SLICE_* macros to scx_utils and use them for the Rust schedulers `c545d23` factor enum handling into existing headers/operations 
`a1d0e7e` autogenerate scx enum definitions `31b9fb4` set all enums in userspace before loading `ff861d3` introduce CO:RE enum readers and use them for scx_central	2024-11-20 12:03:02 -08:00
Jake Hillion	9bdccdd6f3	Merge pull request #943 from JakeHillion/pr943 replace goto with unrolled loop in antistall_set	2024-11-20 19:06:17 +00:00
Andrea Righi	b4b18797ba	Merge pull request #947 from sirlucjan/scx_loader_update scx_loader: update docs	2024-11-20 15:19:22 +00:00
Piotr Gorski	f2384fe938	scx_loader: update docs Signed-off-by: Piotr Gorski <lucjan.lucjanov@gmail.com>	2024-11-20 15:25:48 +01:00
Daniel Hodges	198f0790c7	Merge pull request #930 from hodgesds/topo-irq [RFC] scx_layered: Add netdev IRQ balancing	2024-11-20 03:11:43 +00:00
Daniel Hodges	41781fea79	scx_layered: Add netdev IRQ balancing node support Add support for setting netdev IRQ balancing that is NUMA aware. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-11-19 18:58:35 -08:00
Daniel Hodges	e30e5d8825	scx_utils: Add netdev support Add support for collecting network device information for use in topology awareness. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-11-19 17:08:51 -08:00
Tejun Heo	8c09ae21c9	Merge pull request #942 from CachyOS/feat/loader-add-flash scx_loader: add scx_flash as supported scheduler	2024-11-19 17:53:01 +00:00
Jake Hillion	c258199f47	replace goto with unrolled loop in antistall_set	2024-11-19 17:10:57 +00:00
Vladislav Nepogodin	d5d4f463f9	scx_loader: add scx_flash as supported scheduler	2024-11-19 20:58:56 +04:00
Tejun Heo	489ce8a766	Merge pull request #939 from sched-ext/htejun/layered-updates scx_layered: Work around verification failure in antistall_set() on o…	2024-11-19 08:02:03 +00:00
Tejun Heo	dbcd233f17	scx_layered: Work around verification failure in antistall_set() on old kernels In earlier kernels, the iterator variable wasn't trusted making the verifier choke on calling kfuncs on its dereferences. Work around by re-looking up the task by PID.	2024-11-18 21:37:36 -10:00
Changwoo Min	61f378c1cd	Merge pull request #931 from multics69/lavd-osu scx_lavd: Factor the task's runtime more aggressively in a deadline calculation	2024-11-19 00:29:33 +00:00
Tejun Heo	88c7d47314	Merge pull request #934 from sched-ext/htejun/layered-updates scx_layered: Cleanups around topology handling	2024-11-18 23:12:48 +00:00
Tejun Heo	aec9e86797	Merge branch 'main' into htejun/layered-updates	2024-11-18 12:19:42 -10:00
Tejun Heo	10bf25a65f	topology, scx_layered: Make --disable-topology handling more consistent When --disable-topology is specified the topology information (e.g. llc map) supplied to the BPF code disagrees with how the scheduler operates, requiring code paths to be split unnecessarily and making things error-prone (e.g. layer_dsq_id() returned the wrong value with --disable-topology). - Add Topology::with_flattened_llc_node() which creates a dummy topo with one llc and node regardless of the underlying hardware and make layered use it when --disable-topology. - Add explicit nr_llcs == 1 handling to layer_dsq_id() to generate better code when topology is disabled and remove explicit disable_topology branches in the callers. - Fix layer->cache_mask when a layer doesn't explicitly specify nodes and drop the disable_topology branch in layered_dump().	2024-11-18 12:19:01 -10:00
Daniel Hodges	ff0e9c621c	Merge pull request #933 from hodgesds/layered-verifier-nested scx_layered: Fix verifier issues on older kernels	2024-11-18 21:48:59 +00:00
Daniel Hodges	1869dd8a2d	scx_layered: Fix verifier issues on older kernels On 6.9 kernels the verifier is not able to track `struct bpf_cpumasks` properly on nested structs. Move the cpumasks from the `cached_cpus` struct back to the `task_ctx` struct so older versions of the verifier can pass. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-11-18 13:18:57 -08:00
Tejun Heo	68e1741351	scx_layered: Use cached cpu_ctx->hi_fallback_dsq_id and cpu_ctx->cached_idx - Remember hi_fallback_dsq_id for each CPU in cpu_ctx and use the remembered values. - Make antistall_scan() walk each hi fallback DSQ once instead of multiple times through CPU iteration. - Remove unused functions.	2024-11-18 10:09:24 -10:00
Tejun Heo	827af0b7ef	scx_layered: Fix dsq_id indexing bugs keep_running() and antistall_scan() were incorrectly assuming that layer->index equals DSQ ID. Fix them. Also, remove a compile warning while at it around cpumask cast.	2024-11-18 09:53:39 -10:00
Tejun Heo	f2c9e7fddd	scx_layered: Don't use tctx->last_cpu when picking target llc It's confusing to use tctx->last_cpu for making active choices as it makes layered deviate from other schedulers unnecessarily. Use last_cpu only for migration accounting in layered_running(). - In layered_enqueue(), layered_select_cpu() already returned prev_cpu for non-direct-dispatch cases and the CPU the task is currently on should match tctx->last_cpu. Use task_cpu instead. - In keep_running(), the current CPU always matches tctx->last_cpu. Always use bpf_get_smp_processor_id().	2024-11-18 09:16:08 -10:00
Tejun Heo	519a27f920	Merge pull request #932 from sched-ext/htejun/layered-updates scx_layered: Don't limit antistall execution to layered_cpumask	2024-11-18 18:51:25 +00:00
Tejun Heo	ce300101ed	scx_layered: Don't limit antistall execution to layered_cpumask A task may end up in a layer which doesn't have any CPUs that are allowed for the task. Such tasks are accounted as affinity violations and put onto a fallback DSQ. When antistall_set() is trying to find the CPU to run a stalled DSQ, it ignores CPUs that are not in the first task's layered_cpumask. This makes antistall skip stalling DSQs with affinity-violating tasks at the front. Consider all allowed CPUs for affinity-violating tasks. While at it, combine the two if blocks that set antistall to improve readability.	2024-11-18 08:41:20 -10:00
Tejun Heo	77eec19792	Merge pull request #929 from sched-ext/htejun/layered-updates scx_layered: Perf improvements and a bug fix	2024-11-18 17:41:40 +00:00
Tejun Heo	65b49f8d30	Merge pull request #928 from purplewall1206/patch-1 fix compile errors	2024-11-18 17:35:50 +00:00
Tejun Heo	8e6e3de639	Merge branch 'main' into patch-1	2024-11-18 05:23:51 -10:00
Andrea Righi	a7fcda82cc	Merge pull request #924 from sched-ext/scx-fair scheds: introduce scx_flash	2024-11-18 08:21:36 +00:00
Andrea Righi	5b4b6df5e4	Merge branch 'main' into scx-fair	2024-11-18 07:42:09 +01:00
Changwoo Min	3292be7b72	scx_lavd: Factor the task's runtime more aggressively in a deadline calculation Instead of using a constant runtime value in the deadline calculation, use the adjusted runtime value of a task. Since tasks' runtime value follows a highly skewed distribution, we convert the highly skewed distribution to a mildly skewed distribution to avoid stalls. This resolves the audio breaking issue in osu! under heavy background workloads. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-11-18 11:55:12 +09:00
Tejun Heo	56e0dae81d	scx_layered: Fix linter disagreement	2024-11-17 06:03:30 -10:00
Tejun Heo	93a0bc9969	scx_layered: Fix consume_preempting() when --local-llc-iteration consume_preempting() wasn't testing layer->preempt when --local-llc-iteration is specified, ending up treating all layers as preempting layers and often leading to HI fallback starvation under saturation. Fix it.	2024-11-17 05:54:03 -10:00
Tejun Heo	51d4945d69	scx_layered: Don't call scx_bpf_cpuperf_set() unnecessarily layered_running() is calling scx_bpf_cpuperf_set() whenever a task of a layer w/ cpuperf setting starts running which can be every task switch. There's no reason to repeatedly call with the same value. Remember the last value and call iff the new value is different. This reduces the bpftop reported CPU consumption of scx_bpf_cpuperf_set() from ~1.2% to ~0.7% while running rd-hashd at full CPU saturation on Ryzen 3900x.	2024-11-16 05:45:44 -10:00
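The "remember the last value and call iff the new value is different" idea from the commit above can be sketched as follows. This is a userspace illustration only, not the BPF code from scx_layered: `CpuperfCache` and its `apply` callback are hypothetical names, with `apply` standing in for an expensive setter such as `scx_bpf_cpuperf_set()`.

```rust
// Hypothetical helper: caches the last applied value so an expensive setter
// is invoked only when the requested value actually changes.
pub struct CpuperfCache {
    last: Option<u32>,
    pub calls: u32, // how many times the setter was really invoked
}

impl CpuperfCache {
    pub fn new() -> Self {
        Self { last: None, calls: 0 }
    }

    // Invokes `apply` only if `value` differs from the last applied value.
    // Returns true when the setter was invoked.
    pub fn set(&mut self, value: u32, mut apply: impl FnMut(u32)) -> bool {
        if self.last == Some(value) {
            return false; // same value as before, skip the call
        }
        apply(value);
        self.last = Some(value);
        self.calls += 1;
        true
    }
}
```

Calling `set` on every task switch is then cheap in the common case where consecutive tasks request the same setting, since only a comparison is performed.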
Andrea Righi	678b10133d	scheds: introduce scx_flash Introduce scx_flash (Fair Latency-Aware ScHeduler), a scheduler that focuses on ensuring fairness among tasks and performance predictability. This scheduler is introduced as a replacement of the "lowlatency" mode in scx_bpfland, that has been dropped in commit `78101e4` ("scx_bpfland: drop lowlatency mode and the priority DSQ"). scx_flash operates based on an EDF (Earliest Deadline First) policy, where each task is assigned a latency weight. This weight is adjusted dynamically, influenced by the task's static weight and how often it releases the CPU before its full assigned time slice is used: tasks that release the CPU early receive a higher latency weight, granting them a higher priority over tasks that fully use their time slice. The combination of dynamic latency weights and EDF scheduling ensures responsive and stable performance, even in overcommitted systems, making the scheduler particularly well-suited for latency-sensitive workloads, such as multimedia or real-time audio processing. Tested-by: Peter Jung <ptr1337@cachyos.org> Tested-by: Piotr Gorski <piotrgorski@cachyos.org> Signed-off-by: Andrea Righi <arighi@nvidia.com>	2024-11-16 14:49:25 +01:00
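A rough sketch of the policy described in the commit above: EDF ordering where tasks that release the CPU early get a higher latency weight and therefore an earlier deadline. The formulas, constants, and field names below are assumptions made for illustration; the commit message does not give the actual scx_flash computation.

```rust
// Hypothetical task model; not the scx_flash data structures.
pub struct Task {
    pub id: u32,
    pub static_weight: u64,  // assumed baseline weight, e.g. 100
    pub early_releases: u64, // assumed count of yields before the slice expired
    pub vtime: u64,          // assumed virtual time consumed so far, in ns
}

// Assumed time slice in nanoseconds.
pub const SLICE_NS: u64 = 5_000_000;

// Assumed rule: the latency weight grows with the static weight and with
// how often the task gave up the CPU early.
pub fn latency_weight(t: &Task) -> u64 {
    t.static_weight * (1 + t.early_releases)
}

// Assumed rule: a higher latency weight shortens the deadline offset.
pub fn deadline(t: &Task) -> u64 {
    t.vtime + SLICE_NS * 100 / latency_weight(t).max(1)
}

// EDF: pick the task with the earliest deadline.
pub fn pick_next(tasks: &[Task]) -> Option<&Task> {
    tasks.iter().min_by_key(|t| deadline(t))
}
```

With two otherwise identical tasks, the one that released the CPU early more often ends up with the smaller deadline and is picked first, which is the behavior the commit message describes.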
ppw	c7faf70a26	fix compile errors	2024-11-16 15:56:20 +08:00
Tejun Heo	75dd81e3e6	scx_layered: Improve topology aware select_cpu() - Cache llc and node masked cpumasks instead of calculating them each time. They're recalculated only when the task has migrated across the matching boundary and recalculation is necessary. - llc and node masks should be taken from the wakee's previous CPU not the waker's CPU. - idle_smtmask is already considered by scx_bpf_pick_idle_cpu(). No need to AND it manually. - big_cpumask code updated to be simpler. This should also be converted to use cached cpumask. big_cpumask portion is not tested. This brings down CPU utilization of select_cpu() from ~2.7% to ~1.7% while running rd-hashd at saturation on Ryzen 3900x.	2024-11-15 16:29:47 -10:00
Tejun Heo	2b52d172d4	scx_layered: Encapsulate per-task layered cpumask caching and fix build warnings while at it. Maybe we should drop const from cast_mask().	2024-11-15 14:30:03 -10:00
Tejun Heo	1293ae21fc	scx_layered: Stat output format update Rearrange things a bit so that lines are not too long.	2024-11-15 13:38:56 -10:00
Jake Hillion	66223bf235	Merge pull request #926 from JakeHillion/pr926 layered: split out common parts of LayerKind	2024-11-15 22:48:10 +00:00
Jake Hillion	d35d5271f5	layered: split out common parts of LayerKind We duplicate the definition of most fields in every layer kind. This makes reading the config harder than it needs to be, and turns every simple read of a common field into a `match` statement that is largely redundant. Utilise `#[serde(flatten)]` to embed a common struct into each of the LayerKind variants. Rather than matching on the type this can be directly accessed with `.kind.common()` and `.kind.common_mut()`. Alternatively, you can extend existing matches to match out the common parts as demonstrated in this diff where necessary. There is some further code cleanup that can be done in the changed read sites, but I wanted to make it clear that this change doesn't change behaviour, so tried to make these changes in the least obtrusive way. Drive-by: fix the formatting of the lazy_static section in main.rs by using `lazy_static::lazy_static`. Test plan: ``` # main $ cargo build --release && target/release/scx_layered --example /tmp/test_old.json # this change $ cargo build --release && target/release/scx_layered --example /tmp/test_new.json $ diff /tmp/test_{old,new}.json # no diff ```	2024-11-15 21:57:22 +00:00
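The accessor pattern from the commit above can be sketched without any dependencies. The field names in `LayerCommon` and the per-variant fields are hypothetical, and the `#[serde(flatten)]` attribute that the commit applies to the embedded struct is only mentioned in a comment so that the sketch builds with the standard library alone.

```rust
// Hypothetical common fields shared by every layer kind.
#[derive(Debug, Clone, PartialEq)]
pub struct LayerCommon {
    pub preempt: bool,
    pub slice_us: u64,
}

// Each variant embeds the common struct. In the commit, the embedded field
// carries #[serde(flatten)] so the serialized config stays flat; that
// attribute is omitted here to avoid depending on serde.
#[derive(Debug, Clone, PartialEq)]
pub enum LayerKind {
    Confined { cpus_range: Option<(usize, usize)>, common: LayerCommon },
    Grouped { cpus_range: Option<(usize, usize)>, common: LayerCommon },
    Open { common: LayerCommon },
}

impl LayerKind {
    // Read access to the common fields without matching at every call site.
    pub fn common(&self) -> &LayerCommon {
        match self {
            LayerKind::Confined { common, .. }
            | LayerKind::Grouped { common, .. }
            | LayerKind::Open { common, .. } => common,
        }
    }

    // Mutable access to the common fields.
    pub fn common_mut(&mut self) -> &mut LayerCommon {
        match self {
            LayerKind::Confined { common, .. }
            | LayerKind::Grouped { common, .. }
            | LayerKind::Open { common, .. } => common,
        }
    }
}
```

A read such as `kind.common().slice_us` then replaces a three-arm `match` at each call site, which is the readability gain the commit message argues for.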
Daniel Hodges	90164160a2	Merge pull request #925 from hodgesds/layered-lol scx_layered: Fix formatting	2024-11-15 17:01:56 +00:00

2242 Commits