24fba4ab8d ("scx_layered: Add idle smt layer configuration") added the
idle_smt layer config but
- It flipped the default from preferring idle cores to not having any
  preference.
- It misdocumented what the option meant: it doesn't only pick idle cores,
  it tries to pick an idle core first and, if that fails, picks any idle
  CPU.
A follow-up commit 637fc3f6e1 ("scx_layered: Use layer idle_smt option")
made it more confusing: if idle_smt is set, the idle core prioritizing logic
is disabled.
The first commit disables idle core prioritization by overriding
idle_smtmask to be idle_cpumask when idle_smt is *clear*, and the second
commit disables the same logic by skipping the code path when the flag is
*set*. That is, both flag values ended up doing exactly the same thing.
Recently, 75dd81e3e6 ("scx_layered: Improve topology aware select_cpu()")
restored the function of the flag by dropping the cpumask override. However,
this left only the behavior of the second commit, which implemented the
reverse behavior, making the actual behavior the opposite of the documented
one.
This flag is hopeless. History aside, the name itself is too confusing:
does idle_smt mean preferring an idle SMT *thread* or an idle SMT *core*?
While the name was carried over from idle_cpumask/idle_smtmask, there the
meaning of the former is clear, which also makes it hard to misread what the
latter means. As a standalone flag name, it has no such anchor.
Preferring idle cores was one of the drivers of performance gain identified
during earlier ads experiments. Let's just drop the flag to restore the
previous behavior and retry if necessary.
layer_usages are updated at ops.stopping(). If tasks in a layer keep
running, the usage stats may not be updated for a long time, to the point
where the reported usage fluctuates wildly and makes CPU allocation
oscillate. Compensate by adding the time already spent by the currently
running task.
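A minimal sketch of the idea, with illustrative struct and field names
rather than the actual scx_layered definitions:

  #include <stdint.h>

  typedef uint64_t u64;

  /* illustrative per-CPU context, not the real cpu_ctx layout */
  struct cpu_ctx {
      u64 layer_usages[16];   /* accumulated at ops.stopping() */
      u64 running_at;         /* timestamp of the last ops.running() */
      int running_layer;      /* layer of the task currently on the CPU */
  };

  /* report usage including the slice that hasn't been folded in yet */
  static u64 layer_usage_now(const struct cpu_ctx *cctx, int layer, u64 now)
  {
      u64 usage = cctx->layer_usages[layer];

      if (cctx->running_layer == layer && now > cctx->running_at)
          usage += now - cctx->running_at;

      return usage;
  }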
Per-LLC layer queueing latencies were measured on each ops.running()
transition and folded into a running average. Depending on the specific
tasks, this average can swing wildly and it's difficult to base scheduling
decisions on it.
Instead, track each task's average runtime and determine the queue latency
as the sum of the average runtimes of the tasks on the queue. While this
requires atomic ops to maintain the sum, the operations are mostly LLC-local
and not noticeable. The up-to-date information helps make better scheduling
decisions, which should more than offset whatever additional overhead there
is.
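The following sketch shows the bookkeeping. Only llc_ctx->queued_runtime is
named in this change; the other structs, fields and the averaging factor are
assumptions for illustration:

  #include <stdint.h>

  typedef uint64_t u64;

  struct task_ctx { u64 avg_runtime; };       /* decaying per-task average */
  struct llc_ctx  { u64 queued_runtime; };    /* estimated queueing latency */

  /* task gets queued on the LLC: add its expected runtime to the sum */
  static void on_enqueue(struct llc_ctx *llcx, struct task_ctx *taskc)
  {
      __sync_fetch_and_add(&llcx->queued_runtime, taskc->avg_runtime);
  }

  /* task starts running: it's no longer waiting, drop its contribution */
  static void on_running(struct llc_ctx *llcx, struct task_ctx *taskc)
  {
      __sync_fetch_and_sub(&llcx->queued_runtime, taskc->avg_runtime);
  }

  /* task stops after running for @ran_for: update its average runtime */
  static void on_stopping(struct task_ctx *taskc, u64 ran_for)
  {
      taskc->avg_runtime = (taskc->avg_runtime * 3 + ran_for) / 4;
  }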
LLC_LSTAT_LAT is now only used to monitor how llc_ctx->queued_runtime is
behaving, so decay it more slowly. Also, don't squelch it to zero when
LLC_LSTAT_CNT is 0, so that bugs in queued_runtime maintenance remain
visible.
The plan was to use the load metric for layer fairness, but we went with
explicit per-layer weights instead. The load metric is not used for anything
and doesn't really add much. Remove it.
Fix idle selection to take the layer growth algorithm into account so that
big/little cores are properly chosen when selecting idle CPUs.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
There were a couple of bugs in owned execution protection in
layered_stopping():
- owned_usage_target_ppk is a property of the CPU's owning layer. However,
  we were incorrectly using the task's layer's value.
- A fallback CPU can belong to a layer under system saturation, and both
  empty and in-layer execution count as owned execution. However, we were
  incorrectly always setting owned_usage_target_ppk to 50% for a fallback
  CPU even when the owner layer's target_ppk could be a lot higher. This
  could effectively take away ~50% of CPU util from a layer that is trying
  to grow and prevent it from growing. Don't lower target_ppk just because
  a CPU is the fallback CPU.
After these fixes, `stress` starting in an empty non-preempting layer can
reliably grow the layer to its weighted size while competing against
saturating preempting layers.
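A sketch of the corrected lookup; layered_stopping() and
owned_usage_target_ppk come from the actual change, while the struct layout
and helper below are illustrative:

  #include <stdbool.h>
  #include <stdint.h>

  typedef uint32_t u32;

  struct layer   { u32 owned_usage_target_ppk; };
  struct cpu_ctx {
      struct layer *owner;     /* the layer this CPU belongs to */
      bool is_fallback;        /* fallback CPU under saturation */
  };

  /* protection target used when deciding whether owned execution on this
   * CPU may be preempted - always the owning layer's value, no longer
   * capped at 50% just because the CPU is the fallback CPU */
  static u32 protection_target_ppk(const struct cpu_ctx *cctx)
  {
      return cctx->owner->owned_usage_target_ppk;
  }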
Add _frac to util_protected and util_open as they are fractions of the total
util of the layer. While at it, swap the two fields as util_open_frac
directly affects layer sizing.
Use a BTreeMap to store cache_id_map so that the program can show L2/L3
cache info in ascending order, making it easier for humans to look up.
Signed-off-by: Zhao Mengmeng <zhaomengmeng@kylinos.cn>
scx_layered userspace is critical for guaranteeing forward progress in a
timely manner under contention. Always put its tasks in the hi fallback DSQ.
Also, as hi fallback is now boosted above preempt, drop the preempt check
before boosting.
Having two separate implementations for the topo and no-topo cases makes the
code difficult to modify and maintain. Reimplement so that:
- Layer and LLC orders are determined by userspace and the BPF code iterates
  over them. The iterations are performed using two helpers (see the sketch
  after this list). The new implementation's overhead is lower for both topo
  and no-topo paths.
- In-layer execution protection is always enforced.
- Hi fallback is prioritized over preempting layers to avoid starving
  kthreads. This ends up also prioritizing tasks with custom affinities.
  Once lo fallback starvation avoidance is implemented, those will be pushed
  there.
- The fallback CPU prioritizes empty layers over preempting ones to
  guarantee that empty layers can quickly grow under saturation. Empty
  layers are determined by userspace, so there is a race: the layer running
  scx_layered itself can become empty, its cpumasks can be updated, and
  scx_layered can then get pushed off the CPU before the empty layers are
  updated, which can stall the scx_layered binary. This will be solved by
  treating the scx_layered process specially.
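A conceptual sketch of how the unified path can look; the helper and field
names below are hypothetical, not the actual scx_layered symbols:

  #include <stdint.h>

  typedef uint32_t u32;

  #define MAX_LAYERS 16
  #define MAX_LLCS   64

  /* orders published by userspace for this CPU */
  struct cpu_ctx {
      u32 layer_order[MAX_LAYERS];
      u32 nr_layers;
      u32 llc_order[MAX_LLCS];
      u32 nr_llcs;
  };

  /* helper #1: i'th layer to consider from this CPU */
  static u32 layer_at(const struct cpu_ctx *cctx, u32 i)
  {
      return cctx->layer_order[i % cctx->nr_layers];
  }

  /* helper #2: j'th LLC to consider from this CPU */
  static u32 llc_at(const struct cpu_ctx *cctx, u32 j)
  {
      return cctx->llc_order[j % cctx->nr_llcs];
  }

With the orders coming from userspace, the topo and no-topo cases collapse
into one path: a no-topo system simply publishes a single LLC.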
Now that layers are allocated CPUs according to their weights under system
saturation and in-layer execution can be protected against preemption, layer
weights can be enforced solely through CPU allocation. While there are still
a couple of missing pieces - the dispatch ordering issue and fallback DSQ
protection - the framework is in place. Drop the now superfluous cost-based
fairness mechanism.
The previous commit made empty layers executing on the fallback CPU count as
owned execution instead of open so that they can be protected from
preemption. This broke the target number of CPUs calculation for empty
layers, which was only looking at open execution time. Update
calc_target_nr_cpus() so that it considers both owned and open execution
time for empty layers.
Currently, a preempting layer can completely starve out non-preempting ones
regardless of weight or other configuration. Implement preemption protection
where each CPU tries to protect in-layer execution beyond the high util
range, and up to full utilization under saturation. The fallback CPU is also
protected so that empty layers can run up to 50%, guaranteeing that empty
layers can easily start growing.
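A sketch of the gating decision; the names and the comparison below are
illustrative, only the general rule (owned execution is protected until it
reaches its target, with the fallback CPU protecting empty layers up to 50%)
comes from this change:

  #include <stdbool.h>
  #include <stdint.h>

  typedef uint32_t u32;

  struct cpu_prot {
      u32 owned_usage;          /* recent owned execution on this CPU */
      u32 owned_usage_target;   /* protection target, e.g. 50% on fallback */
  };

  /* a preempting layer may take this CPU only after the owned execution
   * target has been met */
  static bool may_preempt(const struct cpu_prot *p)
  {
      return p->owned_usage >= p->owned_usage_target;
  }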
Commit d971196 ("scx_utils: Rename hw_id and add sequential llc id")
makes llc id unique across NUMA nodes, so rely on this value to build
the LLC scheduling domain.
Signed-off-by: Andrea Righi <arighi@nvidia.com>
The scheduler extends a lock holder's time slice at ops.dispatch() to avoid
preempting the lock holder and thereby slowing down system-wide progress.
However, this opens up the possibility that the slice extension is abused by
a lock holder. To mitigate the problem, check whether a task's time slice
was extended (lock_holder_xted) while the task is no longer a lock holder.
That means the task's time slice was extended but the lock has since been
released. In this case, give up the rest of the task's extended time slice.
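A sketch of the check; lock_holder_xted comes from the change, while the
other fields and the way the remaining slice is given up are simplified
assumptions:

  #include <stdbool.h>
  #include <stdint.h>

  typedef uint64_t u64;

  #define SLICE_MIN_NS 500000ULL   /* illustrative floor, not the real value */

  struct task_info {
      bool is_lock_holder;      /* currently holds a lock */
      bool lock_holder_xted;    /* slice was extended as a lock holder */
      u64  slice_ns;            /* remaining time slice */
  };

  /* extension was granted but the lock has been released since: give up
   * the rest of the extended slice so it can't be abused */
  static void check_xted_slice(struct task_info *t)
  {
      if (t->lock_holder_xted && !t->is_lock_holder) {
          t->slice_ns = SLICE_MIN_NS;
          t->lock_holder_xted = false;
      }
  }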
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Make llc_id a monotonically increasing unique value and rename hw_id to
kernel_id for topology structs.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
On systems with multiple NUMA nodes, core_ids can be reused. Create a
hw_id that is monotonically increasing and can be used to uniquely
identify CPU cores.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>