scx_mitosis relied on the implicit assumption that after a sched tick,
all outstanding scheduling events had completed, but this might not
actually be correct. This feels like a natural use-case for RCU, but
there is no way to directly make use of RCU in BPF. Instead, this commit
implements an RCU-like synchronization mechanism.
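As a rough illustration of the idea (a minimal sketch with made-up
names, not the actual scx_mitosis code), the mechanism can be thought
of as a generation counter that each CPU republishes from its
scheduling paths, with the writer polling until every CPU has observed
the new generation:

#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 64

/* Generation bumped by the writer after publishing an update. */
static atomic_ulong global_gen;
/* Last generation observed by each CPU from its scheduling paths. */
static atomic_ulong percpu_gen[NR_CPUS];

/* Called from per-CPU scheduling paths (e.g. on every sched tick). */
static void observe_gen(int cpu)
{
    atomic_store(&percpu_gen[cpu], atomic_load(&global_gen));
}

/* Writer: publish an update and return the generation readers must reach. */
static unsigned long publish_update(void)
{
    return atomic_fetch_add(&global_gen, 1) + 1;
}

/* Poll: the "grace period" has elapsed once every CPU has caught up. */
static bool grace_period_done(unsigned long gen)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (atomic_load(&percpu_gen[cpu]) < gen)
            return false;
    return true;
}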
Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
Schedule all tasks using a single global DSQ. This gives better
control to prevent potential starvation conditions.
With this change, scx_bpfland adopts a logic similar to scx_rusty and
scx_lavd, prioritizing tasks based on the frequency of their wait and
wake-up events, rather than relying exclusively on the average number
of voluntary context switches.
Tasks are still classified as interactive / non-interactive based on
the number of voluntary context switches, but this now only affects
the cpufreq logic.
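As a rough sketch of this kind of logic (illustrative only, with
made-up names and weights, not the actual scx_bpfland code), the
wake-up frequency can be tracked as a simple moving average and used
to shrink the deadline offset of frequently waiting tasks:

#include <stdint.h>

struct task_stats {
    uint64_t last_wake_ns;   /* timestamp of the previous wake-up */
    uint64_t wake_freq;      /* moving average of wake-ups per second */
};

/* Update the average wake-up frequency on each wake-up event. */
static void update_wake_freq(struct task_stats *ts, uint64_t now_ns)
{
    uint64_t delta = now_ns - ts->last_wake_ns;
    uint64_t freq;

    if (!delta)
        delta = 1;
    freq = 1000000000ULL / delta;           /* wake-ups per second */

    /* New sample weighted 1/4, history 3/4. */
    ts->wake_freq = (ts->wake_freq * 3 + freq) / 4;
    ts->last_wake_ns = now_ns;
}

/* Higher wake-up frequency => smaller deadline offset => dispatched sooner. */
static uint64_t deadline_offset(const struct task_stats *ts, uint64_t slice_ns)
{
    return slice_ns / (ts->wake_freq + 1);
}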
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Since tasks' average runtimes show a skewed distribution, directly
using the runtime in the deadline calculation causes several
performance regressions. Instead, let's use a constant factor and
further prioritize the frequency factors to deprioritize tasks with
long runtimes.
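A minimal sketch of the idea (not the actual scx_lavd formula; the
constant and names are illustrative): keep the runtime term fixed and
let the wait/wake-up frequency factors drive the deadline, which
implicitly pushes long-running tasks toward later deadlines:

#include <stdint.h>

#define CONST_RUNTIME_NS 4000000ULL   /* fixed runtime factor (illustrative) */

static uint64_t calc_deadline(uint64_t vtime_now, uint64_t wait_freq,
                              uint64_t wake_freq)
{
    /* Frequency factors shrink the deadline offset for interactive
     * tasks; the runtime term stays constant instead of using the
     * measured (skewed) average runtime. */
    uint64_t factor = wait_freq + wake_freq + 1;

    return vtime_now + CONST_RUNTIME_NS / factor;
}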
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Revert the change that sends a self-IPI at preemption when the victim
CPU is the current CPU. The cost of a self-IPI is prohibitively
expensive in some workloads (e.g., perf bench). Instead, reset the
task's time slice to zero.
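A minimal sketch of the resulting behavior, assuming a sched_ext BPF
scheduler built against scx/common.bpf.h (the function name is
illustrative; the kfuncs are the standard sched_ext ones):

#include <scx/common.bpf.h>

static void preempt_cpu(s32 victim_cpu)
{
    if (victim_cpu == bpf_get_smp_processor_id()) {
        /* Victim is the local CPU: expire the current task's slice so
         * it gets rescheduled at the next scheduling point, no
         * self-IPI needed. */
        struct task_struct *cur = (void *)bpf_get_current_task_btf();

        cur->scx.slice = 0;
    } else {
        /* Remote CPU: kick it with a preemption IPI. */
        scx_bpf_kick_cpu(victim_cpu, SCX_KICK_PREEMPT);
    }
}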
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Rather than always migrating tasks across LLC domains when no idle CPU
is available in their current LLC domain, allow migration but attempt to
bring tasks back to their original LLC domain whenever possible.
To do so, define the task's scheduling domain upon task creation or when
its affinity changes, and ensure the task remains within this domain
throughout its lifetime.
In the future we will add proper load balancing logic, but for now
this change seems to provide a consistent performance improvement in
certain server workloads.
For example, simple CUDA benchmarks show a performance boost of about
+10-20% with this change applied (on multi-LLC / NUMA machines).
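A rough sketch of the mechanism in plain C (all names are illustrative
and the helpers are assumed to exist only for this example; this is
not the scx_bpfland implementation):

#include <stdbool.h>

/* Hypothetical per-task context; all names below are illustrative. */
struct task_ctx {
    int llc_id;                 /* the task's "home" LLC domain */
};

/* Assumed helpers for the sake of the example (not real APIs). */
extern int cpu_to_llc(int cpu);
extern bool llc_has_idle_cpu(int llc_id);

/* Called at task creation and whenever the affinity mask changes. */
static void assign_llc_domain(struct task_ctx *tctx, int cpu)
{
    tctx->llc_id = cpu_to_llc(cpu);
}

/* CPU selection: prefer the home LLC whenever it has an idle CPU, and
 * only fall back to another LLC otherwise, so the task keeps being
 * pulled back to its original domain when possible. */
static int pick_llc(const struct task_ctx *tctx, int fallback_llc)
{
    if (llc_has_idle_cpu(tctx->llc_id))
        return tctx->llc_id;
    return fallback_llc;
}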
Signed-off-by: Andrea Righi <arighi@nvidia.com>
This helps prevent excessive starvation of regular tasks in the
presence of a large number of interactive tasks (e.g., when running
stress tests, such as hackbench).
Signed-off-by: Andrea Righi <arighi@nvidia.com>
This can lead to stalls when a high number of interactive tasks are
running in the system (e.g., hackbench or similar stress tests).
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Add SCX_OPS_ENQ_EXITING to the scheduler flags, since we are not using
bpf_task_from_pid() and the scheduler can handle exiting tasks.
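For reference, a minimal sketch of where the flag lands (the ops and
callback names are illustrative; SCX_OPS_DEFINE() and
SCX_OPS_ENQ_EXITING are the standard sched_ext ones):

SCX_OPS_DEFINE(bpfland_ops,
               .select_cpu  = (void *)bpfland_select_cpu,
               .enqueue     = (void *)bpfland_enqueue,
               .dispatch    = (void *)bpfland_dispatch,
               /* Keep receiving ops.enqueue() for exiting tasks. */
               .flags       = SCX_OPS_ENQ_EXITING,
               .name        = "bpfland");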
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Ensure that task vruntime is always updated in ops.running() to maintain
consistency with other schedulers.
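A minimal sketch of the common pattern, assuming a scheduler built
against scx/common.bpf.h (names are illustrative, not necessarily the
exact scx code):

#include <scx/common.bpf.h>

static u64 vtime_now;   /* global virtual time floor (illustrative) */

/* Signed comparison handles wraparound of the 64-bit virtual clock. */
static inline bool vtime_before(u64 a, u64 b)
{
    return (s64)(a - b) < 0;
}

/* Keep the global vtime floor up to date whenever a task starts
 * running, mirroring what other scx schedulers do in ops.running(). */
void BPF_STRUCT_OPS(sched_running, struct task_struct *p)
{
    if (vtime_before(vtime_now, p->scx.dsq_vtime))
        vtime_now = p->scx.dsq_vtime;
}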
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Fix a task filtering logic error to avoid the possibility of migrating
the same task over again. The original logic used the "||" operator,
which could allow tasks that had already been migrated to be taken
into consideration again. Change the condition to "&&" to eliminate
the error.
Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
Inside the function "try_find_move_task()", we return directly when no
task is found to be moved. If the cause is that no task could satisfy
the condition imposed by "task_filter()", the load balancer will try
to find a task to move again, this time dropping "task_filter()" by
replacing it with a function that always returns true.
However, in this fallback case the tasks within the domains will be
empty. Swapping the tasks back into the domains vector before
returning solves the issue.
Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
The combination of kernel versions and kernel configs generates
different kernel symbols. For example, in an old kernel version,
__mutex_lock() is not generated. Also, there is currently no
workaround on the fentry/fexit/kprobe side. Let's entirely drop the
kernel lock tracing for now and revisit it later.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Revise the lock tracking code to rely on symbols that are stable
across various kernel configurations. There are two changes:
- Entirely drop tracing of rt_mutex, which can be compiled in or out
  depending on kconfig.
- Replace the mutex_lock() family with __mutex_lock(), which is stable
  across kernel configs (see the sketch below). The downside of this
  change is that the lock fast path can no longer be traced, so lock
  tracing is a bit less accurate. But let's live with it for now until
  a better solution is found.
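A minimal sketch of what attaching to the slow-path symbol looks like
(illustrative only, not the exact lock tracing code):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char _license[] SEC("license") = "GPL";

/* __mutex_lock() is the out-of-line slow path, so the inlined fast
 * path is never seen here -- the accuracy trade-off mentioned above. */
SEC("fentry/__mutex_lock")
int BPF_PROG(fentry__mutex_lock, struct mutex *lock)
{
    /* Record that the current task is potentially blocking on a mutex. */
    bpf_printk("pid %d entering __mutex_lock",
               (u32)(bpf_get_current_pid_tgid() >> 32));
    return 0;
}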
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Fallback DSQs are not covered by cost accounting. If a layer is
saturating the machine, it is possible to never consume from the
fallback DSQ and stall the task. This introduces an additional
consumption from the fallback DSQ when a layer runs out of budget. In
addition, tasks that use partial CPU affinities should be placed into
the fallback DSQ. This change was tested with stress-ng --cacheline
`nproc` for several minutes without causing stalls (the same workload
stalls on main).
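A rough sketch of the dispatch-path idea (names are illustrative and
this is not the exact scx_layered logic; it assumes the standard
scx_bpf_consume() kfunc from scx/common.bpf.h):

#include <scx/common.bpf.h>

/* Try the layer's own DSQ while it still has budget, but always give
 * the fallback DSQ a chance once the budget is exhausted, so tasks
 * parked there (e.g. partially-affinitized ones) cannot stall. */
static bool consume_layer_or_fallback(u64 layer_dsq, u64 fallback_dsq,
                                      bool out_of_budget)
{
    if (!out_of_budget && scx_bpf_consume(layer_dsq))
        return true;

    return scx_bpf_consume(fallback_dsq);
}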
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Flip the order of layer id vs layer name so that the output makes sense.
Example output:
LO_FALLBACK nr_queued=0 -0ms
COST GLOBAL[0][random] budget=22000000000 capacity=22000000000
COST GLOBAL[1][hodgesd] budget=0 capacity=0
COST GLOBAL[2][stress-ng] budget=0 capacity=0
COST GLOBAL[3][normal] budget=0 capacity=0
COST CPU[0][0][random] budget=62500000000000 capacity=62500000000000
COST CPU[0][1][random] budget=100000000000000 capacity=100000000000000
COST CPU[0][2][random] budget=124911500964411 capacity=125000000000000
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>