Add a timer-based antistall to scx_layered, along with new flags to
enable/disable it and to specify the delay in seconds before it kicks in.
Also update the CI config to make sure this verifies and runs.
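A minimal sketch of how such a timer can be wired up; the names used here
(enable_antistall, antistall_sec, antistall_timer_cb and friends) are
illustrative, not the actual scx_layered identifiers:

  #include <scx/common.bpf.h>

  #define CLOCK_BOOTTIME  7
  #define NSEC_PER_SEC    1000000000ULL

  /* Tunables filled in by the userspace loader from the new flags. */
  const volatile bool enable_antistall = true;
  const volatile u64 antistall_sec = 3;

  struct antistall_timer {
          struct bpf_timer timer;
  };

  struct {
          __uint(type, BPF_MAP_TYPE_ARRAY);
          __uint(max_entries, 1);
          __type(key, u32);
          __type(value, struct antistall_timer);
  } antistall_timer_map SEC(".maps");

  static int antistall_timer_cb(void *map, u32 *key, struct antistall_timer *t)
  {
          /* Scan for DSQs that haven't been consumed for too long and kick
           * CPUs / force a dispatch as needed (omitted here), then re-arm. */
          bpf_timer_start(&t->timer, antistall_sec * NSEC_PER_SEC, 0);
          return 0;
  }

  static int antistall_init(void)
  {
          struct antistall_timer *t;
          u32 key = 0;

          if (!enable_antistall)
                  return 0;

          t = bpf_map_lookup_elem(&antistall_timer_map, &key);
          if (!t)
                  return -ENOENT;

          bpf_timer_init(&t->timer, &antistall_timer_map, CLOCK_BOOTTIME);
          bpf_timer_set_callback(&t->timer, antistall_timer_cb);
          return bpf_timer_start(&t->timer, antistall_sec * NSEC_PER_SEC, 0);
  }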
The verifier in older kernels chokes on function calls from sleepable progs,
triggering a nonsensical RCU state error:
frame1: R1_w=scalar(id=674,smin=smin32=0,smax=umax=smax32=umax32=51,var_off=(0x0; 0x3f)) R10=; return *llc_ptr;
1072: (61) r0 = *(u32 *)(r2 +0) ; frame1: R0_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R2_w=map_value(map=bpf_bpf.rodata,ks=4,vs=9570,off=4400,smin=smin32=0,smax=umax=smax32=umax32=204,var_off=(0x0; 0xfc)) refs=13,647
; }
1073: (95) exit
bpf_rcu_read_unlock is missing
processed 10663 insns (limit 1000000) max_states_per_insn 8 total_states 615 peak_states 281 mark_read 20
-- END PROG LOAD LOG --
Work around this by adding an __always_inline variant of cpu_to_llc_id() and
using it from layered_init(). Note that we can't switch everything to
__always_inline, as that can lead to verification failures due to the
instruction limit.
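Roughly, the workaround looks like the following sketch; the array name
cpu_llc_ids and the MAX_CPUS bound are illustrative stand-ins for the real
scx_layered data:

  #include <scx/common.bpf.h>

  #define MAX_CPUS 512                            /* illustrative bound */

  const volatile u32 cpu_llc_ids[MAX_CPUS];       /* filled in by userspace */

  /* Forced-inline twin of cpu_to_llc_id() for use from the sleepable
   * layered_init() path, so older verifiers never see a BPF-to-BPF call
   * there. */
  static __always_inline u32 __cpu_to_llc_id(s32 cpu)
  {
          const volatile u32 *llc_ptr;

          llc_ptr = MEMBER_VPTR(cpu_llc_ids, [cpu]);
          return llc_ptr ? *llc_ptr : 0;
  }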
When selecting an idle CPU, honor the layer's idle_smt option. This may
improve cache locality in some cases by placing tasks on CPUs that are
closer in the cache hierarchy.
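One way this can look, as a rough sketch using the stock sched_ext idle
selection (the helper name and the per-layer idle_smt parameter are
illustrative, not the actual scx_layered code):

  #include <scx/common.bpf.h>

  static s32 pick_idle_cpu_for_layer(struct task_struct *p, bool idle_smt)
  {
          /* With idle_smt, only fully idle cores are considered; otherwise
           * any idle CPU in the task's allowed set will do. */
          return scx_bpf_pick_idle_cpu(p->cpus_ptr,
                                       idle_smt ? SCX_PICK_IDLE_CORE : 0);
  }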
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
On some older kernels, layered fails to verify. Prevent certain helpers from
being inlined so that it passes the verifier.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
The loops in topology-aware mode were recently refactored to place the per-LLC
loops inside the per-layer loops. However, the layer-specific checks were left
in the inner loops, slowing things down unnecessarily.
Pull the layer-specific checks from the inner loop into the outer loop.
Also change these functions to `__weak` to ensure they don't get inlined -
they're expected to be verified as global functions.
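The shape of the change, as a rough sketch (nr_llcs, layer_runnable() and
try_consume_llc() are placeholders, not the real scx_layered helpers):

  #include <scx/common.bpf.h>

  const volatile u32 nr_llcs = 1;

  static bool layer_runnable(u32 layer_id) { return true; }                /* placeholder */
  static int try_consume_llc(u32 layer_id, u32 llc_id) { return -ENOENT; } /* placeholder */

  /* __weak keeps this a global (non-inlined) function for the verifier. */
  __weak int consume_layer(u32 layer_id)
  {
          int llc;

          /* Layer-wide check hoisted out of the per-LLC inner loop. */
          if (!layer_runnable(layer_id))
                  return -ENOENT;

          bpf_for(llc, 0, nr_llcs) {
                  if (!try_consume_llc(layer_id, llc))
                          return 0;
          }

          return -ENOENT;
  }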
Note to reviewers: this looks good to me, but I'd appreciate it if you reviewed
the De Morgan applications in detail.
Test plan:
- `cargo build --release && sudo target/release/scx_layered --run-example` on a
machine with multiple LLCs. It's possible to stall it quite easily with
stress-ng, but I believe this is also the case on main.
Add cost accounting for fallback DSQs so that their costs are tracked and
dispatching from fallback DSQs can be done in a standardized way.
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
The verifier error seems to stem from the wrong vmlinux.h.
Also, PR #889 seems to completely fix the problem.
So, drop the workaround.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Previously, cur_logical_clk was updated with WRITE_ONCE(), which does not
guarantee atomicity when concurrent writes happen -- which is possible. So
change it to use CAS (compare-and-swap).
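Conceptually, the update now looks like this sketch (variable and helper names
are illustrative): the clock only moves forward, and concurrent updaters retry
instead of silently overwriting each other.

  #include <scx/common.bpf.h>

  static u64 cur_logical_clk;

  static void advance_cur_logical_clk(u64 new_clk)
  {
          u64 old_clk;
          int i;

          /* Bounded retry loop to keep the verifier happy. */
          bpf_for(i, 0, 8) {
                  old_clk = READ_ONCE(cur_logical_clk);
                  if (old_clk >= new_clk)
                          return;
                  if (__sync_val_compare_and_swap(&cur_logical_clk, old_clk,
                                                  new_clk) == old_clk)
                          return;
          }
  }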
Signed-off-by: Changwoo Min <changwoo@igalia.com>
scx_mitosis relied on the implicit assumption that after a sched tick,
all outstanding scheduling events had completed, but this might not
actually be correct. This feels like a natural use-case for RCU, but
there is no way to directly make use of RCU in BPF. Instead, this commit
implements an RCU-like synchronization mechanism.
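The following is a very rough sketch of the idea, not the actual scx_mitosis
code (all names are illustrative): a writer bumps a global epoch, every CPU
copies it from a hot path it is guaranteed to pass through, and the writer
only reclaims the old state once all CPUs have observed the new epoch.

  #include <scx/common.bpf.h>

  const volatile u32 nr_cpu_ids = 1;      /* set by the loader */

  static u64 global_epoch = 1;            /* bumped by the reconfiguration path */

  struct {
          __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
          __uint(max_entries, 1);
          __type(key, u32);
          __type(value, u64);
  } cpu_epoch SEC(".maps");

  /* Called from a per-CPU path every CPU is guaranteed to pass through,
   * e.g. ops.running() or ops.tick(). */
  static void note_epoch(void)
  {
          u32 key = 0;
          u64 *e = bpf_map_lookup_elem(&cpu_epoch, &key);

          if (e)
                  *e = READ_ONCE(global_epoch);
  }

  /* True once every CPU has observed the current epoch, i.e. all scheduling
   * events started before the epoch bump have drained. */
  static bool grace_period_elapsed(void)
  {
          u64 cur = READ_ONCE(global_epoch);
          u32 key = 0;
          int cpu;

          bpf_for(cpu, 0, nr_cpu_ids) {
                  u64 *e = bpf_map_lookup_percpu_elem(&cpu_epoch, &key, cpu);

                  if (e && *e < cur)
                          return false;
          }

          return true;
  }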
Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
Schedule all tasks using a single global DSQ. This gives better control
to prevent potential starvation conditions.
With this change, scx_bpfland adopts a logic similar to scx_rusty and
scx_lavd, prioritizing tasks based on the frequency of their wait and
wake-up events, rather than relying exclusively on the average amount of
voluntary context switches.
Tasks are still classified as interactive / non-interactive based on the
number of voluntary context switches, but this now only affects the
cpufreq logic.
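As a purely illustrative sketch of the kind of logic involved (not the actual
scx_bpfland formula), the wait and wake-up frequencies can be combined into a
boost factor for interactive tasks:

  #include <scx/common.bpf.h>

  static u64 clamp_freq(u64 freq, u64 max)
  {
          return freq > max ? max : freq;
  }

  /* Tasks that block often and wake others often behave interactively, so
   * give them a larger boost (which elsewhere translates into an earlier
   * deadline), with both terms capped to bound the effect. */
  static u64 task_latency_boost(u64 wait_freq, u64 wakeup_freq)
  {
          u64 boost = clamp_freq(wait_freq, 128) * clamp_freq(wakeup_freq, 128);

          return boost ?: 1;
  }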
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Since tasks' average runtimes show a skewed distribution, directly
using the runtime in the deadline calculation causes several
performance regressions. Instead, let's use a constant factor and
further prioritize the frequency factors to deprioritize tasks with
long runtimes.
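A minimal sketch of the shape of this change (the constant and names are made
up, not the actual scx_lavd math): the term that used to scale with average
runtime becomes a fixed base, while the frequency factors keep pulling the
deadline of interactive tasks earlier.

  #include <scx/common.bpf.h>

  #define DL_BASE_NS      (4ULL * 1000 * 1000)    /* constant factor */

  static u64 calc_deadline(u64 now, u64 wait_freq_ft, u64 wake_freq_ft)
  {
          /* Larger frequency factors -> earlier (smaller) deadline; the
           * task's own runtime no longer appears in the term. */
          return now + DL_BASE_NS / (1 + wait_freq_ft + wake_freq_ft);
  }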
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Revert the change that sent a self-IPI at preemption when the victim
CPU is the current CPU. The cost of a self-IPI is prohibitively expensive
in some workloads (e.g., perf bench). Instead, reset the task's time slice
to zero.
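A minimal sketch of the resulting behavior (the function name and the `victim`
argument are illustrative; scx_bpf_kick_cpu() and p->scx.slice are stock
sched_ext facilities):

  #include <scx/common.bpf.h>

  static void preempt_cpu(s32 cpu, struct task_struct *victim)
  {
          if (cpu == (s32)bpf_get_smp_processor_id())
                  /* Local CPU: no IPI needed, just make the victim yield at
                   * the next scheduling point by zeroing its slice. */
                  victim->scx.slice = 0;
          else
                  scx_bpf_kick_cpu(cpu, SCX_KICK_PREEMPT);
  }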
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Rather than always migrating tasks across LLC domains when no idle CPU
is available in their current LLC domain, allow migration but attempt to
bring tasks back to their original LLC domain whenever possible.
To do so, define the task's scheduling domain upon task creation or when
its affinity changes, and ensure the task remains within this domain
throughout its lifetime.
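A sketch of the mechanism with illustrative names (task_ctx, llc_domain_of()
and the callbacks shown are placeholders, not the real scx_bpfland code):

  #include <scx/common.bpf.h>

  struct task_ctx {
          s32 llc_id;                             /* task's scheduling domain */
  };

  struct {
          __uint(type, BPF_MAP_TYPE_TASK_STORAGE);
          __uint(map_flags, BPF_F_NO_PREALLOC);
          __type(key, int);
          __type(value, struct task_ctx);
  } task_ctx_stor SEC(".maps");

  static s32 llc_domain_of(struct task_struct *p) { return 0; }   /* placeholder */

  static void task_update_domain(struct task_struct *p)
  {
          struct task_ctx *tctx;

          tctx = bpf_task_storage_get(&task_ctx_stor, p, 0,
                                      BPF_LOCAL_STORAGE_GET_F_CREATE);
          if (tctx)
                  tctx->llc_id = llc_domain_of(p);
  }

  /* Pin the domain at task creation ... */
  s32 BPF_STRUCT_OPS(sched_init_task, struct task_struct *p,
                     struct scx_init_task_args *args)
  {
          task_update_domain(p);
          return 0;
  }

  /* ... and refresh it whenever the task's affinity changes. */
  void BPF_STRUCT_OPS(sched_set_cpumask, struct task_struct *p,
                      const struct cpumask *cpumask)
  {
          task_update_domain(p);
  }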
In the future we will add proper load balancing logic, but for now
this change seems to provide consistent performance improvement in
certain server workloads.
For example, simple CUDA benchmarks show a performance boost of about
+10-20% with this change applied (on multi-LLC / NUMA machines).
Signed-off-by: Andrea Righi <arighi@nvidia.com>
This helps prevent excessive starvation of regular tasks in the presence
of a high number of interactive tasks (e.g., when running stress tests
such as hackbench).
Signed-off-by: Andrea Righi <arighi@nvidia.com>
This can lead to stalls when a high number of interactive tasks are
running in the system (e.g., hackbench or similar stress tests).
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Add SCX_OPS_ENQ_EXITING to the scheduler flags, since we are not using
bpf_task_from_pid() and the scheduler can handle exiting tasks.
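For reference, a minimal self-contained sketch of where the flag goes
(callback and ops names are illustrative, not the real scx_bpfland ones):

  #include <scx/common.bpf.h>

  /* Illustrative callback: real schedulers define the full set. */
  void BPF_STRUCT_OPS(sched_enqueue, struct task_struct *p, u64 enq_flags)
  {
          scx_bpf_dispatch(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, enq_flags);
  }

  SCX_OPS_DEFINE(sched_ops,
                 .enqueue = (void *)sched_enqueue,
                 /* SCX_OPS_ENQ_EXITING: keep calling ops.enqueue() for tasks
                  * that are exiting instead of bypassing the BPF scheduler. */
                 .flags   = SCX_OPS_ENQ_EXITING,
                 .name    = "example");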
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Ensure that task vruntime is always updated in ops.running() to maintain
consistency with other schedulers.
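A minimal sketch of the usual pattern (vtime_now and the callback name are
illustrative; p->scx.dsq_vtime is the stock sched_ext field):

  #include <scx/common.bpf.h>

  static u64 vtime_now;

  static inline bool vtime_before(u64 a, u64 b)
  {
          return (s64)(a - b) < 0;
  }

  void BPF_STRUCT_OPS(sched_running, struct task_struct *p)
  {
          /* Keep the global vtime high-water mark in sync with the task
           * that is about to run. */
          if (vtime_before(vtime_now, p->scx.dsq_vtime))
                  vtime_now = p->scx.dsq_vtime;
  }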
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Fix a task filtering logic error to avoid the possibility of migrating the
same task over again. The original logic used "||", which could cause
tasks that were already migrated to be taken into consideration again.
Change the condition to "&&" so we can eliminate the error.
Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
Inside "try_find_move_task()", we return directly when no task is found
to be moved. If the cause is that no task could satisfy the condition
checked by "task_filter()", the load balancer tries to find a task to
move again, dropping "task_filter()" by replacing it with a function
that always returns true.
However, in that fallback case the tasks within the domains would be
empty. Swapping the tasks back into the domains vector before returning
solves the issue.
Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
Different combinations of kernel versions and kernel configs generate
different kernel symbols. For example, in an old kernel version,
__mutex_lock() is not generated. Also, there is currently no workaround
on the fentry/fexit/kprobe side. Let's entirely drop the kernel lock
tracking for now and revisit it later.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Revise the lock tracking code to rely on symbols that are stable across
various kernel configurations. There are two changes:
- Entirely drop tracing of rt_mutex, which can be enabled or disabled by kconfig.
- Replace the mutex_lock() family with __mutex_lock(), which is stable
  across kernel configs. The downside of this change is that the lock
  fast path can no longer be traced, so lock tracing is a bit less
  accurate. But let's live with it for now until a better solution is found.
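A rough sketch of the tracing side under this approach (the map and the depth
bookkeeping are illustrative; the real scx_lavd code handles failed
acquisitions and more lock types):

  #include <scx/common.bpf.h>

  struct {
          __uint(type, BPF_MAP_TYPE_TASK_STORAGE);
          __uint(map_flags, BPF_F_NO_PREALLOC);
          __type(key, int);
          __type(value, int);                     /* locks held by the task */
  } lock_depth SEC(".maps");

  static void adjust_lock_depth(int delta)
  {
          struct task_struct *p = bpf_get_current_task_btf();
          int *depth;

          depth = bpf_task_storage_get(&lock_depth, p, 0,
                                       BPF_LOCAL_STORAGE_GET_F_CREATE);
          if (depth)
                  *depth += delta;
  }

  /* __mutex_lock() is the slow path shared by the mutex_lock() family and is
   * present regardless of kconfig, unlike the rt_mutex variants. */
  SEC("fentry/__mutex_lock")
  int BPF_PROG(on_mutex_lock, struct mutex *lock)
  {
          adjust_lock_depth(1);
          return 0;
  }

  SEC("fentry/mutex_unlock")
  int BPF_PROG(on_mutex_unlock, struct mutex *lock)
  {
          adjust_lock_depth(-1);
          return 0;
  }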
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Fallback DSQs are not accounted for in the cost model. If a layer is
saturating the machine, it is possible to never consume from the fallback
DSQ and stall its tasks. This introduces an additional consumption from
the fallback DSQ when a layer runs out of budget. In addition, tasks that
use partial CPU affinities should be placed into the fallback DSQ. This
change was tested with stress-ng --cacheline `nproc` for several minutes
without causing stalls (which would occur on main).
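Roughly, the dispatch path now falls through to the fallback DSQ when a
layer's budget is exhausted; a sketch with placeholder helpers
(layer_has_budget(), layer_dsq_id() and LO_FALLBACK_DSQ are illustrative):

  #include <scx/common.bpf.h>

  #define LO_FALLBACK_DSQ 0                                       /* illustrative DSQ id */

  static bool layer_has_budget(u32 layer_id) { return true; }     /* placeholder */
  static u64 layer_dsq_id(u32 layer_id) { return layer_id + 1; }  /* placeholder */

  static bool try_consume_layer(u32 layer_id)
  {
          /* Out of budget: charge and consume the fallback DSQ instead, so
           * tasks queued there (e.g. partially-affinitized ones) can't stall. */
          if (!layer_has_budget(layer_id))
                  return scx_bpf_consume(LO_FALLBACK_DSQ);

          if (scx_bpf_consume(layer_dsq_id(layer_id)))
                  return true;

          return false;
  }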
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>