scx-upstream

mirror of https://github.com/sched-ext/scx.git synced 2024-12-12 19:47:18 +00:00

Author	SHA1	Message	Date
Changwoo Min	fe5554b83d	scx_lavd: move reset_lock_futex_boost() to ops.running() Calling reset_lock_futex_boost() at ops.enqueue() is not accurate, so move it to ops.running(). This way, we can prevent the lock holder preemption only when a lock is acquired during ops.running() and ops.stopping(). Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-30 13:52:52 +09:00
Changwoo Min	0f58a9cd39	scx_lavd: calculate task's latency criticality in the direct dispatch path Even in the direct dispatch path, calculating the task's latency criticality is still necessary since the latency criticality is used for the preemptability test. This addresses the following GitHub issue: https://github.com/sched-ext/scx/issues/856 Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-30 12:58:06 +09:00
Changwoo Min	5b91a525bb	scx_lavd: kick CPU explicitly at the ops.enqueue() path When the current task is decided to yield, we should explicitly call scx_bpf_kick_cpu(_, SCX_KICK_PREEMPT). Setting the current task's time slice to zero is not sufficient in this because the sched_ext core does not call resched_curr() at the ops.enqueue() path. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-28 17:31:34 +09:00
Changwoo Min	f56b79b19c	scx_lavd: yield for preemption only when a task is ineligible An eligible task is unlikely to be preemptible. In other words, an ineligible task is more likely to be preemptible because of its greedy ratio penalty in the virtual deadline calculation. Hence, we skip the preemptability test for an eligible task. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-28 12:45:50 +09:00
Changwoo Min	4f6947736f	scx_lavd: fix uninitialized memory access in comp_preemption_info() The previous code accesses uninitialized memory in comp_preemption_info() when called from can_task1_kick_task2() <- try_yield_current_cpu() to test whether task2 is a lock holder. However, task2 is guaranteed not to be a lock holder in all its callers. So move the lock holder testing to can_cpu1_kick_cpu2(). Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-24 16:07:53 +09:00
Changwoo Min	b90ecd7e8f	scx_lavd: proactively kick a CPU at the ops.enqueue() path When a task is enqueued, kick an idle CPU in the chosen scheduling domain. This will reduce temporary stall time of the task by waking up the CPU as early as possible. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-23 21:43:11 +09:00
Changwoo Min	731a7871d7	scx_lavd: change the greedy penalty function We used to give a penalty in latency linearly to the greedy ratio. However, this impacts the greedy ratio too much in determining the virtual deadline, especially among under-utilized tasks (< 100.0%). Now, we treat all under-utilized tasks with the same greedy ratio (= 100.0%). For over-utilized tasks, we give a bit milder penalty to avoid sudden latency spikes. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-23 21:42:55 +09:00
Changwoo Min	9acf950b75	scx_lavd: change how to use the context information for latency criticality Previously, contextual information—such as sync wakeup and kernel task—was incorporated into the final latency criticality value ad hoc by adding a constant. Instead, let's make everything proportional to run time and waker and wakee frequencies by scaling up/down the run time and the frequencies. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-23 21:32:18 +09:00
Changwoo Min	6fb57643fb	scx_lavd: remove the time restriction in preemption Previously, preemption was allowed only when a task was early in its time slice, using LAVD_PREEMPT_KICK_MARGIN and LAVD_PREEMPT_TICK_MARGIN. This is no longer necessary because avoiding lock holder preemption already prevents harmful preemptions. So we remove LAVD_PREEMPT_KICK_MARGIN and LAVD_PREEMPT_TICK_MARGIN and unleash the preemption. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-22 17:48:56 +09:00
Changwoo Min	07ed821511	scx_lavd: incorporate task's weight to latency criticality When calculating task's latency criticality, incorporate task's weight into runtime, wake_freq, and wait_freq more systematically. It looks nicer and works better under heavy load. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-22 17:48:56 +09:00
Changwoo Min	47dd1b9582	scx_lavd: respect a chosen cpu even if it is not idle Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-22 17:48:56 +09:00
Changwoo Min	257a3db376	scx_lavd: add ops.cpu_release() When a CPU is released to serve higher priority scheduler class, requeue the tasks in a local DSQ to the global enqueue. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-22 17:48:56 +09:00
Changwoo Min	89749ecad7	scx_lavd: fix/work around a verifier error Without this, the BPF verifier spits the following errors with some version of vmlinux.h. So added +1 to work around the problem. --------------- ; bpf_for(j, 0, 64) { @ main.bpf.c:1926 509: (bf) r1 = r8 ; R1_w=fp-32 R8_w=fp-32 refs=66,2035 510: (b4) w2 = 0 ; R2_w=0 refs=66,2035 511: (b4) w3 = 64 ; R3_w=64 refs=66,2035 512: (85) call bpf_iter_num_new#104189 ; R0=scalar() fp-32=iter_num(ref_id=2048,state=active,depth=0) refs=66,2035,2048 513: (bf) r1 = r8 ; R1=fp-32 R8=fp-32 refs=66,2035,2048 514: (85) call bpf_iter_num_next#104191 515: R0_w=rdonly_mem(id=2049,ref_obj_id=2048,sz=4) R6=scalar(id=2047,smin=smin32=0,smax=umax=smax32=umax32=7,var_off=(0x0; 0x7)) R7=scalar() R8=fp-32 R9=map_value(map=bpf_bpf.bss,ks=4,vs=4584,off=384,smin=smin32=0,smax=umax=smax32=umax32=3968,var_off=(0x0; 0xf80)) R10=fp0 fp-16=iter_num(ref_id=66,state=active,depth=1) fp-24=iter_num(ref_id=2035,state=active,depth=1) fp-32=iter_num(ref_id=2048,state=active,depth=1) fp-80=scalar(id=1) fp-88=map_value(map=.data.LAVD,ks=4,vs=1320,off=40,smin=smin32=0,smax=umax=smax32=umax32=1240,var_off=(0x0; 0x7f8)) fp-96=????0 fp-112=rcu_ptr_bpf_cpumask() fp-120=rcu_ptr_bpf_cpumask() fp-128=rcu_ptr_bpf_cpumask() fp-136=rcu_ptr_bpf_cpumask() refs=66,2035,2048 ; bpf_for(j, 0, 64) { @ main.bpf.c:1926 515: (15) if r0 == 0x0 goto pc+49 ; R0_w=rdonly_mem(id=2049,ref_obj_id=2048,sz=4) refs=66,2035,2048 516: (64) w6 <<= 6 ; R6=scalar(smin=smin32=0,smax=umax=smax32=umax32=448,var_off=(0x0; 0x1c0)) refs=66,2035,2048 517: (61) r8 = (u32 )(r0 +0) ; R0=rdonly_mem(id=2049,ref_obj_id=2048,sz=4) R8_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) refs=66,2035,2048 518: (26) if w8 > 0x3f goto pc+46 ; R8_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=63,var_off=(0x0; 0x3f)) refs=66,2035,2048 ; if (cpumask & 0x1LLU << j) { @ main.bpf.c:1927 519: (bf) r1 = r7 ; R1_w=scalar(id=2053) R7=scalar(id=2053) refs=66,2035,2048 520: (7f) r1 >>= r8 ; 
R1_w=scalar() R8_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=63,var_off=(0x0; 0x3f)) refs=66,2035,2048 521: (57) r1 &= 1 ; R1_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=1,var_off=(0x0; 0x1)) refs=66,2035,2048 522: (15) if r1 == 0x0 goto pc+38 ; R1_w=1 refs=66,2035,2048 ; cpu = (i * 64) + j; @ main.bpf.c:1928 523: (4c) w8 \|= w6 ; R6=scalar(smin=smin32=0,smax=umax=smax32=umax32=448,var_off=(0x0; 0x1c0)) R8_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=511,var_off=(0x0; 0x1ff)) refs=66,2035,2048 ; bpf_cpumask_set_cpu(cpu, cd_cpumask); @ main.bpf.c:1929 524: (bc) w1 = w8 ; R1_w=scalar(id=2054,smin=smin32=0,smax=umax=smax32=umax32=511,var_off=(0x0; 0x1ff)) R8_w=scalar(id=2054,smin=smin32=0,smax=umax=smax32=umax32=511,var_off=(0x0; 0x1ff)) refs=66,2035,2048 525: (79) r2 = (u64 )(r10 -88) ; R2_w=map_value(map=.data.LAVD,ks=4,vs=1320,off=40,smin=smin32=0,smax=umax=smax32=umax32=1240,var_off=(0x0; 0x7f8)) R10=fp0 fp-88=map_value(map=.data.LAVD,ks=4,vs=1320,off=40,smin=smin32=0,smax=umax=smax32=umax32=1240,var_off=(0x0; 0x7f8)) refs=66,2035,2048 526: (85) call bpf_cpumask_set_cpu#93595 invalid access to map value, value_size=1320 off=1280 size=48 R2 max value is outside of the allowed memory range processed 24200 insns (limit 1000000) max_states_per_insn 19 total_states 961 peak_states 789 mark_read 44 --------------- Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-22 17:19:37 +09:00
Changwoo Min	5f19fa0bab	scx_lavd: refill time slice once for a lock holder When a task holds a lock, refill its time slice once at the ops.dispatch() path to avoid the lock holder preemption problem. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-21 15:56:51 +09:00
Changwoo Min	5a852dc3d9	scx_lavd: direct dispatch when there is an idle CPU When there is an idle CPU, direct dispatch is performed to reduce scheduling latency. This didn't work well before, but it seems to work well now with other tunings. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-21 15:56:51 +09:00
Changwoo Min	420de70159	scx_lavd: give more penalty to long-running tasks Giving more penalties to long-running tasks helps to segregate latency-critical tasks, which are usually short-running, from long-running tasks, which are compute-intensive. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-21 15:56:41 +09:00
Changwoo Min	2fd395bbbf	scx_lavd: remove unnecessary load tracking The algorithm has evolved to decide the time slice without tracking the system-wide load. So remove the obsolete load tracking code. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-19 15:39:24 +09:00
Changwoo Min	8d63024be7	scx_lavd: add missing reset_lock_futex_boost() reset_lock_futex_boost() should be called every context switch of a task. Otherwise, in the worst case, a task and that CPU could block the preemption. To avoid such a situation, add missing reset_lock_futex_boost() calls. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-19 15:39:18 +09:00
Changwoo Min	c1f4051a14	scx_lavd: fix int overflow in calculating avg_lat_cri u32 is not big enough to hold the sum of lat_cri in a period, so sum_lat_cri (u32) overflowed, resulting in incorrect avg_lat_cri. Change the type from u32 to u64, avoiding the integer overflow. Note that {sum/avg}_lat_cri is only for debugging, so it is irrelevant in making scheduling decisions. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-13 00:58:36 +09:00
Changwoo Min	6c9bbe66dc	scx_lavd: remove unnecessary downscaling in deadline calculation The downscaling is not necessary in calculating a task's virtual deadline because the virtual deadline represents only the relative order in task scheduling. Hence downscaling only introduces inaccuracy caused by truncation. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-13 00:41:23 +09:00
Changwoo Min	6ddc3f0a2b	scx_lavd: do not inspect scx_lavd process itself Printing the task status of scx_lavd is not useful, so filter it out. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-12 17:21:08 +09:00
Changwoo Min	648c95be9e	scx_lavd: fix incorrect task comparison for preemption Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-11 21:53:24 +09:00
Changwoo Min	5b4b255cbb	scx_lavd: do not preempt while holding a lock When a task holds a lock, it should not yield its time slice or it should not be preempted out. In this way, we can mitigate harmful preemption of lock holders and reduce the total preemption counts. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-11 18:49:09 +09:00
Changwoo Min	bd17589a6e	scx_lavd: boost latency criticality when a task holds a lock When a lock holder exhausts its time slice, it will be re-enqueued to a DSQ waiting for scheduling while holding a lock. In this case, prioritize its latency criticality proportionally, so a lock holder would not be stuck in a DSQ for a long time, improving system-wide progress. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-11 18:48:56 +09:00
Changwoo Min	77b8e65571	scx_lavd: tracing all blocking locks and futexes Trace the acquisition and release of blocking locks for the kernel and futexes for user space. This is necessary to boost a lock holder task in terms of latency and time slice. We do not boost shared lock holders (e.g., read lock in rw_semaphore) since the kernel already prioritizes the readers over writers. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-11 17:03:48 +09:00
Changwoo Min	7c5c83a3a2	scx_lavd: split main.bpf.c into multiple files As the main.bpf.c file grows, it gets hard to maintain. So, split it into multiple logical files. There is no functional change. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-05 00:25:40 +09:00
Changwoo Min	b1070449b2	Merge pull request #714 from multics69/lavd-hotplug scx_lavd: support CPU hotplug correctly	2024-10-03 07:35:25 +09:00
Tejun Heo	7402895f4a	version: v1.0.5	2024-10-02 08:34:57 -10:00
I Hsin Cheng	7fbef2aa0b	scx_lavd: Fix typo Fix "alreay" to "already". Signed-off-by: I Hsin Cheng <richard120310@gmail.com>	2024-10-02 17:39:10 +08:00
Changwoo Min	770a59f69d	scx_lavd: support CPU hotplug correctly Use scx_utils::NR_CPU_IDS to iterate over all CPUs and separately count the number of online CPUs to support CPU hotplug correctly. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-02 14:19:18 +09:00
Changwoo Min	fb7bc0a850	scx_lavd: fix incorrect preemptability test Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-10-02 13:24:43 +09:00
Ming Yang	445743487a	Add #stat_doc attribute macro to Stats struct `#stat_doc` extends the document from stat desc property. Add this attribute macro to the remaining Stats structs. Signed-off-by: Ming Yang <minos.future@gmail.com>	2024-09-30 22:12:11 -07:00
Changwoo Min	ade6931bfc	scx_lavd: fix incorrect neighbor_bit initialization Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-09-29 02:27:40 +09:00
Changwoo Min	6d116208c8	scx_lavd: do not perform the victim selection for an invalid cpu When finding a victim candidate for preemption, a randomly chosen candidate could be out of valid CPU range due to CPU offline, etc. In this case, try another CPU randomly. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-09-29 02:22:18 +09:00
Changwoo Min	cd7846f4d2	scx_lavd: more accurately determine the performance criticality threshold We used the average performance criticality of tasks as a threshold to determine the proper core type (big or little). However, if the big core's compute capacity is not half of the total compute capacity, such an average-based determination becomes suboptimal. If fewer tasks are classified as performance-critical tasks and requested to run on big cores, the big cores would be wasted by stealing arbitrary non-performance-critical tasks. That could result in performance instability. Hence, determine the threshold more accurately by considering (active) big cores' compute capacity and the (approximated) distribution of performance criticality of tasks. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-09-27 16:56:30 +09:00
Changwoo Min	f07023e42b	scx_lavd: rename avg_perf_cri to thr_perf_cri As a preparation to improve the performance criticality logic, we first rename "avg_perf_cri" to "thr_perf_cri" since average is no longer the threshold. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-09-27 13:12:20 +09:00
I Hsin Cheng	61cb3f7fc5	scx_common_bpf: Append cast_mask() Remove cast_mask() function distributed throughout different schedulers and add it in common.bpf.h so every scheduler can reference it once they need to. Signed-off-by: I Hsin Cheng <richard120310@gmail.com>	2024-09-24 16:01:19 +08:00
Changwoo Min	71fa92cf1c	scx_lavd: propagate waker's latency criticality to its wakee If a waker is more latency critical than a wakee, inherit a waker's latency criticality for the wakee. This allows the wakee to consider the context of who wakes me up. For now, we limit such inheritance to one hop and one schedule. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-09-23 12:56:16 +09:00
Changwoo Min	7321a89724	scx_lavd: find a victim cpu for preemption within task's compute domain Previously, we searched for a victim among all CPUs, including remote or non-compatible CPUs. Now we limit the victim search to the task's compute domain. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-09-22 12:47:18 +09:00
Changwoo Min	8d8d8f9f61	scx_lavd: consider waker's CPU when ops.select_cpu() In case of sync wake-up, consider waker's CPU also to improve cache locality. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-09-22 01:57:49 +09:00
Changwoo Min	95e2f4dabe	scx_lavd: boost the latency criticality of kernel threads Many kernel threads perform latency-critical tasks (e.g., net, gpu). In particular, the AMD GPU driver runs mostly in kernel space using kworker. Hence, treat a kernel thread as if it were a woken-up task. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-09-14 00:41:02 +09:00
Changwoo Min	4b4f42fce1	scx_lavd: add a short circuit for the case of no turbo core Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-09-13 16:02:07 +09:00
Jake Hillion	8ca45cfa37	lint: enable cargo fmt (#643) Use `cargo fmt` with a specific nightly branch in the CI to enforce formatting. Globally format these files while the diff is still small so we can stay on top of it. Test plan: - CI lint check passes.	2024-09-11 10:03:20 +01:00
patso	c1df85914b	remove dependency on rlimit.rs the rlimit crate is the only dependency crate with a build.rs. build.rs files complicate portability. this removes the need for rlimit.rs	2024-09-10 01:16:53 -04:00
Changwoo Min	17e0e08e6e	Merge pull request #621 from multics69/lavd-greedy-fix scx_lavd: improve greedy ratio calculation and more	2024-09-07 10:52:00 +09:00
Changwoo Min	36df970a8f	scx_lavd: add debug print for turbo cores Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-09-06 19:23:17 +09:00
Changwoo Min	351a1c6656	scx_lavd: enable autopilot mode by default Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-09-06 19:23:12 +09:00
Changwoo Min	ebe9375b6a	scx_lavd: pretty printing of status Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-09-06 16:27:20 +09:00
Changwoo Min	461cb9a3a0	scx_lavd: fix calculation of greedy_ratio The service time (taskc->svc_time) should be the sum of total CPU time consumed, not just a delta. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-09-06 16:22:40 +09:00
Tejun Heo	46fc2e1a49	version: v1.0.4	2024-09-05 18:12:45 -10:00
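
Commit c1f4051a14 above describes a u32 accumulator wrapping around and corrupting an average. The following is a minimal, standalone sketch of that class of bug, not the scx_lavd code itself: the helper names `sum_u32`, `sum_u64`, and `avg_from_sum` are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: accumulate samples in a 32-bit sum.
 * Unsigned arithmetic wraps modulo 2^32, so a large enough
 * total silently loses its high bits. */
static uint32_t sum_u32(const uint32_t *v, int n)
{
	uint32_t sum = 0;

	for (int i = 0; i < n; i++)
		sum += v[i];
	return sum;
}

/* Hypothetical helper: the same accumulation with a 64-bit sum,
 * which cannot wrap for any realistic number of 32-bit samples. */
static uint64_t sum_u64(const uint32_t *v, int n)
{
	uint64_t sum = 0;

	for (int i = 0; i < n; i++)
		sum += v[i];
	return sum;
}

/* Hypothetical helper: integer average, guarding against n == 0. */
static uint64_t avg_from_sum(uint64_t sum, int n)
{
	return n > 0 ? sum / (uint64_t)n : 0;
}
```

Calling `avg_from_sum(sum_u32(v, n), n)` and `avg_from_sum(sum_u64(v, n), n)` on the same samples yields different averages as soon as the true total exceeds 2^32 - 1, which is the symptom the commit message reports for avg_lat_cri.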


298 Commits