scx-upstream

mirror of https://github.com/sched-ext/scx.git synced 2024-11-24 20:00:22 +00:00

Author	SHA1	Message	Date
Andrea Righi	d0fb29a0f7	scx_rustland: aggressively prioritize interactive tasks scx_rustland was originally designed as a PoC to showcase the benefits of implementing specialized schedulers via sched_ext, focusing on a very specific use case: prioritize game responsiveness regardless of what runs in the background. Its original design was subsequently modified to better serve as a general-purpose scheduler, balancing the prioritization of interactive tasks with CPU-intensive ones to prevent over-prioritization. With scx_bpfland serving as a more "general-purpose" scheduler, it makes sense to revisit scx_rustland's original goal and make it much more aggressive at prioritizing interactive tasks, determined as a function of their average number of context switches. This change makes scx_rustland again a really good PoC to showcase the benefits of having specialized schedulers, by focusing only on a very specific use case: provide a high and stable frames-per-second (fps) while a kernel build is running in the background.
= Results =
- Test: Run a WebGL application [1] while building the kernel (make -j32)
- Hardware: 8-core Intel 11th Gen Intel(R) Core(TM) i7-1195G7 @ 2.90GHz
+----------------------+---------+--------+
| Scheduler            | avg fps | stdev  |
+----------------------+---------+--------+
| EEVDF                | 28      | 4.00   |
| scx_rustland-before  | 43      | 1.25   |
| scx_rustland-after   | 60      | 0.25   |
+----------------------+---------+--------+
[1] https://webglsamples.org/aquarium/aquarium.html Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-09-02 15:53:35 +02:00
Changwoo Min	172fe1efc6	Merge pull request #597 from multics69/lavd-turbo-tuning2 scx_lavd: misc updates (verifier, README, monitor option name, and micro-optimization)	2024-09-02 18:00:26 +09:00
Changwoo Min	0108b83050	scx_lavd: make the old verifier happy (bpf_cpumask_set_cpu) An old BPF verifier does not allow calling bpf_cpumask_set_cpu() in the BPF syscall context, so we defer the actual bpf_cpumask_set_cpu() call to the timer handler, update_sys_stat(), to work around the problem. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-09-02 18:00:12 +09:00
Changwoo Min	3bc2fd4977	scx_lavd: update README Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-09-02 18:00:12 +09:00
Changwoo Min	afbebaeed6	scx_lavd: check a core type of previous cpu at pick_idle_cpu() If a task is performance-critical, pick_idle_cpu() checks if the previous core is a big core or not. If not, don't try to run on previous core since a performance-critical task is better to run on a big core. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-09-01 17:28:16 +09:00
Changwoo Min	f2122c4197	Merge pull request #595 from multics69/lavd-turbo-tuning scx_lavd: improve the autopilot mode	2024-09-01 16:24:41 +09:00
Andrea Righi	1595445a63	Merge pull request #594 from sched-ext/scx-rustland-core-version-2 scx_rustland_core: bump up major version to 2.0.0	2024-09-01 08:57:32 +02:00
Changwoo Min	5ca4501139	scx_lavd: dynamically decide autopilot's low watermark A single threshold for a low watermark does not work well across systems with various numbers of cores and core types. Instead of using a single low watermark value, we dynamically decide the low watermark: 1) until one little core is fully utilized or 2) until two big cores are fully utilized. This works better across systems. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-09-01 12:46:57 +09:00
Andrea Righi	0aa71c832b	scx_rustland_core: bump up major version to 2.0.0 The scx_rustland_core API has been redesigned recently, breaking the compatibility with the past. Considering that Rust crates should update their major version when the previous API becomes incompatible [1], bump up the version to 2.0.0. [1] https://semver.org/ Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-08-31 23:23:26 +02:00
Andrea Righi	2cbf252019	scx_bpfland: directly dispatch only per-cpu kthreads with local_kthreads We want to directly dispatch only kthreads when local_kthreads is enabled, not all tasks that can run on a single CPU. Fixes: `7cc1846` ("scx_bpfland: always rely on prev_cpu with single-CPU tasks") Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-08-31 16:35:54 +02:00
Changwoo Min	4a7b806dd2	scx_lavd: when no_freq_scaling, always set to the max freq When the no_freq_scaling changes during runtime in the autopilot mode, the last target freq set would not be 1024. So the performance mode enabled by the autopilot mode would not run in the best profile. Hence, we set the target freq to 1024 always when no_freq_scaling is set. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-08-31 18:22:33 +09:00
Daniel Hodges	63a2eecce8	Merge pull request #592 from hodgesds/layered-ts-fixes scx_layered: Fix layer timeslice not being applied	2024-08-30 15:34:57 -04:00
Daniel Hodges	e04b612688	scx_layered: Fix layer timeslice not being applied Fix a small bug where the layer timeslice is not applied. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-08-30 11:53:42 -07:00
Changwoo Min	4d8bf870a1	Merge pull request #591 from multics69/lavd-turbo3 scx_lavd: introduce "autopilot" mode and misc. optimization & bug fix	2024-08-31 02:14:35 +09:00
Andrea Righi	f782467eaf	scx_rustland: convert to scx_stats This allows scx_rustland to avoid generating excessive logs for statistics while still allowing detailed monitoring on demand. Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-08-30 18:32:32 +02:00
Changwoo Min	9091dd983b	scx_lavd: add "--autopilot" mode Add "--autopilot" option and mode. In the autopilot mode, the scheduler dynamically changes its power mode according to system's load (cpu utilization). When the cpu utilization is low enough (say <=5%), it switches to the powersave mode since there is nothing to process fast so powersaving is the primary goal. When the utilization is moderate (say >5%, <=30%), it runs in balanced mode. When the utilization is high enough (say >30%), it runs in performance mode. Note that it only changes scheduler's power mode but it does not change system's energy profile. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-08-31 01:14:33 +09:00
Changwoo Min	5ecaa9ebe2	scx_lavd: improve the accuracy of cpu utilization calculation When a cpu is idle for a whole interval, its idle time is not added up correctly, so the utilization of such a cpu tends to be higher than the actual utilization. Now it is fixed, so cpu utilization becomes more accurate. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-08-31 01:14:33 +09:00
Changwoo Min	2f8cc0d60f	scx_lavd: rename the "--auto" option to "--autopower" to be clear Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-08-31 01:14:33 +09:00
Changwoo Min	815f1263b2	scx_lavd: reinitialize active cpumask when power mode changes When the power mode changes back to performance mode, we should reset the active/overflow cpumasks to their initial state -- all big cores are in the active cpumask and all little cores are in the overflow cpumask. Otherwise, the active/overflow cpumasks from the previous power mode will be used in the performance mode. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-08-31 01:14:33 +09:00
Changwoo Min	afb8c78a09	scx_lavd: print power mode change in the auto mode Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-08-31 01:14:33 +09:00
Changwoo Min	a89a56dba4	scx_lavd: add a fastpath in ops.select_cpu() for a sharply pinned task If a task can be run only on a single cpu, we don't need to go through all the steps in ops.select_cpu(). Instead, we simply check if a task is still pinned on the prev_cpu and go. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-08-31 01:14:33 +09:00
Andrea Righi	b54fc202b8	Merge pull request #583 from sched-ext/bpfland-fix-pcpu-direct-dispatch scx_bpfland: always rely on prev_cpu with single-CPU tasks	2024-08-30 18:12:59 +02:00
Andrea Righi	7cc18460b9	scx_bpfland: always rely on prev_cpu with single-CPU tasks When selecting an idle CPU for tasks that can only run on a single CPU, always check if the previously used CPU is still usable, instead of trying to figure out the single allowed CPU looking at the task's cpumask. Apparently, single-CPU tasks can report a prev_cpu that is not in the allowed cpumask when they rapidly change affinity. This could lead to stalls, because we may end up dispatching the kthread to a per-CPU DSQ that is not compatible with its allowed cpumask. Example: kworker/u32:2[173797] triggered exit kind 1026: runnable task stall (kworker/2:1[70] failed to run for 7.552s) ... R kworker/2:1[70] -7552ms scx_state/flags=3/0x9 dsq_flags=0x1 ops_state/qseq=0/0 sticky/holding_cpu=-1/-1 dsq_id=0x8 dsq_vtime=234483011369 cpus=04 In this case kworker/2 can only run on CPU #2 (cpus=0x4), but it's dispatched to dsq_id=0x8, that can only be consumed by CPU 8 => stall. To prevent this, do not try to figure out the best idle CPU for tasks that are changing affinity and just dispatch them to a global DSQ (either priority or regular, depending on its interactive state). Moreover, introduce an explicit error check in dispatch_direct_cpu() to improve detection of similar issues in the future, and drop lookup_task_ctx() in favor of try_lookup_task_ctx(), since we can now safely handle all the cases where the task context is not found. Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-08-30 09:45:58 +02:00
Changwoo Min	3e2e78a9ec	Merge pull request #584 from multics69/lavd-turbo2 scx_lavd: automatically determine power mode and more	2024-08-30 08:56:16 +09:00
Daniel Hodges	47184e9d19	Merge pull request #582 from hodgesds/layered-growth-interface scx_layered: Add layer growth config	2024-08-29 18:49:59 -04:00
Changwoo Min	bb08919203	scx_lavd: determine power mode automatically with --auto option It checks the EPP (energy performance preference) periodically and sets the power profile of the scheduler during runtime as a user changes the EPP profile (from her desktop UI). Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-08-29 19:15:23 +09:00
Andrea Righi	cc3f696c4b	Merge pull request #577 from sched-ext/bpfland-task-affinity scx_bpfland: enhanced task affinity	2024-08-29 07:46:57 +02:00
Daniel Hodges	7e0329e45c	scx_layered: Add layer growth config Add a per layer config for different implementations of layer growth algorithms. Convert the existing default logic into a default layer growth algorithm and add a linear implementation. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-08-28 19:17:24 -07:00
Daniel Hodges	cf765562c7	scx_layered: Update docs for layer slice setting Add docs for layer slice setting. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-08-28 22:12:07 -04:00
Daniel Hodges	a23308e7b0	scx_layered: Add more docs on tuning Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-08-28 12:38:05 -07:00
Daniel Hodges	96326b1ef3	scx_layered: Add additional docs Add some additional docs on tuning layered. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-08-28 12:27:26 -07:00
Daniel Hodges	cc450f1a4b	scx_layered: Add per layer timeslice Allow setting a different timeslice per layer. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-08-28 11:21:03 -07:00
Daniel Hodges	c511b42b7b	scx_layered: Make verification easier on older kernels Refactor some BPF code to make verification easier on older kernels. This is to make it easier to maintain backports. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-08-28 08:05:10 -07:00
Daniel Hodges	12f8cb74b5	scx_utils: Add GPU topology Add GPU awareness to the topology crate. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-08-28 06:35:35 -07:00
Andrea Righi	28cb1ec5cb	scx_bpfland: enhanced task affinity Aggressively try to keep tasks running on the same CPU / cache / domain, to achieve higher performance when the system is not overcommitted. This is done by giving a second chance in ops.enqueue(), in addition to ops.select_cpu(), to find an idle CPU close to the previously used CPU. Moreover, even if the task is dispatched to the global DSQs, always try to check if there is an idle CPU in the primary domain that can immediately consume the task.
= Results =
This change seems to provide a minor, but consistent, boost of performance with the CPU-intensive benchmarks from the CachyOS benchmarks selection [1]. Similar results can also be noticed with some WebGL benchmarks [2], when system usage is close to its maximum capacity.
Test: - cachyos-benchmarker
System: - AMD Ryzen 7 5800X 8-Core Processor
Metrics: - total time: elapsed time of all benchmarks - total score: geometric mean of all benchmarks
NOTE: total time is the most relevant, since it gives a measure of the aggregate performance, while the total score places more emphasis on performance consistency across all benchmarks.
== Results: summary ==
+-------------------------+---------------------+---------------------+
| Scheduler               | Total Time          | Total Score         |
|                         | (less = better)     | (less = better)     |
+-------------------------+---------------------+---------------------+
| EEVDF                   | 624.44 sec          | 123.68              |
| bpfland                 | 625.34 sec          | 122.21              |
| bpfland-task-affinity   | 623.67 sec          | 122.27              |
+-------------------------+---------------------+---------------------+
== Conclusion ==
With this patch applied, bpfland shows both a better performance and consistency. Although the gains are small (less than 1%), they are still significant for this type of benchmark and consistently appear across multiple runs.
[1] https://github.com/CachyOS/cachyos-benchmarker [2] https://webglsamples.org/aquarium/aquarium.html Tested-by: Piotr Gorski <piotr.gorski@cachyos.org> Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-08-28 10:30:54 +02:00
Avraham Hollander	6c5d85401d	Merge branch 'sched-ext:main' into main	2024-08-27 23:07:54 -04:00
Avraham Hollander	2a3cbeb760	scx_lavd: Add same power mode clarification to --no-prefer-turbo-core	2024-08-27 23:06:31 -04:00
Changwoo Min	5588126cff	scx_lavd: minor optimization for consume_task() When iterating neighbors, the existing code unnecessarily iterates over all the neighbors up to the maximum even if there are no neighbors. So the fix escapes early when there are no neighbors. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-08-28 10:26:50 +09:00
Changwoo Min	95272ae910	scx_lavd: proper handling of ctrl-c in a monitoring mode Ctrl-c wasn't properly handled in the monitoring mode (`--monitor-sched-samples`), so the scheduler could not be terminated by pressing ctrl-c. The missing ctrl-c handling is added to the monitor thread. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-08-28 10:05:34 +09:00
Changwoo Min	9c4428fd8b	scx_lavd: remove unused rust functions Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-08-28 10:02:11 +09:00
Andrea Righi	a155d5185d	scx_bpfland: rely on Topology to classify core types Rely on scx_utils::Topology to classify Big, Little and Turbo CPUs. Moreover, support the special keyword "all" with --primary-domain to include all the CPUs in the system (default). Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-08-28 00:23:55 +02:00
Andrea Righi	872e653cd2	scx_utils: introduce Turbo core type to Topology Integrate the logic used by scx_bpfland to detect turbo-boosted cores in Topology. Also change the logic to detect Big/Little cores as a function of base_frequency, instead of scaling_max_freq, otherwise turbo-boosted cores in homogeneous systems may be incorrectly classified as Big. Moreover, introduce the following new methods to Cpu to check for the core type: - is_turbo(): return true if the CPU is Turbo, false otherwise - is_big(): return true if the CPU is either Turbo or Big - is_little(): return true if the CPU is Little Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-08-28 00:09:08 +02:00
Daniel Hodges	41cebb807a	Merge pull request #569 from anh0516/main scx_layered: Clean up in-code documentation; add commas for consistency	2024-08-27 09:47:29 -04:00
Andrea Righi	6768f9f88c	Merge pull request #572 from sched-ext/bpfland-fix-turbo-domain scx_bpfland: fix turbo boost domain nullifying primary domain limits	2024-08-27 15:23:12 +02:00
Andrea Righi	e0f49a338a	scx_bpfland: fix turbo boost domain nullifying primary domain limits When creating the turbo boost scheduling domain, we might use a full CPU mask (selecting all possible CPUs) to indicate "do not prioritize turbo boost CPUs" or when all CPUs have the same maximum frequency. This approach works when the primary domain also contains all the CPUs, as the complete overlap allows the CPU selection logic to ignore the turbo boost domain and start picking CPUs directly from the primary domain. However, if the primary domain doesn't include all CPUs, the two domains won't fully overlap, which can lead to the turbo boost domain incorrectly including all CPUs, thereby negating the restrictions set by the primary scheduling domain. To resolve this, an empty CPU mask should be used for the turbo boost domain when turbo boost CPUs aren't prioritized. If the turbo boost domain is empty, it should be entirely bypassed, and the selection should proceed directly to the primary domain. Reported-by: Changwoo Min <changwoo@igalia.com> Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-08-27 13:36:50 +02:00
Changwoo Min	00430c3ded	scx_lavd: make a loop easier to correctly verify With an ill combination of old kernel and old LLVM, the BPF verifier incorrectly detects an infinite loop. After changing the loop with a constant end, the old verifier can pass the code. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-08-27 17:11:20 +09:00
Changwoo Min	09cff560aa	Merge pull request #566 from multics69/lavd-turbo scx_lavd: prioritize the turbo boost-able cores	2024-08-27 08:47:25 +09:00
Daniel Hodges	83cd26eb9e	Merge pull request #564 from hodgesds/layered-help scx_layered: Update help for tgid matching	2024-08-26 14:52:53 -04:00
Andrea Righi	35db89e90d	Merge pull request #568 from sched-ext/rustland-core-design-improv scx_rustland_core: small core design improvements	2024-08-26 20:06:21 +02:00
Avraham Hollander	7a43801d76	Add quotes for clarity	2024-08-26 13:20:01 -04:00
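
The "--autopilot" behaviour described in commit 9091dd983b above (powersave at low utilization, balanced at moderate utilization, performance at high utilization) can be illustrated with a minimal sketch. This is not the scx_lavd implementation: the names PowerMode, LOW_UTIL_PCT, HIGH_UTIL_PCT and pick_power_mode are hypothetical, and the 5% / 30% thresholds are only the "say" values quoted in that commit message. Commit 5ca4501139 later replaces the single low watermark with a dynamically decided one, which this sketch does not model.

```python
from enum import Enum


class PowerMode(Enum):
    """Scheduler power modes named in the commit message (hypothetical type)."""
    POWERSAVE = "powersave"
    BALANCED = "balanced"
    PERFORMANCE = "performance"


# Illustrative thresholds only: the commit message says "say <=5%" and
# "say >30%", so these are assumptions, not values read from the source code.
LOW_UTIL_PCT = 5.0
HIGH_UTIL_PCT = 30.0


def pick_power_mode(cpu_util_pct: float) -> PowerMode:
    """Map a system-wide CPU utilization percentage to a power mode."""
    if cpu_util_pct <= LOW_UTIL_PCT:
        # Nothing needs to be processed fast, so saving power comes first.
        return PowerMode.POWERSAVE
    if cpu_util_pct <= HIGH_UTIL_PCT:
        return PowerMode.BALANCED
    return PowerMode.PERFORMANCE
```

As the commit message notes, a decision like this changes only the scheduler's own power mode, not the system's energy profile.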

789 Commits