JakeHillion/scx

mirror of https://github.com/JakeHillion/scx.git synced 2024-12-02 05:47:12 +00:00

Author	SHA1	Message	Date
Peter Jung	9e2caa74c0	systemd: Drop temporarily disabled schedulers from service Signed-off-by: Peter Jung <admin@ptr1337.dev>	2024-07-14 20:26:04 +02:00
Peter Jung	a7fa651bfc	install_user_scheds: Skip packaging of scx_mitosis Signed-off-by: Peter Jung <admin@ptr1337.dev>	2024-07-14 20:21:14 +02:00
Tejun Heo	3ae76acd12	Merge pull request #424 from sched-ext/sync-upstream-kernel-and-bump-to-1.0 Sync to upstream kernel and bump to 1.0	2024-07-14 07:00:38 -10:00
Changwoo Min	5b2112dd81	Merge pull request #421 from multics69/lavd-metrics scx_lavd: improve time slice and waker freq calculation	2024-07-14 18:49:36 +09:00
Tejun Heo	761ec142ce	Bump most versions to 1.0.0 sched_ext is about to be merged upstream. There are some compatibility breaking changes and we're making the current sched_ext/for-6.11 1edab907b57d ("sched_ext/scx_qmap: Pick idle CPU for direct dispatch on !wakeup enqueues") the baseline. Tag everything except scx_mitosis as 1.0.0. As scx_mitosis is still in early development and is currently temporarily disabled, only the patchlevel is bumped.	2024-07-12 11:34:14 -10:00
Tejun Heo	54c487731a	Update to vmlinux-v6.10-rc2-g1edab907b57d.h Sync to vmlinux.h from sched_ext/for-6.11 1edab907b57d ("sched_ext/scx_qmap: Pick idle CPU for direct dispatch on !wakeup enqueues"). This most likely will be the commit which will be merged during the upcoming kernel v6.11 merge window. Unfortunately, this is a compatibility breaking change. As the size of bpf_iter_scx_dsq is reduced, schedulers that use the iterator - scx_lavd and scx_layered - won't be able to run on older kernels. Likewise, older binaries from before this commit won't be able to run on newer kernels.	2024-07-12 11:13:34 -10:00
Tejun Heo	f261d0f037	Sync from kernel - 1edab907b57d Sync from sched_ext/for-6.11 1edab907b57d ("sched_ext/scx_qmap: Pick idle CPU for direct dispatch on !wakeup enqueues") git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git for-6.11 - cgroup support hasn't landed in the upstream kernel yet. This most likely will happen in a few weeks. For the time being, disable scx_flatcg, scx_pair and scx_mitosis. - Compat macro for DSQ task iterator dropped. This is now a part of the baseline. - scx_bpf_consume() isn't upstream yet. BPF interfacing side is still being discussed. Dropped example usage from tools/sched_ext. None of the practical schedulers use it, so this should be fine for now. - scx_bpf_cpu_rq() added. - AUTOATTACH workaround for newer libbpf versions added.	2024-07-12 11:08:41 -10:00
Tejun Heo	228080606c	Update libbpf and bpftool commits Sync to the latest. Right now, it's in an awkward place where receiving AUTOATTACH compat updates from kernel breaks build as the libbpf version is marked as 1.5 but bpf_map__set_autoattach() is not available yet. Sync to the latest.	2024-07-12 10:51:44 -10:00
Tejun Heo	274bcf7f02	Merge pull request #420 from CachyOS/fix/meson-env-var-append meson: fix RUSTFLAGS being appended incorrectly	2024-07-12 09:06:03 -10:00
Changwoo Min	512bd143a5	scx_lavd: count only related tasks in calculating waker_freq A task can become runnable in any task's context, not only its waker task's. Thus, we should not count wake-ups in an unrelated task's context. With this commit, the scheduler can (much more) accurately detect waker-wakee relationships. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-07-12 22:51:09 +09:00
Changwoo Min	95733f63ab	scx_lavd: calculate time slice as a function of run queue length The prior approach using the sum of weights gives too much penalty to nice tasks with large nice values. With this commit, the time slice is determined by the number of runnable tasks regardless of nice priority. Note that the fairness will still be enforced based on tasks' nice priorities (weights). Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-07-12 22:45:22 +09:00
Vladislav Nepogodin	c7f1909415	meson: fix RUSTFLAGS being appended incorrectly	2024-07-12 16:42:53 +04:00
Changwoo Min	00fdc1d949	Merge pull request #417 from multics69/lavd-vdeadline scx_lavd: improve virtual deadline and current clock handling	2024-07-12 14:05:44 +09:00
Changwoo Min	d4bc92bea7	scx_lavd: print lat_cri to output Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-07-12 13:23:56 +09:00
Changwoo Min	4c5c564523	scx_lavd: initial current logical clock to zero To easily distinguish, let's initialize the current logical clock to zero (not the current physical time). Also, avoid the deadline calculation being zero by adding +1 here and there. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-07-12 10:15:54 +09:00
Andrea Righi	641a8c4c5c	Merge pull request #418 from sched-ext/bpfland-mitigations scx_bpfland: mitigations and additional statistics	2024-07-11 18:32:18 +02:00
Andrea Righi	640bd562ff	scx_bpfland: prevent tasks from abusing interactive priority boost The priority boost for interactive tasks can be exploited to render the system nearly unresponsive by creating numerous tasks that constantly switch between wait/wakeup states. For example, stress tests like `hackbench -l 10000` can significantly degrade system responsiveness. To mitigate this, limit the number of interactive tasks added to the priority queue to 4x the number of online CPUs. This simple approach appears to be quite effective at identifying potential spam of "fake" interactive tasks, while still prioritizing legitimate interactive tasks. Additionally, periodically refresh the interactive status of the tasks based on their most recent average of voluntary context switches, preventing the interactive status from being too "sticky". Tested-by: Piotr Gorski <lucjan.lucjanov@gmail.com> Signed-off-by: Andrea Righi <righi.andrea@gmail.com>	2024-07-11 16:13:55 +02:00
Andrea Righi	1babb2b92d	scx_bpfland: prevent per-CPU kthreads starving other tasks Avoid dispatching per-CPU kthreads directly, since this may cause interactivity problems or unfairness, for example if there are too many softirqs being scheduled (e.g., in presence of high RX network traffic or when running certain stress tests, like hackbench). Moreover, in order to help with testing and benchmarks, introduce the option --local-kthread, that allows to restore the old behavior if enabled. Tested-by: Piotr Gorski <lucjan.lucjanov@gmail.com> Signed-off-by: Andrea Righi <righi.andrea@gmail.com>	2024-07-11 16:13:48 +02:00
Andrea Righi	c3ebdd338f	scx_bpfland: prevent slice delta overflow When updating the task vruntime, ensure the time slice delta is always a positive value. Failing to do so may cause the global vruntime to increase excessively due to overflows. Tested-by: Piotr Gorski <lucjan.lucjanov@gmail.com> Signed-off-by: Andrea Righi <righi.andrea@gmail.com>	2024-07-11 15:58:01 +02:00
Andrea Righi	f59aa52fe7	scx_bpfland: expose the amount of online CPUs Periodically report the amount of online CPUs to stdout. The online CPUs are initially evaluated looking at the online cpumask, then the value is updated in the .cpu_offline() / .cpu_online() callbacks. Tested-by: Piotr Gorski <lucjan.lucjanov@gmail.com> Signed-off-by: Andrea Righi <righi.andrea@gmail.com>	2024-07-11 15:58:01 +02:00
Andrea Righi	3a47b484af	scx_bpfland: report interactive tasks to stdout Keep track of the CPUs that are running interactive tasks and report their amount to stdout. Tested-by: Piotr Gorski <lucjan.lucjanov@gmail.com> Signed-off-by: Andrea Righi <righi.andrea@gmail.com>	2024-07-11 15:58:01 +02:00
Andrea Righi	1a1a16b9e9	scx_bpfland: fix typo in slice_ns definition The correct default value of slice_ns is 5ms, not 5s. This change doesn't really make any difference in practice, since these values are changed by the Rust part when the scheduler is started, but it's good to keep this aligned to the proper values for consistency. Tested-by: Piotr Gorski <lucjan.lucjanov@gmail.com> Signed-off-by: Andrea Righi <righi.andrea@gmail.com>	2024-07-11 15:58:01 +02:00
Changwoo Min	bdbfeb9fd1	scx_lavd: use logical current clock for virtual deadlines This commit changes the use of a physical clock to a virtual, logical clock in calculating deadlines. - The virtual current clock advances upon a task's running to its virtual deadline. - When enqueuing a task, its virtual deadline from the virtual current clock is calculated. With the above two changes, this guarantees that there is no such task whose virtual deadline is smaller than the virtual current clock. This means any enqueuing task can compete with any other already enqueued tasks. This allows a latency-critical task to be immediately scheduled if needed. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-07-11 22:41:56 +09:00
Changwoo Min	408ea7892c	scx_lavd: induce sched_prio_to_latency_weight from slice weight So sched_prio_to_latency_weight is removed. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-07-11 21:37:21 +09:00
Changwoo Min	bd964acff6	scx_lavd: deprioritize a newly forked task in latency Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-07-11 21:36:32 +09:00
Changwoo Min	48debe416e	scx_lavd: tuning the deadline equation under high load Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-07-11 21:35:54 +09:00
Changwoo Min	c72e063680	scx_lavd: do not use lat_prio_to_greedy_thresholds With other optimizations, lat_prio_to_greedy_thresholds is not effective any more. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-07-11 21:35:01 +09:00
Changwoo Min	9ed488798e	scx_lavd: use task's runtime to determine its deadline It has an effect of further preferring shorter jobs. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-07-11 21:34:25 +09:00
Changwoo Min	e081b2a294	scx_lavd: rename LAVD_MAX_CAS_RETRY to LAVD_MAX_RETRY Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-07-11 21:33:56 +09:00
Andrea Righi	3df7a13117	Merge pull request #416 from sched-ext/bpfland-small-improvements bpfland: small improvements	2024-07-08 23:11:47 +02:00
Andrea Righi	995577762a	scx_bpfland: refill task time slice Every time we need to dispatch a task re-evaluate its time slice as: (unused_time_slice + min_time_slice) / 2 This allows to refill the time slice for tasks that haven't used much of their previously assigned time, improving fairness. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-07-06 14:07:24 +02:00
Andrea Righi	6a64182ef2	scx_bpfland: always classify interactive tasks Make sure to always classify interactive tasks, even when the system is not fully utilized. This ensures that if the system suddenly becomes overloaded, we already know which tasks need to be dispatched to the priority DSQ. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-07-06 14:07:24 +02:00
Andrea Righi	8dd528abfd	scx_bpfland: pass enqueue flags when dispatching kthreads Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-07-06 14:07:10 +02:00
Andrea Righi	fc0d1bd003	Merge pull request #415 from sched-ext/bpfland-output scx_bpfland: additional stats and output improvements	2024-07-05 19:50:07 +02:00
Tejun Heo	af5e89e73c	Merge pull request #412 from vax-r/flatcg_delta_fetch scx_flatcg: Make good use of __sync_fetch_and_sub()	2024-07-05 07:39:22 -10:00
Tejun Heo	7f8e4edb53	Merge pull request #397 from jfernandez/log-recorder-customize sched_utils: Add log recorder format customization	2024-07-05 07:37:39 -10:00
Tejun Heo	14d0a0ef64	Merge pull request #411 from vax-r/Fix_typo scx_flatcg: Fix_typo	2024-07-05 07:35:31 -10:00
Andrea Righi	2bc8f800e7	scx_bpfland: report build id version Use the version string provided by scx_utils:build_id. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-07-05 09:29:29 +02:00
Andrea Righi	bdb31e98e2	scx_bpfland: show statistics in a more human-readable format Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-07-05 09:29:29 +02:00
Andrea Righi	f9d7844b2e	scx_bpfland: split direct dispatches and kthread dispatches Show separate statistics for direct dispatches and kthread direct dispatches. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-07-05 09:27:59 +02:00
Andrea Righi	86d2f50230	Merge pull request #410 from sched-ext/bpfland-smooth-perf scx_bpfland: enhance performance consistency and predictability	2024-07-04 21:37:07 +02:00
Andrea Righi	d98516fe75	Merge pull request #413 from vax-r/Remove_unused_variable scx_rustland_core: Remove unused variable	2024-07-04 19:09:27 +02:00
I Hsin Cheng	1595da78dc	scx_rustland_core: Remove unused variable Remove unused variable "tctx" in rustland_select_cpu. Signed-off-by: I Hsin Cheng <richard120310@gmail.com>	2024-07-05 01:04:49 +08:00
I Hsin Cheng	aae826b1b3	scx_flatcg: Make good use of __sync_fetch_and_sub() Fetch the value of "delta" directly from the returned value from __sync_fetch_and_sub, as it returns the origin value of cgc->cvtime_delta. Additional fetching instruction of cgc->cvtime_delta would be redundant here. Signed-off-by: I Hsin Cheng <richard120310@gmail.com>	2024-07-05 01:03:20 +08:00
I Hsin Cheng	3e52761487	scx_flatcg: Fix_typo Fix "oppotunistic" to "opportunistic". Signed-off-by: I Hsin Cheng <richard120310@gmail.com>	2024-07-04 22:04:40 +08:00
Andrea Righi	cfe2ed063d	scx_bpfland: time-based starvation prevention Tasks are consumed from various DSQs in the following order: per-CPU DSQs => priority DSQ => shared DSQ Tasks in the shared DSQ may be starved by those in the priority DSQ, which in turn may be starved by tasks dispatched to any per-CPU DSQ. To mitigate this, record the timestamp of the last task scheduling event both from the priority DSQ and the shared DSQ. If the starvation threshold is exceeded without consuming a task, the scheduler will be forced to consume a task from the corresponding DSQ. The starvation threshold can be adjusted using the --starvation-thresh command line parameter (default is 5ms). Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-07-04 10:52:39 +02:00
Andrea Righi	9e0db4ae17	scx_bpfland: remove unnecessary RCU read protection There is no need to RCU protect the cpumask for the offline CPUs: it is created once when the scheduler is initialized and it's never deallocated. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-07-04 10:24:43 +02:00
Andrea Righi	cef6ca93cf	scx_bpfland: adjust default time slice to 5ms Reduce the default time slice down to 5ms for a faster reaction and system responsiveness when the system is overcommissioned. This also helps to provide a more predictable level of performance. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-07-04 10:24:43 +02:00
Andrea Righi	7d15e3171c	scx_bpfland: ensure task time slice never exceeds the slice_ns limit Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-07-04 10:24:43 +02:00
Andrea Righi	e8a4d350ad	scx_bpfland: unify dispatching kthreads with direct CPU dispatches Always use direct CPU dispatch for kthreads, there is no need to treat kthreads in a special way, simply reuse direct CPU dispatch to prioritize them. Moreover, change direct CPU dispatches to use scx_bpf_dispatch_vtime(), since we may dispatch multiple tasks to the same per-CPU DSQ now. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-07-03 09:38:43 +02:00
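
The time slice refill described in 995577762a, combined with the slice_ns cap from 7d15e3171c, can be sketched roughly as follows. This is a plain C illustration and not code from scx_bpfland: the function name refill_slice and its parameter names are hypothetical, and the real scheduler performs this logic in BPF with its own state.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the scx_bpfland source).
 *
 * Re-evaluates a task's time slice as the average of the time it left
 * unused and the minimum slice, then caps the result at the configured
 * slice limit so it never exceeds slice_ns.
 */
static uint64_t refill_slice(uint64_t unused_slice_ns,
                             uint64_t min_slice_ns,
                             uint64_t slice_ns_limit)
{
    /* (unused_time_slice + min_time_slice) / 2, as in the commit message */
    uint64_t slice = (unused_slice_ns + min_slice_ns) / 2;

    /* Assumed clamp: keep the slice within the slice_ns limit */
    if (slice > slice_ns_limit)
        slice = slice_ns_limit;

    return slice;
}
```

With a 5ms limit, a task that left 1ms unused and has a 5ms minimum slice would get 3ms under this sketch; a task that used its whole slice would get half the minimum.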

1282 Commits
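
The point made in aae826b1b3 relies on the GCC/Clang builtin __sync_fetch_and_sub() returning the value the variable held before the subtraction, so a separate read of the variable is unnecessary. A minimal sketch of that behavior, using a hypothetical helper take_delta rather than the actual scx_flatcg code:

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from scx_flatcg).
 *
 * Atomically subtracts n from *counter and returns the value *counter
 * held before the subtraction, which is what __sync_fetch_and_sub()
 * hands back.
 */
static uint64_t take_delta(uint64_t *counter, uint64_t n)
{
    return __sync_fetch_and_sub(counter, n);
}
```

Calling take_delta() on a counter holding 10 with n = 4 returns 10 and leaves 6 in the counter.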