scx-upstream

mirror of https://github.com/sched-ext/scx.git synced 2024-11-28 21:50:23 +00:00

Author	SHA1	Message	Date
Andrea Righi	c4eb3ce7b4	scx_bpfland: introduce dynamic nvcsw threshold Instead of using a static value to classify tasks based on their average amount of voluntary context switches, try to periodically evaluate an optimal threshold, based on a global average of voluntary context switches among all the running tasks. Tasks with an average amount of voluntary context switches greater than the global average will be classified as interactive. The global average is evaluated as an exponentially weighted moving average (EWMA), as: avg(t) = avg(t - 1) * 0.75 + task_avg(t) * 0.25. This approach is more efficient than iterating through all tasks and it helps to prevent rapid fluctuations that may be caused by bursts of voluntary context switch events. The dynamic nvcsw threshold enables a more precise adjustment of the classification criteria to swiftly respond to global system changes: tasks can be quickly classified as interactive, but if the system experiences too many interactive events, the criteria for maintaining interactive status become stricter. This creates a natural selection process where only the most deserving tasks remain interactive. Additionally, introduce the new option `--nvcsw-max-thresh N`, which allows extending or restricting the fluctuation range of the global average threshold for voluntary context switches. Tested-by: Piotr Gorski <piotrgorski@cachyos.org> Signed-off-by: Andrea Righi <righi.andrea@gmail.com>	2024-07-18 19:03:25 +02:00
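The moving average described in commit c4eb3ce7b4 can be sketched as follows. This is a minimal illustration in Python, not the scx_bpfland BPF code: the names `update_global_avg` and `is_interactive` are hypothetical, and the update is written in the standard EWMA form, with the two weighted terms (0.75 and 0.25) added together.

```python
def update_global_avg(prev_avg: float, task_avg: float) -> float:
    """One EWMA step: keep 75% of the old average, blend in 25% of the new sample."""
    return prev_avg * 0.75 + task_avg * 0.25


def is_interactive(task_avg: float, global_avg: float) -> bool:
    """A task is classified as interactive when its own average of voluntary
    context switches is greater than the global average."""
    return task_avg > global_avg


# Feed a constant sample: the average approaches it gradually instead of jumping,
# which is what damps short bursts of voluntary context switches.
global_avg = 0.0
for sample in [8.0, 8.0, 8.0, 8.0]:
    global_avg = update_global_avg(global_avg, sample)
```

Each update touches only the running average and one task's value, so no pass over all tasks is needed.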
Changwoo Min	78d96a6fb6	scx_lavd: advance clock in inverse proportion to the system load Advancing the clock slower when overloaded gives more opportunities for latency-critical tasks to cut in the run queue. Controlling the clock better reflects the actual load than the prior approach of stretching the time-space when overloaded. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-07-18 15:53:38 +09:00
Changwoo Min	9bc20f9160	scx_lavd: maintain ineligible runnable tasks separately We now maintain two run queues—an eligible run queue (DSQ) and an ineligible run queue (rbtree)—sorted by the task's virtual deadline. When the eligible run queue is empty, or the ineligible run queue has not been consumed for too long (e.g., 15 msec), a task in the ineligible run queue is moved to the eligible run queue for execution. With these two queues, we have a better admission control. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-07-17 23:46:11 +09:00
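The two-queue admission control described in commit 9bc20f9160 can be sketched roughly as below. This is an illustration under stated assumptions, not the scx_lavd implementation: both queues are modelled as Python heaps (the commit uses a DSQ and an rbtree), the class and method names are hypothetical, and the 15 msec limit is the example figure from the commit message.

```python
import heapq

STARVATION_LIMIT_NS = 15_000_000  # 15 msec, the example value from the commit message


class TwoQueues:
    """Hypothetical model: two queues, each ordered by virtual deadline."""

    def __init__(self) -> None:
        self.eligible = []    # min-heap of (virtual_deadline, task_name)
        self.ineligible = []  # min-heap of (virtual_deadline, task_name)
        self.last_ineligible_consume_ns = 0

    def enqueue(self, deadline: int, name: str, is_eligible: bool) -> None:
        queue = self.eligible if is_eligible else self.ineligible
        heapq.heappush(queue, (deadline, name))

    def pick_next(self, now_ns: int):
        # The ineligible queue is "starved" if it has not been consumed for too long.
        starved = now_ns - self.last_ineligible_consume_ns > STARVATION_LIMIT_NS
        if self.ineligible and (not self.eligible or starved):
            # Move the ineligible task with the earliest deadline to the eligible queue.
            heapq.heappush(self.eligible, heapq.heappop(self.ineligible))
            self.last_ineligible_consume_ns = now_ns
        if not self.eligible:
            return None
        return heapq.heappop(self.eligible)[1]
```

In this sketch a promoted task is not run unconditionally; it competes in the eligible queue by virtual deadline like any other task.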
I Hsin Cheng	2525b94af4	scx_rusty: Remove unused variable Remove unused variable "has_preferred_dom". Signed-off-by: I Hsin Cheng <richard120310@gmail.com>	2024-07-17 20:30:17 +08:00
I Hsin Cheng	bf2f0fbf35	scx_rusty: Remove skip_while in find_first_candidate Followed commit `1c3b563`, move the checking of task.migrated.get() into the vector filter. In this way, we can remove the skip_while() call in find_first_candidate(). Signed-off-by: I Hsin Cheng <richard120310@gmail.com>	2024-07-17 20:27:12 +08:00
Changwoo Min	55e19ea5df	scx_lavd: do not prioritize a wake-up task in ops.select_cpu() This is a prep for adding an ineligible DSQ. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-07-17 11:16:02 +09:00
Changwoo Min	c84b73e971	scx_lavd: rename LAVD_GLOBAL_DSQ to LAVD_ELIGIBLE_DSQ This is a prep to add a global ineligible dsq. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-07-17 10:34:34 +09:00
David Vernet	e4be1ec9fe	Merge pull request #364 from hodgesds/rusty-mempolicy scx_rusty: Add mempolicy checks to rusty	2024-07-16 13:40:37 -05:00
Daniel Hodges	27122a8a00	scx_rusty: refactor mempolicy handling bpf code and load balancing This change refactors some of the helper methods for getting the preferred node for tasks using mempolicy. The load balancing logic in try_find_move_task is updated to allow for a filter, which is used to filter for tasks with a preferred mempolicy. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-07-16 09:40:00 -07:00
Tejun Heo	47ee332196	Merge pull request #434 from hodgesds/cpu-topo-cache-id scx_utils: Add LLC id to CPU	2024-07-16 06:20:17 -10:00
Daniel Hodges	43a263aa75	scx_rusty: Use preferred node mask with balancer Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-07-16 08:11:19 -07:00
Daniel Hodges	bab6e9523c	scx_rusty: Add mempolicy checks to rusty This change makes scx_rusty mempolicy aware. When a process uses set_mempolicy it can change NUMA memory preferences and cause performance issues when tasks are scheduled on remote NUMA nodes. This change modifies task_pick_domain to use the new helper method that returns the preferred node id. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-07-16 08:11:19 -07:00
Changwoo Min	971bb2e024	scx_lavd: pretty formatting for ineligible duration Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-07-16 23:54:15 +09:00
Changwoo Min	adfbf3934c	scx_lavd: tuning the max ineligible duration Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-07-16 23:52:23 +09:00
Changwoo Min	eff444516f	scx_lavd: directly measure service time for eligibility enforcement Estimating the service time from run time and frequency is not incorrect. However, it reacts slowly to sudden changes since it relies on the moving average. Hence, we directly measure the service time to enforce fairness. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-07-16 23:48:26 +09:00
Daniel Hodges	97fb0fa0c8	scx_utils: Add LLC id to CPU Add the LLC id to the Cpu struct, which will make it easier for referencing the LLC when doing operations on a Cpu. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-07-16 07:46:55 -07:00
David Vernet	a4d9bcd453	Merge pull request #425 from vax-r/dom_mask_precheck scx_rusty: Pre-check task domain mask with pull domain mask	2024-07-16 09:30:27 -05:00
I Hsin Cheng	1c3b563caf	scx_rusty: Pre-check task domain mask with pull domain mask Instead of performing domain mask checking inside "find_first_candidate()" every time, check whether the tasks within the push domain are able to run on the pull domain by performing the mask check at the vector generation stage. This also avoids repeated computation for the same (task, pull_dom) pair, which would otherwise re-check whether the pull domain is in the task domain mask. Also, since whether a task is a kworker won't change over time, we can perform that check earlier and put it in the filter, too. Signed-off-by: I Hsin Cheng <richard120310@gmail.com>	2024-07-16 21:48:06 +08:00
Tejun Heo	757cf6ff54	Merge pull request #433 from sched-ext/htejun/bump-versions Bump versions for 1.0.1 release	2024-07-15 13:33:33 -10:00
Tejun Heo	51334b5c4d	Bump versions for 1.0.1 release	2024-07-15 13:21:52 -10:00
Tejun Heo	99afc66251	Merge pull request #432 from jfernandez/fix-sub-counters scx_utils: Display the tag value for nested counters in log_recorder	2024-07-15 12:14:49 -10:00
Jose Fernandez	a63bf77d68	scx_utils: Display the tag value for nested counters in log_recorder The log_recorder was incorrectly displaying the tag name instead of the value for nested counters. Signed-off-by: Jose Fernandez <josef@netflix.com>	2024-07-15 15:20:44 -06:00
Andrea Righi	e268c589ca	Merge pull request #429 from sched-ext/bpfland-small-improvements scx_bpfland: small improvements	2024-07-15 20:08:46 +02:00
Andrea Righi	8e7a526356	scx_bpfland: use nr_cpu_ids for consistency We always use nr_cpu_ids to represent the maximum CPU id returned by scx_bpf_nr_cpu_ids(). Replace cpu_max with nr_cpu_ids to be more consistent with the rest of the code. Signed-off-by: Andrea Righi <righi.andrea@gmail.com>	2024-07-15 08:44:35 +02:00
Tejun Heo	8aae6917e6	Merge pull request #431 from sched-ext/ci-add-kernel-config ci: Use latest upstream kernel and exclude scx_flatcg and scx_pair from testing	2024-07-14 19:40:55 -10:00
Andrea Righi	f72e7567bf	ci: include kernel config chunk The kernel from kernel.org lacks the necessary .config chunk required by the build via virtme-ng. To fix this, include the kernel .config chunk directly in the scx repository and use it from here. Signed-off-by: Andrea Righi <righi.andrea@gmail.com>	2024-07-15 07:10:07 +02:00
Tejun Heo	759be0e406	ci: Use latest upstream kernel and exclude scx_flatcg and scx_pair from testing	2024-07-14 13:27:51 -10:00
Tejun Heo	e580ddec3c	Merge pull request #427 from ptr1337/fix-packaging install_user_scheds: Skip packaging of scx_mitosis	2024-07-14 13:19:49 -10:00
Tejun Heo	4141082a26	Merge pull request #428 from ptr1337/systemd systemd: Drop temporarily disabled schedulers from service	2024-07-14 13:16:26 -10:00
Andrea Righi	33d06f653b	scx_bpfland: get rid of the MAX_CPUS hard-coded limit We can rely on scx_bpf_nr_cpu_ids() to create all the possible per-CPU DSQs, eliminating the need for the hard-coded limit MAX_CPUS. In this way scx_bpfland can support the same amount of CPUs that the kernel can handle. Signed-off-by: Andrea Righi <righi.andrea@gmail.com>	2024-07-15 00:17:30 +02:00
Andrea Righi	b80ef7d8eb	scx_bpfland: optimize offline CPU handling Instead of constantly checking the need to drain tasks from the DSQs of the offline CPUs, provide an atomic flag to notify when there are tasks to be drained from the offline CPUs. Signed-off-by: Andrea Righi <righi.andrea@gmail.com>	2024-07-15 00:17:23 +02:00
Andrea Righi	0530706710	scx_bpfland: properly initialize the nvcsw metrics Initialize the number of voluntary context switches metrics in the local task storage. Signed-off-by: Andrea Righi <righi.andrea@gmail.com>	2024-07-15 00:16:10 +02:00
Andrea Righi	bf4ad23599	scx_bpfland: refine interactive tasks flood safeguard Refine the safeguard mechanism to avoid generating too many interactive tasks in the system, which could nullify the effect of the interactive/regular task classification. The safeguard mechanism operates by pausing the promotion of new tasks to interactive status during the task wake-up process, whenever the number of interactive tasks in the priority queue exceeds a specific limit (set to 4x the number of online CPUs). Halting the promotion of additional interactive tasks allows prioritizing those already classified as interactive, thereby preventing potential "bursts" of excessive interactive tasks in the system. This refines the mitigation already provided by commit `640bd562` ("scx_bpfland: prevent tasks from abusing interactive priority boost"). Fixes: `640bd562` ("scx_bpfland: prevent tasks from abusing interactive priority boost") Signed-off-by: Andrea Righi <righi.andrea@gmail.com>	2024-07-15 00:11:34 +02:00
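The flood safeguard in commit bf4ad23599 comes down to a single comparison at wake-up time. A minimal sketch, assuming a hypothetical helper `may_promote` and hypothetical counter names (the real scheduler keeps this state in BPF):

```python
INTERACTIVE_LIMIT_FACTOR = 4  # limit is 4x the number of online CPUs, per the commit message


def may_promote(nr_interactive_queued: int, nr_online_cpus: int) -> bool:
    """Return False (pause promotion) once the number of interactive tasks
    in the priority queue exceeds the limit."""
    return nr_interactive_queued <= INTERACTIVE_LIMIT_FACTOR * nr_online_cpus
```

With 8 online CPUs the limit is 32 queued interactive tasks; a 33rd pauses further promotions until the queue drains, while tasks already classified as interactive keep their priority.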
Andrea Righi	eb1cf0e670	scx_bpfland: improve task time slice evaluation Always assign the maximum time slice if there are idle CPUs in the system. Otherwise, double the task's unused time slice to reward tasks that use less CPU time and at the same time refill the time slice of the tasks every time they're dispatched. Signed-off-by: Andrea Righi <righi.andrea@gmail.com>	2024-07-14 23:24:24 +02:00
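One possible reading of the time slice rule in commit eb1cf0e670 is sketched below. The constants `SLICE_MAX_NS` and `SLICE_MIN_NS`, the function name, and the clamping of the doubled value between a floor and the maximum are all assumptions made for illustration; the commit message only states the two cases (idle CPUs available, or not).

```python
SLICE_MAX_NS = 5_000_000  # hypothetical maximum time slice (5 msec)
SLICE_MIN_NS = 250_000    # hypothetical floor, so a task that used its whole slice still runs


def next_time_slice(unused_ns: int, nr_idle_cpus: int) -> int:
    """Pick the time slice to assign when a task is dispatched."""
    if nr_idle_cpus > 0:
        # No contention: always hand out the maximum slice.
        return SLICE_MAX_NS
    # Under contention: double whatever the task left unused last time,
    # clamped to [SLICE_MIN_NS, SLICE_MAX_NS] (the clamp is an assumption).
    return max(SLICE_MIN_NS, min(2 * unused_ns, SLICE_MAX_NS))
```

Under this reading, a task that left 1 msec unused gets 2 msec next time, so tasks that use less CPU time are rewarded with longer slices when the system is busy.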
Peter Jung	9e2caa74c0	systemd: Drop temporarily disabled schedulers from service Signed-off-by: Peter Jung <admin@ptr1337.dev>	2024-07-14 20:26:04 +02:00
Peter Jung	a7fa651bfc	install_user_scheds: Skip packaging of scx_mitosis Signed-off-by: Peter Jung <admin@ptr1337.dev>	2024-07-14 20:21:14 +02:00
Tejun Heo	3ae76acd12	Merge pull request #424 from sched-ext/sync-upstream-kernel-and-bump-to-1.0 Sync to upstream kernel and bump to 1.0	2024-07-14 07:00:38 -10:00
Changwoo Min	5b2112dd81	Merge pull request #421 from multics69/lavd-metrics scx_lavd: improve time slice and waker freq calculation	2024-07-14 18:49:36 +09:00
Tejun Heo	761ec142ce	Bump most versions to 1.0.0 sched_ext is about to be merged upstream. There are some compatibility breaking changes and we're making the current sched_ext/for-6.11 1edab907b57d ("sched_ext/scx_qmap: Pick idle CPU for direct dispatch on !wakeup enqueues") the baseline. Tag everything except scx_mitosis as 1.0.0. As scx_mitosis is still in early development and is currently temporarily disabled, only the patchlevel is bumped.	2024-07-12 11:34:14 -10:00
Tejun Heo	54c487731a	Update to vmlinux-v6.10-rc2-g1edab907b57d.h Sync to vmlinux.h from sched_ext/for-6.11 1edab907b57d ("sched_ext/scx_qmap: Pick idle CPU for direct dispatch on !wakeup enqueues"). This most likely will be the commit which will be merged during the upcoming kernel v6.11 merge window. Unfortunately, this is a compatibility breaking change. As the size of bpf_iter_scx_dsq is reduced, schedulers that use the iterator - scx_lavd and scx_layered - won't be able to run on older kernels. Likewise, older binaries from before this commit won't be able to run on newer kernels.	2024-07-12 11:13:34 -10:00
Tejun Heo	f261d0f037	Sync from kernel - 1edab907b57d Sync from sched_ext/for-6.11 1edab907b57d ("sched_ext/scx_qmap: Pick idle CPU for direct dispatch on !wakeup enqueues") git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git for-6.11 - cgroup support hasn't landed in the upstream kernel yet. This most likely will happen in a few weeks. For the time being, disable scx_flatcg, scx_pair and scx_mitosis. - Compat macro for DSQ task iterator dropped. This is now a part of the baseline. - scx_bpf_consume() isn't upstream yet. BPF interfacing side is still being discussed. Dropped example usage from tools/sched_ext. None of the practical schedulers use it, so this should be fine for now. - scx_bpf_cpu_rq() added. - AUTOATTACH workaround for newer libbpf versions added.	2024-07-12 11:08:41 -10:00
Tejun Heo	228080606c	Update libbpf and bpftool commits Sync to the latest. Right now, it's in an awkward place where receiving AUTOATTACH compat updates from the kernel breaks the build, as the libbpf version is marked as 1.5 but bpf_map__set_autoattach() is not available yet.	2024-07-12 10:51:44 -10:00
Tejun Heo	274bcf7f02	Merge pull request #420 from CachyOS/fix/meson-env-var-append meson: fix RUSTFLAGS being appended incorrectly	2024-07-12 09:06:03 -10:00
Changwoo Min	512bd143a5	scx_lavd: count only related tasks in calculating waker_freq A task can become runnable in any task's context, not only its waker's. Thus, we should not count wake-ups that happen in an unrelated task's context. With this commit, the scheduler can (much more) accurately detect waker-wakee relationships. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-07-12 22:51:09 +09:00
Changwoo Min	95733f63ab	scx_lavd: calculate time slice as a function of run queue length The prior approach using the sum of weights gives too much penalty to nice tasks with large nice values. With this commit, the time slice is determined by the number of runnable tasks regardless of nice priority. Note that the fairness will still be enforced based on tasks' nice priorities (weights). Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-07-12 22:45:22 +09:00
Vladislav Nepogodin	c7f1909415	meson: fix RUSTFLAGS being appended incorrectly	2024-07-12 16:42:53 +04:00
Changwoo Min	00fdc1d949	Merge pull request #417 from multics69/lavd-vdeadline scx_lavd: improve virtual deadline and current clock handling	2024-07-12 14:05:44 +09:00
Changwoo Min	d4bc92bea7	scx_lavd: print lat_cri to output Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-07-12 13:23:56 +09:00
Changwoo Min	4c5c564523	scx_lavd: initialize current logical clock to zero To make the logical clock easy to distinguish from physical time, initialize the current logical clock to zero (not the current physical time). Also, avoid the deadline calculation being zero by adding +1 here and there. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-07-12 10:15:54 +09:00
Andrea Righi	641a8c4c5c	Merge pull request #418 from sched-ext/bpfland-mitigations scx_bpfland: mitigations and additional statistics	2024-07-11 18:32:18 +02:00


1166 Commits