JakeHillion/scx

mirror of https://github.com/JakeHillion/scx.git synced 2024-11-30 04:50:24 +00:00

Author	SHA1	Message	Date
David Vernet	d42bae4fcf	rusty: Print build ID when rusty is loaded When someone is testing schedulers, we often have to ask what version the scheduler is running as. Now that we can access the build ID from rust schedulers, let's update scx_rusty to print the build ID when rusty first starts running. This results in output such as the following: ``` [void@maniforge scx]$ rusty 19:04:26 [INFO] Running scx_rusty (build ID: 0.8.1-g2043d2537f37c8d75753bb65eb75bca965067564 x86_64-unknown-linux-gnu/debug) 19:04:26 [INFO] NUMA[00] mask= 0b11111111111111111111111111111111 19:04:26 [INFO] DOM[00] mask= 0b00000000111111110000000011111111 19:04:26 [INFO] DOM[01] mask= 0b11111111000000001111111100000000 19:04:26 [INFO] Rusty scheduler started! ``` Signed-off-by: David Vernet <void@manifault.com>	2024-06-25 11:44:46 -05:00
David Vernet	9d9ece11aa	Merge pull request #384 from jfernandez/log-recorder scx_utils: Add log_recorder module for metrics-rs	2024-06-25 11:43:37 -05:00
Changwoo Min	5d0db5c5fe	scx_lavd: revising tunables to reduce micro-stutters This is a second attempt to optimize tunables for a wider range of games. 1) LAVD_BOOST_RANGE increased from 14 (35%) to 40 (100% of nice range). Now the latency priority (biased by nice value) will decide which task should run first. The nice value will decide the time slice. 2) The first change will give higher priority to latency-critical tasks than before. To compensate, the slice boost is also increased (2x -> 3x). Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-06-25 16:13:32 +09:00
Jose Fernandez	e5984ed016	scx_utils: Add log_recorder module for metrics-rs This change adds a new module to the scx_utils crate that provides a log recorder for metrics-rs. The log recorder will log all metrics to the console at a configurable interval in an easy to read format. Each metric type will be displayed in a separate section. Indentation will be used to show the hierarchy of the metrics. This results in a more verbose output, but it is easier to read and understand. scx_rusty was updated to use the log recorder and all explicit metric logging was removed. Counters will show the total count and the rate of change per second. Counters with an additional label, like `type` in `dispatched_tasks_total` in rusty, will show the count, rate, and percentage of the total count. Counters: dispatched_tasks_total: 65559 [1344.8/s] prev_idle: 44963 (68.6%) [966.5/s] wsync_prev_idle: 15696 (23.9%) [317.3/s] direct_dispatch: 2833 (4.3%) [35.3/s] dsq: 1804 (2.8%) [21.3/s] wsync: 262 (0.4%) [4.3/s] direct_greedy: 1 (0.0%) [0.0/s] pinned: 0 (0.0%) [0.0/s] greedy_idle: 0 (0.0%) [0.0/s] greedy_xnuma: 0 (0.0%) [0.0/s] direct_greedy_far: 0 (0.0%) [0.0/s] greedy_local: 0 (0.0%) [0.0/s] dl_clamped_total: 1290 [20.3/s] dl_preset_total: 514 [1.0/s] kick_greedy_total: 6 [0.3/s] lb_data_errors_total: 0 [0.0/s] load_balance_total: 0 [0.0/s] repatriate_total: 0 [0.0/s] task_errors_total: 0 [0.0/s] Gauges will show the last set value: Gauges: slice_length_us: 20000.00 Histograms will show the average, min, and max. The histogram will be reset after each log interval to avoid memory leaks, since the data structure that holds the samples is unbounded. Histograms: cpu_busy_pct: avg=1.66 min=1.16 max=2.16 load_avg node=0: avg=0.31 min=0.23 max=0.39 load_avg node=0 dom=0: avg=0.31 min=0.23 max=0.39 processing_duration_us: avg=297.50 min=296.00 max=299.00 Signed-off-by: Jose Fernandez <josef@netflix.com>	2024-06-24 18:45:02 -06:00
David Vernet	8059acb634	Merge pull request #381 from vax-r/rusty_dom_load_status_check scx_rusty: Pull domain status check	2024-06-24 17:54:54 -05:00
David Vernet	55ee210d42	Merge pull request #382 from vax-r/rusty_refactor scx_rusty: Refactor ridx assignment in populate_tasks_by_load	2024-06-24 17:47:55 -05:00
Changwoo Min	016229cbcf	scx_lavd: revising tunables for less-preemptive games In some games (e.g., Elden Ring), it was observed that preemption happens much less frequently. The reason is that tasks' runtime per schedule is similar, so it does not meet the existing criteria. To alleviate the problem, the following three tunables are revised: 1) Smaller LAVD_PREEMPT_KICK_MARGIN and LAVD_PREEMPT_TICK_MARGIN help to trigger more preemption. 2) Smaller LAVD_SLICE_MAX_NS works better, especially on 250 or 300Hz kernels. 3) Longer LAVD_ELIGIBLE_TIME_MAX perturbs timelines less frequently. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-06-24 00:27:33 +09:00
I Hsin Cheng	eab234a74f	scx_rusty: Refactor ridx assignment in populate_tasks_by_load The original assignment of the variable ridx is equivalent to a comparison between "ridx" and "wids - MAX_PIDS". Use the u64 max library helper function to perform the comparison and improve readability. Signed-off-by: I Hsin Cheng <richard120310@gmail.com>	2024-06-23 21:58:51 +08:00
I Hsin Cheng	84b9ac4dce	scx_rusty: Pull domain status check Check whether the BalanceState of pull_dom.load inside the function try_find_move_task is actually the variant NeedsPull. This performs task migration in a slightly more conservative manner when the system is under high load. Experiments were performed while the system was compiling the Linux kernel and undergoing a large amount of I/O at the same time using fio. The results show that before the modification there were 126,617 task migrations system-wide. After the modification there were 115,419 task migrations system-wide. Signed-off-by: I Hsin Cheng <richard120310@gmail.com>	2024-06-23 21:38:23 +08:00
David Vernet	5038f54701	Merge pull request #377 from jfernandez/metrics-rs rusty: Integrate stats with the metrics framework	2024-06-21 15:23:20 -05:00
David Vernet	3bd15be840	rlfifo: Use topo.nr_cpu_ids() instead of topo.nr_cpus_possible() In scx_rlfifo, we're currently using topo.nr_cpus_possible() to determine how many possible CPU IDs we could have on the system. To properly support systems whose disabled CPUs may be in the middle of the range of possible CPU IDs, let's instead use topo.nr_cpu_ids() so that we don't accidentally dispatch to an invalid DSQ. Signed-off-by: David Vernet <void@manifault.com>	2024-06-21 12:57:20 -05:00
David Vernet	263e02f644	rusty: Use nr_cpu_ids instead of nr_cpus_possible In scx_rusty, we're currently using topo.nr_cpus_possible() to determine how many possible CPU IDs we could have on the system. scx_rusty already accounts for offlined CPUs, so to properly support systems whose disabled CPUs may be in the middle of the range of possible CPU IDs, let's instead use topo.nr_cpu_ids(). Signed-off-by: David Vernet <void@manifault.com>	2024-06-21 12:57:19 -05:00
David Vernet	bdbf4b9c05	topo: Return nr_cpu_ids from host Topology In some cases, a host may have an odd topology where there are gaps in CPU IDs (including between possible CPUs). A common pattern in schedulers is to perform allocations for every possible CPU ID, such as creating a per-cpu DSQ. In order to avoid confusing schedulers, let's track the maximum CPU ID on a system so that we can return the number of CPU IDs on the system which is inclusive of gaps. We also update scx_rustland in this change to accommodate the fact that we no longer export nr_cpus_possible() from TopologyMap. Signed-off-by: David Vernet <void@manifault.com>	2024-06-21 12:57:13 -05:00
Jose Fernandez	83373b1f4e	rusty: Integrate stats with the metrics framework We need a layer of indirection between the stats collection and their output destinations. Currently, stats are only printed to stdout. Our goal is to integrate with various telemetry systems such as Prometheus, StatsD, and custom metric backends like those used by Meta and Netflix. Importantly, adding a new backend should not require changes to the existing stats code. This patch introduces the `metrics` [1] crate, which provides a framework for defining metrics and publishing them to different backends. The initial implementation includes the `dispatched_tasks_count` metric, tagged with `type`. This metric increments every time a task is dispatched, emitting the raw count instead of a percentage. A monotonic counter is the most suitable metric type for this use case, as percentages can be calculated at query time if needed. Existing logged metrics continue to print percentages and remain unchanged. A new flag, `--enable-prometheus`, has been added. When enabled, it starts a Prometheus endpoint on port 9000 (default is false). This endpoint allows metrics to be charted in Prometheus or Grafana dashboards. Future changes will migrate additional stats to this framework and add support for other backends. [1] https://metrics.rs/ Signed-off-by: Jose Fernandez <josef@netflix.com>	2024-06-21 10:18:44 -06:00
Changwoo Min	9c21ace276	Merge pull request #373 from vax-r/lavd_reuse scx_lavd: Reuse can_task1_kick_task2	2024-06-19 15:29:05 +09:00
I Hsin Cheng	99960ad960	scx_lavd: Reuse can_task1_kick_task2 Use the function can_task1_kick_task2() to replace the places that also check the comp_preemption_info between two CPUs, for better consistency. Signed-off-by: I Hsin Cheng <richard120310@gmail.com>	2024-06-19 11:01:31 +08:00
Changwoo Min	691869e83f	Merge pull request #369 from sched-ext/lavd-fix-pick-cpu scx_lavd: properly check for idle CPUs in pick_cpu()	2024-06-19 09:23:17 +09:00
Changwoo Min	dad25f1b5d	Merge pull request #368 from multics69/lavd-perf-misc scx_lavd: misc performance tuning and code clean up	2024-06-19 07:26:52 +09:00
Andrea Righi	bad9ed13ef	scx_lavd: properly check for idle CPUs in pick_cpu() It seems that we are not updating `is_idle` when we find an idle CPU with pick_cpu(), causing unnecessary rescheduling events when select_cpu() is called. To resolve this, ensure that the is_idle state is correctly set. Additionally, always ensure that the task is dispatched to the local DSQ immediately upon finding (and reserving) an idle CPU. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-06-18 17:36:39 +02:00
Changwoo Min	632fa9e4f2	scx_lavd: misc code clean up - clean up u64 and u32 usages in structures to reduce struct size - refactoring pick_cpu() for readability Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-06-18 18:11:49 +09:00
Changwoo Min	5165bf5a03	scx_lavd: tuning CPU frequency scaling The required CPU performance (cpuperf) was set to 1024 (100%) when the CPU utilization was 100%. When a sudden load spike happens, it makes the system adapt slowly in the next interval. The new scheme always reserves some headroom in advance, so it sets cpuperf to 1024 when the CPU utilization reaches 85%. This gives some room to adapt in advance. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-06-18 18:11:49 +09:00
I Hsin Cheng	94e3616c02	scx_rusty: Refactor lookup operation for new_domc in task_set_domain Modify the execution sequence before lookup operation for new_domc. If new_dom_id == NO_DOM_FOUND, lookup operation for new_domc is definitely going to fail so we don't have to wait until we found that new_domc is NULL, clearing of cpumask and return operation should be done directly in that case. Plus we should avoid using try_lookup_dom_ctx outside the context of lookup_dom_ctx, as it can keep the interface's consistency. Signed-off-by: I Hsin Cheng <richard120310@gmail.com>	2024-06-18 12:58:17 +08:00
David Vernet	0184444285	Merge pull request #366 from sched-ext/task_set_domain_global rusty: Make dom_xfer_task() a global prog	2024-06-17 14:43:45 -05:00
David Vernet	dfe0ffb312	Merge pull request #347 from sched-ext/rusty_cleanup rusty: Clean up some logic in rusty	2024-06-17 14:26:53 -05:00
David Vernet	7985ee556e	rusty: Clean up dispatch logic The rusty dispatch logic is a bit unnecessarily convoluted. Let's clean it up so that we're just comparing dom ids rather than iterating over arrays nested inside of pcpu context. Signed-off-by: David Vernet <void@manifault.com>	2024-06-17 14:24:30 -05:00
David Vernet	87aa86845d	rusty: Refactor + slightly improve wake_sync Right now, the SCX_WAKE_SYNC logic in rusty is very primitive. We only check to see if the waker CPU's runqueue is empty, and then migrate the wakee there if so. We'll want to expand this to be more thorough, such as: - Checking to see if prev_cpu and waker_cpu share the same LLC when determining where to migrate - Check for whether SCX_WAKE_SYNC migration helps load imbalance between cores - ... Right now all of that code is just a big blob in the middle of rusty_select_cpu(). Let's pull it into its own function to improve readability, and also add some logic to stay on prev_cpu if it shares an LLC with the waker. Signed-off-by: David Vernet <void@manifault.com>	2024-06-17 14:24:29 -05:00
David Vernet	fed66fa571	rusty: Make dom_xfer_task() a global prog It seems that task_set_domain() is nearly at the point where it can cause the verifier to get confused and think that it's exceeding the number of available instructions per program. I've seen this a number of times when making small changes to task_set_domain(), and it once again happened when @vax-r (I-Hsin Cheng) made a small cleanup change to rusty in https://github.com/sched-ext/scx/pull/362. To avoid this, let's just make dom_xfer_task() a separate global program so that the verifier doesn't have to worry about branch pruning, etc. depending on what the caller does. This should hopefully make task_set_domain() (and its callers) much less brittle. Signed-off-by: David Vernet <void@manifault.com>	2024-06-17 14:22:26 -05:00
Tejun Heo	dde2942125	compat: Drop __COMPAT_scx_bpf_cpuperf_() In preparation of upstreaming, let's set the min version requirement at the released v6.9 kernels. Drop __COMPAT_scx_bpf_cpuperf_(). The open helper macros now check the existence of scx_bpf_cpuperf_cap() and abort if not.	2024-06-16 06:16:53 -10:00
Tejun Heo	13e8388e1e	compat: Drop __COMPAT_HAS_CPUMASKS In preparation of upstreaming, let's set the min version requirement at the released v6.9 kernels. Drop __COMPAT_HAS_CPUMASKS(). The open helper macros now check the existence of scx_bpf_nr_cpu_ids() and abort if not.	2024-06-16 06:12:06 -10:00
Tejun Heo	5b5e5be906	compat: Drop __COMPAT_SCX_KICK_IDLE In preparation of upstreaming, let's set the min version requirement at the released v6.9 kernels. Drop __COMPAT_SCX_KICK_IDLE. The open helper macros now check the existence of SCX_KICK_IDLE and abort if not.	2024-06-15 20:24:15 -10:00
Tejun Heo	7c9aedaefe	compat: Drop __COMPAT_scx_bpf_switch_all() In preparation of upstreaming, let's set the min version requirement at the released v6.9 kernels. Drop __COMPAT_scx_bpf_switch_all(). The open helper macros now check the existence of SCX_OPS_SWITCH_PARTIAL and abort if not.	2024-06-15 20:03:37 -10:00
Tejun Heo	dd6255a601	Merge pull request #359 from sched-ext/htejun/cosmetic common.bpf.h: Cosmetic changes	2024-06-15 06:42:00 -10:00
Andrea Righi	cb20a6f136	scx_rlfifo: dispatch all tasks on the first CPU available With commit `786ec0c0` ("scx_rlfifo: schedule all tasks in user-space") all the scheduling decisions are now happening in user-space. This also bypasses the built-in idle selection logic, delegating the CPU selection for each task to the user-space scheduler. The easiest way to distribute tasks across the available CPUs is to simply allow them to be dispatched on the first CPU available. In this way the scheduler becomes usable in practical scenarios and at the same time it also maintains its simplicity. This spreads all tasks across all the available CPUs. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-06-15 16:13:53 +02:00
Andrea Righi	786ec0c04a	scx_rlfifo: schedule all tasks in user-space Disable all the BPF optimization shortcuts by default and force all tasks to be processed by the user-space scheduler. Given that the primary goal of this scheduler is to offer a straightforward and intuitive example for experimental purposes, this change simplifies the process for individuals looking to experiment, allowing them to apply changes to user-space code and quickly observe the effects, without dealing with any in-kernel optimizations. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-06-15 16:07:39 +02:00
Andrea Righi	59f47d6659	scx_rlfifo: improve code readability No functional change, just add some comments to better describe the parameters used when initializing the main BpfScheduler object. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-06-15 16:05:28 +02:00
Tejun Heo	d7677e3e5c	scx/common.bpf.h: Rename bpf_log2[l]() to u32/64_log2() The bpf_ prefix is used for BPF API. Rename bpf_log2() to u32_log2() and bpf_log2l() to u64_log2(). While at it, relocate them below compiler directive helpers.	2024-06-14 15:22:39 -10:00
Andrea Righi	8c6fe540eb	scx_rustland: prevent excessive starvation when system is congested Keep track of the maximum vruntime among all tasks and flush them if the difference between the maximum and minimum vruntime exceeds slice_ns. This helps to prevent excessive starvation, as every task is guaranteed to be dispatched within the slice_ns time limit. Tested-by: SoulHarsh007 <harsh.peshwani@outlook.com> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-06-14 20:09:19 +02:00
Changwoo Min	94a39f419f	scx_lavd: add the design of core compaction The core compaction seems to work great in various hardware. Now it is time to document its design. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-06-14 11:53:52 +09:00
Changwoo Min	5068d75bf3	Merge pull request #351 from multics69/lavd-power-v2 scx_lavd: improve CPU frequency scaling	2024-06-14 09:29:10 +09:00
Tejun Heo	a3342810c7	Merge pull request #352 from dschatzberg/mitosis common: Add css iter forward declares	2024-06-13 06:50:06 -10:00
Dan Schatzberg	114e4b644b	common: Add css iter forward declares These are used in mitosis, but they belong in common code so other schedulers can do css iteration. Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>	2024-06-12 15:02:48 -07:00
Changwoo Min	747bf2a7d7	scx_lavd: add the design of CPU frequency scaling Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-06-13 01:42:19 +09:00
Changwoo Min	2e74b86b4a	scx_lavd: logging cpu performance target Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-06-13 00:44:04 +09:00
Changwoo Min	e6348a11e9	scx_lavd: improve frequency scaling logic The old logic for CPU frequency scaling is that the task's CPU performance target (i.e., target CPU frequency) is checked every tick interval and updated immediately. Indeed, it samples and updates a performance target every tick interval. Ultimately, it fluctuates CPU frequency every tick interval, resulting in less steady performance. Now, we take a different strategy. The key idea is to increase the frequency as soon as possible when a task starts running for quick adaptation to load spikes. However, if necessary, it decreases gradually every tick interval to avoid frequency fluctuations. In my testing, it shows more stable performance in many workloads (games, compilation). Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-06-12 23:40:40 +09:00
Changwoo Min	753f333c09	scx_lavd: refactoring do_update_sys_stat() Originally, do_update_sys_stat() simply calculated the system-wide CPU utilization. Over time, it has evolved to collect all kinds of system-wide, periodic statistics for decision-making, so it has become bulky. Now, it is time to refactor it for readability. This commit does not contain functional changes other than refactoring. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-06-12 21:15:25 +09:00
Changwoo Min	9d129f0afa	scx_lavd: rename LAVD_CPU_UTIL_INTERVAL_NS to LAVD_SYS_STAT_INTERVAL_NS The periodic CPU utilization routine does a lot of other work now. So we rename LAVD_CPU_UTIL_INTERVAL_NS to LAVD_SYS_STAT_INTERVAL_NS. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-06-12 20:06:17 +09:00
Changwoo Min	7046b47b9c	scx_lavd: properly calculate task's runtime after suspend/resume When a device is suspended and resumed, the suspended duration is added to a task's runtime if the task was running on the CPU. After the resume, the task's runtime is incorrectly long and the scheduler starts to assume the system is under heavy load. To avoid this problem, the suspended duration is measured and subtracted from the task's runtime. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-06-12 15:58:41 +09:00
Dan Schatzberg	b95cfb0772	mitosis: Fix build The target wasn't dependent on the previous sched so building all schedulers ended up not building scx_mitosis which broke the install script.	2024-06-11 14:33:32 -07:00
Dan Schatzberg	9528d4603e	Merge pull request #339 from dschatzberg/mitosis scheds: Add scx_mitosis scheduler	2024-06-11 16:50:25 -04:00
Dan Schatzberg	3b6e2dee20	scheds: Add scx_mitosis scheduler scx_mitosis is a dynamic affinity scheduler which assigns cgroups to Cells and Cells to discrete sets of CPUs. The number of cells is dynamic as is the CPU assignment. BPF mostly just does vtime scheduling for each cell, tracks load, and responds to reconfiguration from userspace. Userspace makes decisions about how to assign cgroups to cells and cells to cpus. This is not yet a complete scheduler, much of the userspace logic is a placeholder as I experiment with better logic. I also want to add richer scheduling semantics to userspace, e.g. so that cells can do more "soft-affinity" rather than the strict partitioning implemented currently. Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>	2024-06-11 10:34:53 -07:00
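
Several commits above (bdbf4b9c05, 263e02f644, 3bd15be840) switch from nr_cpus_possible() to nr_cpu_ids() so that per-CPU allocations cover gaps in the CPU ID range. The following is a minimal sketch of that distinction, not the scx_utils implementation: the Python function names only mirror the names in the commit messages, and the example CPU ID list is hypothetical.

```python
def nr_cpus_possible(possible_cpu_ids):
    """Count of distinct possible CPUs (ignores gaps in the ID range)."""
    return len(set(possible_cpu_ids))


def nr_cpu_ids(possible_cpu_ids):
    """Number of CPU ID slots, inclusive of gaps: highest CPU ID + 1."""
    ids = set(possible_cpu_ids)
    return max(ids) + 1 if ids else 0


# Hypothetical host where CPU IDs 4..7 are missing from the possible set.
possible = [0, 1, 2, 3, 8, 9, 10, 11]

# Sizing a per-CPU table by the CPU count would leave IDs 8..11 out of
# range; sizing it by nr_cpu_ids() gives every valid ID its own slot.
per_cpu_slots = [None] * nr_cpu_ids(possible)
for cpu in possible:
    per_cpu_slots[cpu] = f"dsq-{cpu}"  # placeholder for a per-CPU DSQ
```

With this layout, indexing by CPU ID stays in bounds for every possible CPU, at the cost of a few unused slots for the gap.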

405 Commits