scx-upstream

mirror of https://github.com/sched-ext/scx.git synced 2024-11-28 21:50:23 +00:00

Author	SHA1	Message	Date
Tejun Heo	03495cd37b	scx_stat: Initial commit scx_stat is a statistics reporting library with the following goals: - Minimal boilerplate code. Statistics are defined as a regular rust struct which can contain integers, floats, strings, structs, and array or BTreeMap of them. The struct and each field can be annotated using the "stat" attribute. The "Stat" derive macro generates the necessary metadata. - On-demand reporting. ScxStatServer implements a UNIX domain socket server which communicates using a simple json protocol. Stat structs can be added to the server with closures to output them. There can be any number of readers. ScxStatServer can be queried for the metadata of all stat structs that it may report so that a generic tool can be written to pipe the reported statistics to other frameworks (e.g. openmetrics). - The interface protocol is pretty simple and it isn't difficult to interact directly in json. ScxStatClient is provided to further simplify client implementation. - See rust/scx_stat/examples/{server,client}.rs for usage.	2024-08-14 11:37:06 -10:00
Tejun Heo	45f7fd13b7	versions: Synchronize crate dependency versions	2024-08-08 14:45:46 -10:00
Tejun Heo	63c4a0191f	Merge branch 'main' into topic/inlined-skeleton-members	2024-08-08 14:23:37 -10:00
Tejun Heo	cd6a4d72c7	Bump versions for 1.0.2 release	2024-08-08 14:10:16 -10:00
Tejun Heo	7c3ffe96e1	Unify crate dependency versions Different sub-projects are using different versions for the same crates. Synchronize them to the latest.	2024-08-08 13:26:47 -10:00
Andrea Righi	9d808ae206	Merge pull request #468 from sched-ext/rustland-refactoring scx_rustland refactoring	2024-08-07 11:38:21 +02:00
Andrea Righi	51cfb69199	scx_rustland_core: re-introduce partial mode Re-add the partial mode option that was dropped during the refactoring. The partial option allows to apply the scheduler only to the tasks which have their scheduling policy set to SCHED_EXT via sched_setscheduler(). Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-08-07 08:41:06 +02:00
Andrea Righi	e1f2b3822e	scx_rustland_core: drop CPU ownership API The API for determining which PID is running on a specific CPU is racy and is unnecessary since this information can be obtained from user space. Additionally, it's not reliable for identifying idle CPUs. Therefore, it's better to remove this API and, in the future, provide a cpumask alternative that can export the idle state of the CPUs to user space. As a consequence also change scx_rustland to dispatch one task at a time, instead of dispatching tasks in batches of idle cores (that are usually not accurate due to the racy nature of the CPU ownership interface). Dispatching one task at a time even makes the scheduler more performant, due to the vruntime scheduling being applied to more tasks sitting in the scheduler's queue. Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-08-07 08:41:06 +02:00
Andrea Righi	9a0e7755df	scx_rustland_core: export counter of online CPUs Introduce a helper to get the amount of online CPUs tracked by the BPF part. Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-08-07 08:10:53 +02:00
Andrea Righi	d9c9f78e3e	scx_rustland: re-align vruntime and time slice evaluation to scx_bpfland Drop the slice boost logic and apply a vruntime and task time slice evaluation approach similar to scx_bpfland (but implement this in the user-space component instead of the BPF part). Additionally, introduce a slice_us_min parameter to define the minimum time slice that can be assigned to a task, also similar to scx_bpfland. Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-08-07 08:10:53 +02:00
Andrea Righi	e1e6e31208	scx_rustland_core: update copyright info Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-08-07 08:10:53 +02:00
Andrea Righi	b87541a26e	scx_rustland_core: refactor idle CPU selection logic Use the same idle selection logic used in scx_bpfland also in scx_rustland_core. Also drop fifo_mode and always use the BPF idle selection logic by default as long as the system is not saturated, unless full_user is specified. This approach allows user-space schedulers aiming for maximum performance to leverage the BPF idle selection logic (bypassing user-space), while those seeking full control can enable full_user to bypass the BPF CPU idle selection logic and choose the target CPU for each task from user-space. Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-08-07 08:10:53 +02:00
Daniel Hodges	55ac9d1527	Merge pull request #474 from hodgesds/bpf-builder-fixes scx_utils: Fix build args	2024-08-06 22:42:44 -04:00
Daniel Hodges	459ed82e6c	scx_utils: Fix build args Fix args to bpf_builder, existing warnings display for "-Wno-compare-distinct-pointer-types". Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-08-06 19:10:34 -07:00
Andrea Righi	d8985306f4	scx_rustland: user-space interactive task classifier We don't need to send the number of voluntary context switches (nvcsw) from BPF to user-space, as this information is already accessible in user-space via procfs. Sending this data would only create unnecessary overhead for schedulers that don't require it, and those that do can easily retrieve it through procfs. Therefore, drop this metric from scx_rustland_core and change scx_rustland to implement an interactive task classifier fully in the user-space part of the scheduler. Also drop some options that do not provide any significant benefit (also in preparation of a bigger refactoring to define a better API for the user-space framework). Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-08-06 17:56:58 +02:00
Andrea Righi	a8d14fc0c4	Merge pull request #471 from sched-ext/rustland-core-musl scx_rustland_core: add support for musl	2024-08-06 17:56:19 +02:00
Kawanaao	c3109ebeed	scx_rustland_core alloc: Replaced RefCell with Mutex Necessary for some multi-threaded cases	2024-08-06 13:10:21 +00:00
Andrea Righi	4c7fb5cdbd	scx_rustland_core: add support for musl It seems that musl libc is not POSIX.1-2001, POSIX.1-2008 compliant and using sched_setscheduler() just returns -ENOSYS: https://git.musl-libc.org/cgit/musl/commit/src/sched/sched_setscheduler.c?id=1e21e78bf7a5c24c217446d8760be7b7188711c2 Switch to pthread_setschedparam() to properly support building scx_rustland_core with musl. This fixes #469. Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-08-06 07:32:38 +02:00
Andrea Righi	d4005dd186	scx_rustland_core: fix missing import (timespec) Explicitly import timespec to fix the following potential build error: error[E0422]: cannot find struct, variant or union type `timespec` in this scope --> src/bpf.rs:365:35 | 365 | sched_ss_repl_period: timespec { | ^^^^^^^^ not found in this scope | help: consider importing this struct | 6 + use libc::timespec; | This fixes issue #469. Signed-off-by: Andrea Righi <andrea.righi@linux.dev>	2024-08-05 23:15:48 +02:00
Daniel Hodges	6da3554547	scx_utils: Add CPUs method on topology Add a cpus method to various subfields in the topology struct to easily get the map of CPUs for nodes/LLCs. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-07-23 10:46:38 -07:00
Daniel Müller	565aec3662	rust: Update libbpf-rs & libbpf-cargo to 0.24 Update libbpf-rs & libbpf-cargo to 0.24. Among other things, generated skeletons now contain directly accessible map and program objects, no longer necessitating the use of accessor methods. As a result, the risk for mutability conflicts is reduced greatly. Signed-off-by: Daniel Müller <deso@posteo.net>	2024-07-16 11:48:52 -07:00
Daniel Hodges	97fb0fa0c8	scx_utils: Add LLC id to CPU Add the LLC id to the Cpu struct, which will make it easier for referencing the LLC when doing operations on a Cpu. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-07-16 07:46:55 -07:00
Tejun Heo	51334b5c4d	Bump versions for 1.0.1 release	2024-07-15 13:21:52 -10:00
Jose Fernandez	a63bf77d68	scx_utils: Display the tag value for nested counters in log_recorder The log_recorder was incorrectly displaying the tag name instead of the value for nested counters. Signed-off-by: Jose Fernandez <josef@netflix.com>	2024-07-15 15:20:44 -06:00
Tejun Heo	761ec142ce	Bump most versions to 1.0.0 sched_ext is about to be merged upstream. There are some compatibility breaking changes and we're making the current sched_ext/for-6.11 1edab907b57d ("sched_ext/scx_qmap: Pick idle CPU for direct dispatch on !wakeup enqueues") the baseline. Tag everything except scx_mitosis as 1.0.0. As scx_mitosis is still in early development and is currently temporarily disabled, only the patchlevel is bumped.	2024-07-12 11:34:14 -10:00
Tejun Heo	7f8e4edb53	Merge pull request #397 from jfernandez/log-recorder-customize sched_utils: Add log recorder format customization	2024-07-05 07:37:39 -10:00
I Hsin Cheng	1595da78dc	scx_rustland_core: Remove unused variable Remove unused variable "tctx" in rustland_select_cpu. Signed-off-by: I Hsin Cheng <richard120310@gmail.com>	2024-07-05 01:04:49 +08:00
Jose Fernandez	f64cdd1a51	sched_utils: Add log recorder format customization This change adds the ability to customize the log recorder format for each metric type. There is a default format that is used if no custom `MetricFormatter` is provided. This is the same format that was used before this change. The `MetricFormatter` should be implemented by the user to customize the format of the log recorder. The `LogRecorderBuilder` now takes a `MetricFormatter` as an optional parameter. Following changes will allow additional customization of the log recorder format, such as how many metrics are logged per line. Signed-off-by: Jose Fernandez <josef@netflix.com>	2024-06-28 06:21:03 -06:00
Andrea Righi	a7965abdbc	scx_utils: clarify error about missing CONFIG_DEBUG_INFO_BTF If CONFIG_DEBUG_INFO_BTF is not enabled in the kernel, the C schedulers report the following error via libbpf, clearly indicating the missing kernel config: libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was CONFIG_DEBUG_INFO_BTF enabled? In contrast, the Rust schedulers report a less clear error: thread 'main' panicked at /home/arighi/src/scx/rust/scx_utils/src/compat.rs:23:9: btf__load_vmlinux_btf() returned NULL note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace Make sure to report a similar error, so that users have a better clue about the missing kernel config. After this change the error looks like the following: thread 'main' panicked at /home/arighi/src/scx/rust/scx_utils/src/compat.rs:23:9: btf__load_vmlinux_btf() returned NULL, was CONFIG_DEBUG_INFO_BTF enabled? Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-06-27 09:15:43 +02:00
David Vernet	2aa8bbc32d	utils: Export build ID values from rust scx_utils We want schedulers to be able to print, log, etc the build ID of the repository. To do this, we can use the vergen Cargo crate to generate environment variables that contain values that we can export from scx_utils. This patch updates scx_utils accordingly to use vergen to generate build ID output that can be printed from schedulers. A subsequent patch will update scx_rusty to print this build ID value. Signed-off-by: David Vernet <void@manifault.com>	2024-06-25 11:44:19 -05:00
David Vernet	9d9ece11aa	Merge pull request #384 from jfernandez/log-recorder scx_utils: Add log_recorder module for metrics-rs	2024-06-25 11:43:37 -05:00
Andrea Righi	5db0908530	scx_rustland_core: make sure to use a valid CPU during direct dispatch We may end up selecting an invalid CPU (according to the task's cpumask) when dispatching the task via dispatch_direct_cpu(). When this happens simply return an error and do not dispatch the task and let the caller handle the error: in the context of select_cpu() we can simply ignore the dispatch and return the target CPU; in the context of FIFO mode dispatch we can fallback to SCX_DSQ_LOCAL if the target CPU is not valid. This fixes issue #353. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-06-25 14:11:46 +02:00
Andrea Righi	e4b13b2aa6	scx_rustland_core: reduce dispatch overhead Kick CPUs in the dispatch path only when needed (typically when tasks are bounced to other CPUs). Moreover, avoid consuming all the tasks dispatched at once. This seems to reduce the BPF overhead (according to bpftop), going from ~10% CPU usage down to ~6% CPU usage of rustland_dispatch() on an over commissioned system, without introducing any measurable performance regression. Tested-by: SoulHarsh007 <harsh.peshwani@outlook.com> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-06-25 14:02:39 +02:00
Andrea Righi	631d5576dc	scx_rustland_core: refactor CPU selection logic Allow dispatching tasks directly (bypassing the user-space scheduler) only when the scheduler is operating in FIFO mode. On an over-commissioned system, directly dispatching tasks can only increase OS noise. These tasks can get a brief priority boost and an extended time slice just because they found an idle CPU, which can lead to erratic behavior. This is particularly problematic when measuring performance stability, such as evaluating the frames-per-second (fps) of a video game on an overloaded system. In such cases, it's better to bounce all tasks to the user-space scheduler, which will ensure a better level of fairness and smoother performance. Moreover, get rid of the second chance dispatch logic introduced in commit `4791d862` ("scx_rustland_core: second chance CPU migration"). This seems to provide benefits only on certain architectures (Intel), but it can introduce lag on others (AMD). Tested-by: SoulHarsh007 <harsh.peshwani@outlook.com> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-06-25 14:02:39 +02:00
Andrea Righi	19217b5722	scx_rustland_core: clarify comment about resuming FIFO mode Tested-by: SoulHarsh007 <harsh.peshwani@outlook.com> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-06-25 14:02:39 +02:00
Andrea Righi	081d4bdb86	scx_rustland_core: add debugging to dispatch_direct_cpu() Report dispatch_direct_cpu() events in the trace, like any other dispatch-related event. Tested-by: SoulHarsh007 <harsh.peshwani@outlook.com> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-06-25 14:02:39 +02:00
Jose Fernandez	e5984ed016	scx_utils: Add log_recorder module for metrics-rs This change adds a new module to the scx_utils crate that provides a log recorder for metrics-rs. The log recorder will log all metrics to the console at a configurable interval in an easy to read format. Each metric type will be displayed in a separate section. Indentation will be used to show the hierarchy of the metrics. This results in a more verbose output, but it is easier to read and understand. scx_rusty was updated to use the log recorder and all explicit metric logging was removed. Counters will show the total count and the rate of change per second. Counters with an additional label, like `type` in `dispatched_tasks_total` in rusty, will show the count, rate, and percentage of the total count. Counters: dispatched_tasks_total: 65559 [1344.8/s] prev_idle: 44963 (68.6%) [966.5/s] wsync_prev_idle: 15696 (23.9%) [317.3/s] direct_dispatch: 2833 (4.3%) [35.3/s] dsq: 1804 (2.8%) [21.3/s] wsync: 262 (0.4%) [4.3/s] direct_greedy: 1 (0.0%) [0.0/s] pinned: 0 (0.0%) [0.0/s] greedy_idle: 0 (0.0%) [0.0/s] greedy_xnuma: 0 (0.0%) [0.0/s] direct_greedy_far: 0 (0.0%) [0.0/s] greedy_local: 0 (0.0%) [0.0/s] dl_clamped_total: 1290 [20.3/s] dl_preset_total: 514 [1.0/s] kick_greedy_total: 6 [0.3/s] lb_data_errors_total: 0 [0.0/s] load_balance_total: 0 [0.0/s] repatriate_total: 0 [0.0/s] task_errors_total: 0 [0.0/s] Gauges will show the last set value: Gauges: slice_length_us: 20000.00 Histograms will show the average, min, and max. The histogram will be reset after each log interval to avoid memory leaks, since the data structure that holds the samples is unbounded. Histograms: cpu_busy_pct: avg=1.66 min=1.16 max=2.16 load_avg node=0: avg=0.31 min=0.23 max=0.39 load_avg node=0 dom=0: avg=0.31 min=0.23 max=0.39 processing_duration_us: avg=297.50 min=296.00 max=299.00 Signed-off-by: Jose Fernandez <josef@netflix.com>	2024-06-24 18:45:02 -06:00
David Vernet	bdbf4b9c05	topo: Return nr_cpu_ids from host Topology In some cases, a host may have an odd topology where there are gaps in CPU IDs (including between possible CPUs). A common pattern in schedulers is to perform allocations for every possible CPU ID, such as creating a per-cpu DSQ. In order to avoid confusing schedulers, let's track the maximum CPU ID on a system so that we can return the number of CPU IDs on the system which is inclusive of gaps. We also update scx_rustland in this change to accommodate the fact that we no longer export nr_cpus_possible() from TopologyMap. Signed-off-by: David Vernet <void@manifault.com>	2024-06-21 12:57:13 -05:00
Andrea Righi	b04e82b5eb	scx_rustland_core: include buddy-alloc and refactor allocator code The dependency of the buddy-alloc crate [1] seems to cause some troubles with packaging, mostly because the selftests for the crate are failing when it's compiled in release mode. For example: $ cargo test --release -- --nocapture thread 'tests::fast_alloc::test_basic_malloc' panicked at src/tests/fast_alloc.rs:25:13: assertion `left == right` failed left: 0 right: 42 Some of these failures with BuddyAlloc can be fixed by using a memory arena buffer aligned to page size. However, some test failures with FastAlloc persist that cannot be resolved merely by aligning the pre-allocated memory arena to the page size, as mentioned in [2]. The concern is that this may potentially lead to actual memory bugs. Therefore, it seems safer to refactor the custom allocator code to simply use BuddyAlloc, dropping FastAlloc completely. To achieve this, the entire BuddyAlloc code has been directly included in scx_rustland_core, referencing the original project and its MIT licensing information (with the entire code still distributed under the GPLv2 license). Then the code has been slightly modified to remove FastAlloc and the external dependency on the buddy-alloc crate has been dropped. From a performance perspective this change doesn't seem to introduce any measurable regression. [1] https://github.com/jjyr/buddy-alloc [2] https://github.com/jjyr/buddy-alloc/issues/16 Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-06-19 14:44:04 +02:00
I Hsin Cheng	1334a4df5d	scx_utils: Utilize Entry API for BTreeMap insertion Take advantage of BTreeMap's Entry API working with or_insert() to do the conditional insertion. Insert only when the entry doesn't exist. Doing so can reduce the amount of code, improve readability, and allow in-place manipulation. Signed-off-by: I Hsin Cheng <richard120310@gmail.com>	2024-06-19 10:27:10 +08:00
I Hsin Cheng	f13697a755	scx_utils: Fix typos Correct "Thie" to "This". Signed-off-by: I Hsin Cheng <richard120310@gmail.com>	2024-06-18 00:35:43 +08:00
Tejun Heo	d18bb4831a	Merge pull request #363 from sched-ext/htejun/compat-strip Strip compat support	2024-06-16 20:04:50 -10:00
Tejun Heo	b6ebdc635a	compat: Compact min requirement checks Let's check only the latest one.	2024-06-16 06:53:58 -10:00
Tejun Heo	6319b25cf1	scx_utils: compat.rs: Follow-up clean-ups	2024-06-16 06:45:14 -10:00
Tejun Heo	aeb805a93e	compat: Drop support for missing sched_ext_ops.dump() In preparation of upstreaming, let's set the min version requirement at the released v6.9 kernels. Drop support for missing sched_ext_ops.dump(). The open helper macros now check the existence of the fields and abort if missing.	2024-06-16 06:43:43 -10:00
Tejun Heo	4cca1e9acf	compat: Drop support for missing sched_ext_ops.tick() In preparation of upstreaming, let's set the min version requirement at the released v6.9 kernels. Drop support for missing sched_ext_ops.tick(). The open helper macros now check the existence of the field and abort if missing.	2024-06-16 06:40:28 -10:00
Tejun Heo	970c04b43a	compat: Drop support for missing sched_ext_ops.exit_dump_len In preparation of upstreaming, let's set the min version requirement at the released v6.9 kernels. Drop support for missing sched_ext_ops.exit_dump_len. The open helper macros now check the existence of the field and abort if missing.	2024-06-16 06:37:34 -10:00
Tejun Heo	046bdfd5e0	compat: Drop support for missing sched_ext_ops.hotplug_seq In preparation of upstreaming, let's set the min version requirement at the released v6.9 kernels. Drop support for missing sched_ext_ops.hotplug_seq. The open helper macros now check the existence of the field and abort if missing.	2024-06-16 06:34:59 -10:00
Tejun Heo	dde2942125	compat: Drop __COMPAT_scx_bpf_cpuperf_() In preparation of upstreaming, let's set the min version requirement at the released v6.9 kernels. Drop __COMPAT_scx_bpf_cpuperf_(). The open helper macros now check the existence of scx_bpf_cpuperf_cap() and abort if not.	2024-06-16 06:16:53 -10:00
Tejun Heo	13e8388e1e	compat: Drop __COMPAT_HAS_CPUMASKS In preparation of upstreaming, let's set the min version requirement at the released v6.9 kernels. Drop __COMPAT_HAS_CPUMASKS(). The open helper macros now check the existence of scx_bpf_nr_cpu_ids() and abort if not.	2024-06-16 06:12:06 -10:00
Tejun Heo	66901e2b44	compat: Drop __COMPAT_scx_bpf_dump() In preparation of upstreaming, let's set the min version requirement at the released v6.9 kernels. Drop __COMPAT_scx_bpf_dump(). The open helper macros now check the existence of scx_bpf_dump_bstr() and abort if not. While at it, reorder the min requirement checks so that newly added ones are up top to make testing easier.	2024-06-16 06:02:47 -10:00
Tejun Heo	0d8adf2260	compat: Drop __COMPAT_scx_bpf_exit() In preparation of upstreaming, let's set the min version requirement at the released v6.9 kernels. Drop __COMPAT_scx_bpf_exit(). The open helper macros now check the existence of scx_bpf_exit_bstr() and abort if not.	2024-06-15 20:36:17 -10:00
Andrea Righi	812aeb3b81	scx_rustland_core: fix potential race in dispatch_task() Fix a potential race condition that might lead to a task being dispatched without kicking the target CPU, which could result in a potential stall. With this applied, scx_rustland has been running without any stall for about 18 hours on a system where the issue was previously quite easy to reproduce. Moreover, clarify a couple of comments in the dispatch path. This fixes issue #353. Tested-by: SoulHarsh007 <harsh.peshwani@outlook.com> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-06-16 08:31:48 +02:00
Tejun Heo	5b5e5be906	compat: Drop __COMPAT_SCX_KICK_IDLE In preparation of upstreaming, let's set the min version requirement at the released v6.9 kernels. Drop __COMPAT_SCX_KICK_IDLE. The open helper macros now check the existence of SCX_KICK_IDLE and abort if not.	2024-06-15 20:24:15 -10:00
Tejun Heo	7c9aedaefe	compat: Drop __COMPAT_scx_bpf_switch_all() In preparation of upstreaming, let's set the min version requirement at the released v6.9 kernels. Drop __COMPAT_scx_bpf_switch_all(). The open helper macros now check the existence of SCX_OPS_SWITCH_PARTIAL and abort if not.	2024-06-15 20:03:37 -10:00
Tejun Heo	fb2c70de84	scx_utils: compat.rs: Helper macros shouldn't return the calling function scx_ops_open!() and scx_ops_attach!() could return the calling function after an error, which can be surprising. Fortunately, as all the current callers are either unwrapping or returning on error, the surprising behavior is currently not very noticeable. Fix it by breaking out of the macro block on errors.	2024-06-15 18:27:39 -10:00
Andrea Righi	6cc2595922	scx_rustland_core: allow user-space scheduler to preempt other tasks The same change was previously applied with commit `8820af8d` ("scx_rustland: enable user-space scheduler to preempt other tasks"). However, it was reverted in commit `732ba490` ("scx_rustland: avoid using SCX_ENQ_PREEMPT") with the introduction of the global dynamic time slice (inversely proportional to the number of running tasks), because it already provided sufficient context switch opportunities, negating any advantage of the preemption. With the introduction of the virtual time slice in commit `6f4cd853` ("scx_rustland: introduce virtual time slice"), re-introducing the ability for the user-space scheduler to preempt other tasks now appears beneficial again. As confirmed by experimental results, this change helps prevent potential audio cracking issues and enhances overall system responsiveness, resulting in a 4-5% increase in fps performing the usual benchmark of gaming while recompiling the kernel. Tested-by: SoulHarsh007 <harsh.peshwani@outlook.com> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-06-14 20:09:19 +02:00
Andrea Righi	2d2c6083d7	scx_rustland_core: use SCX_DSQ_LOCAL only with kthreads Tasks dispatched directly using SCX_DSQ_LOCAL may receive excessively high priority compared to those dispatched by the user-space scheduler. To avoid this priority disparity, dispatch tasks to the per-CPU DSQs using direct dispatch and reserve SCX_DSQ_LOCAL for per-CPU kthreads only. Tested-by: SoulHarsh007 <harsh.peshwani@outlook.com> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-06-14 20:09:19 +02:00
Daniel Hodges	2ca42428cd	scx_utils: Add CPU freq transition latency This change adds the CPU frequency transition latency from the `cpuinfo_transition_latency` from sysfs. The value of this field is described [cpufreq docs](https://www.kernel.org/doc/Documentation/cpu-freq/user-guide.txt). On supported systems it returns the CPU frequency transition latency in nanoseconds. The goal of this change is so that in the future schedulers can use this data to make better frequency scaling decisions. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>	2024-06-11 07:35:34 -07:00
Tejun Heo	3e3720fc7f	scx_utils: Add compat support for ops.tick() and ops.dump*() Match rust scx_ops_load!()'s compat support with C's SCX_OPS_LOAD().	2024-06-06 14:16:36 -10:00
Tejun Heo	200af60f2a	scx_layered: Fix load failure due to scheduler_tick() -> sched_tick() rename - scx_utils: Replace kfunc_exists() with ksym_exists() which doesn't care about the type of the symbol. - scx_layered: Fix load failure on kernels >= v6.10-rc due to scheduler_tick() -> sched_tick rename. Attach the tick fentry function to either scheduler_tick() or sched_tick().	2024-06-06 12:54:59 -10:00
Andrea Righi	40e67897a9	Merge pull request #331 from sched-ext/rustland-core-dispatch-debug scx_rustland_core: add extra debugging info to dispatch_task()	2024-06-04 18:58:01 +02:00
Andrea Righi	b363a13310	scx_rustland_core: add enqueue flags to debug info Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-06-04 17:14:38 +02:00
Andrea Righi	89384754ce	scx_rustland_core: add task time slice to the debug info Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-06-04 17:14:38 +02:00
Tejun Heo	e556dd375d	scx: Unify loading and running boilerplate across rust schedulers Make restart handling with user_exit_info simpler and use the load and report macros consistently across the rust schedulers. This makes all schedulers automatically handle auto restarts from CPU hotplug events. Note that this is necessary even for scx_lavd, which has CPU hotplug operations, as CPU hotplug operations that took place between skel open and scheduler init can still trigger a restart.	2024-06-03 12:25:41 -10:00
Tejun Heo	a2d5310cb6	Bump versions for a release	2024-06-03 08:35:21 -10:00
Andrea Righi	4503c7080a	scx_rustland_core: fix build error with musl As reported in #319, we may get a build failure in presence of musl, that requires additional parameters in sched_param. Fix by adding a proper conditional to support both gnu libc and musl libc. This fixes #319. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-26 22:30:04 +02:00
Yiwei Lin	8c2236770d	Reduce MAX_ENQUEUED_TASKS to fit percpu allocator The setting of ops->dispatch_max_batch leads to an allocation size that is too large for the percpu allocator, which will be unhappy if we want a size larger than PCPU_MIN_UNIT_SIZE. Reduce MAX_ENQUEUED_TASKS to fix this.	2024-05-23 22:33:02 +08:00
Andrea Righi	4791d862f5	scx_rustland_core: second chance CPU migration Implement a second-chance migration in select_cpu(): after a task has been dispatched directly do not try to migrate it immediately on a different CPU, but force it to stay on prev_cpu for another round. This seems quite effective on certain architectures (such as on a system with 11th Gen Intel(R) Core(TM) i7-1195G7 @ 2.90GHz), and it can provide noticeable benefits with gaming or WebGL applications (such as https://webglsamples.org/aquarium/aquarium.html) under regular workload conditions (around +5% fps). Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-22 21:38:34 +02:00
Andrea Righi	6c129e9b4b	scx_rustland_core: consume from the shared DSQ before local DSQ The shared DSQ is typically used to prioritize tasks and dispatch them on the first CPU available, so consume from the shared DSQ before the local CPU DSQ. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-22 12:12:55 +02:00
Andrea Righi	9e4bea4a1c	scx_rustland_core: switch to FIFO when system is underutilized Provide a knob in scx_rustland_core to automatically turn the scheduler into a simple FIFO when the system is underutilized. This choice is based on the assumption that, in the case of system underutilization (less tasks running than the amount of available CPUs), the best scheduling policy is FIFO. With this option enabled the scheduler starts in FIFO mode. If most of the CPUs are busy (nr_running >= num_cpus - 1), the scheduler immediately exits from FIFO mode and starts to apply the logic implemented by the user-space component. Then the scheduler can switch back to FIFO if there are no tasks waiting to be scheduled (evaluated using a moving average). This option can be enabled/disabled by the user-space scheduler using the fifo_sched parameter in BpfScheduler: if set, the BPF component will periodically check for system utilization and switch back and forth to FIFO mode based on that. This allows to improve performance of workloads that are using a small amount of the available CPUs in the system, while still maintaining the same good level of performance for interactive tasks when the system is over commissioned. In certain video games, such as Baldur's Gate 3 or Counter-Strike 2, running in "normal" system conditions, we can experience a boost in fps of approximately 4-8% with this change applied. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-22 09:02:02 +02:00
Andrea Righi	912bde6a52	scx_rustland_core: simplify CPU selection logic Simplify the CPU idle selection logic relying on the built-in logic. If something can be improved in this logic it should be done in the backend, changing the default idle selection logic, rustland doesn't need to do anything special here for now. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-22 09:02:02 +02:00
Andrea Righi	0d75c80587	Revert "Merge pull request #305 from sched-ext/rustland-fifo-mode" This merge included additional commits that were supposed to be included in a separate pull request and have nothing to do with the fifo-mode changes. Therefore, revert the whole pull request and create a separate one with the correct list of commits required to implement this feature. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-22 09:00:25 +02:00
Andrea Righi	f27e67dfb8	scx_rustland_core: second chance CPU migration Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-21 18:08:58 +02:00
Andrea Righi	778ee1406f	scx_rustland_core: consume from the shared DSQ before local DSQ The shared DSQ is typically used to prioritize tasks and dispatch them on the first CPU available, so consume from the shared DSQ before the local CPU DSQ. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-21 18:08:19 +02:00
Andrea Righi	d25675ff44	scx_rustland_core: switch to FIFO when system is underutilized Provide a knob in scx_rustland_core to automatically turn the scheduler into a simple FIFO when the system is underutilized. This choice is based on the assumption that, in the case of system underutilization (fewer tasks running than the amount of available CPUs), the best scheduling policy is FIFO. With this option enabled the scheduler starts in FIFO mode. If most of the CPUs are busy (nr_running >= num_cpus - 1), the scheduler immediately exits from FIFO mode and starts to apply the logic implemented by the user-space component. Then the scheduler can switch back to FIFO if there are no tasks waiting to be scheduled (evaluated using a moving average). This option can be enabled/disabled by the user-space scheduler using the fifo_sched parameter in BpfScheduler: if set, the BPF component will periodically check for system utilization and switch back and forth to FIFO mode based on that. This improves the performance of workloads that use a small number of the available CPUs in the system, while still maintaining the same good level of performance for interactive tasks when the system is overcommitted. In certain video games, such as Baldur's Gate 3 or Counter-Strike 2, running in "normal" system conditions, we can experience a boost in fps of approximately 4-8% with this change applied. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-21 17:39:11 +02:00
Andrea Righi	3ad634c293	scx_rustland_core: simplify CPU selection logic Simplify the CPU idle selection logic relying on the built-in logic. If something can be improved in this logic it should be done in the backend, changing the default idle selection logic, rustland doesn't need to do anything special here for now. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-21 17:08:06 +02:00
David Vernet	ee940bd8b5	rustland: Mark get_cpu_owner() as __maybe_unused scx_rustland has a function called get_cpu_owner() in BPF which currently has no callers. There's nothing wrong with the function, but it causes a warning due to an unused function. Let's just annotate it with __maybe_unused to tell the compiler that it's not a problem. Signed-off-by: David Vernet <void@manifault.com>	2024-05-18 07:51:20 -05:00
Tejun Heo	ab25992416	Add missing skel.attach() calls C SCX_OPS_ATTACH() and rust scx_ops_attach() macros were not calling .attach() and were only attaching the struct_ops. This meant that all non-struct_ops BPF programs contained in the skels were never attached which breaks e.g. scx_layered. Let's fix it by adding .attach() invocation to the attach macros.	2024-05-17 14:33:04 -10:00
David Vernet	c1f1411c7a	Merge pull request #289 from sched-ext/rusty_hot_plug Add remaining hotplug pieces	2024-05-16 13:42:11 -06:00
David Vernet	4edd8d75e6	compat: Add scx_ops_open!() macro In order to make it easy for schedulers to use the hotplug_seq feature that's available in recent kernels, we'll need to provide a macro wrapper so that we can support the feature with backwards compatibility. This adds scx_ops_open!() to abstract that. Any scheduler that uses scx_ops_open!() will be exited if a hotplug event happens between opening the skeleton and loading it. Signed-off-by: David Vernet <void@manifault.com>	2024-05-15 16:42:57 -05:00
David Vernet	34818de54d	rusty: Use built-in exit code for restarting Now that the kernel exports the SCX_ECODE_ACT_RESTART exit code, we can remove the custom hotplug logic from scx_rusty, and instead rely on the built-in logic from the kernel. There's still a corner case that we're not honoring: when a hotplug event happens on the init path. A future change will address this as well. Signed-off-by: David Vernet <void@manifault.com>	2024-05-15 16:31:56 -05:00
Andrea Righi	e9ac6105c7	scx_rustland_core: introduce low-power mode Introduce a low-power mode to force the scheduler to operate in a very non-work conserving way, causing a significant saving in terms of power consumption, while still providing a good level of responsiveness in the system. This option can be enabled in scx_rustland via the --low_power / -l option. The idea is to not immediately re-kick a CPU when it enters an idle state, but do that only if there are no other tasks running in the system. In this way, latency-critical tasks can be still dispatched immediately on the other active CPUs, while CPU-bound tasks will be forced to spend more time waiting to be scheduled, basically enforcing a special CPU throttling mechanism that affects only the tasks that are not latency critical. The consequence is a reduction in the overall system throughput, but also a significant reduction of power consumption, that can be useful for mobile / battery-powered devices. Test case (using `scx_rustland -l`): - play a video game (Terraria) while recompiling the kernel - measure game performance (fps) and core power consumption (W) - compare the result of normal mode vs low-power mode Result: normal mode: 60 fps game performance, 6W power consumption; low-power mode: 60 fps game performance, 3W power consumption. As we can see from the result the reduction of power consumption is quite significant (50%), while the responsiveness of the game (fps) remains the same, which means battery life can be potentially doubled without significantly affecting system responsiveness. The overall throughput of the system is, of course, affected in a negative way (kernel build is approximately 50% slower during this test), but the goal here is to save power while still maintaining a good level of responsiveness in the system. For this reason the low-power mode should be considered only in emergency conditions, for example when the system is close to completely running out of power or simply to extend the battery life of a mobile device without compromising its responsiveness. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-15 20:32:05 +02:00
David Vernet	c6cc59e8b4	uei: Expose exit code enums from user_exit_info.rs Schedulers and the kernel can include an exit code when exiting a scheduler. There are some built-in codes that can be specified: SCX_ECODE_RSN_HOTPLUG, and SCX_ECODE_ACT_RESTART. Some schedulers may want to check the exit code against these values, so let's export them from user_exit_info.rs. We use lazy_static so that we can read the values for the enum for the currently-running kernel. Signed-off-by: David Vernet <void@manifault.com>	2024-05-15 13:31:07 -05:00
Andrea Righi	45a9b178cd	scx_rustland_core: expose nr_running metric Expose the new metric nr_running to keep track of the amount of currently running tasks. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-15 20:28:10 +02:00
Andrea Righi	5f9ce3bba6	Merge pull request #272 from sched-ext/rustland-reduce-scheduling-overhead rustland: reduce scheduling overhead	2024-05-11 10:17:01 +02:00
Andrea Righi	209c454149	scx_rustland_core: fix update_idle description The comment that describes rustland_update_idle() is still incorrectly reporting an old implementation detail. Update its description for better clarity. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-11 07:37:33 +02:00
Andrea Righi	311b7f861c	scx_rustland_core: refine built-in CPU idle selection logic Change the BPF CPU selection logic as follows: - if the previously used CPU is idle, keep using it - if the task is not coming from a wait state, try to stick as much as possible to the same CPU (for better cache usage) - if the task is waking up from a wait state rely on the sched_ext built-in idle selection logic This logic can be completely disabled when the full user-space mode is enabled. In this case tasks will always be assigned to the previously used CPU and the user-space scheduler should take care of distributing them among the available CPUs. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-11 07:37:31 +02:00
David Vernet	904a89117c	topology: Support CONFIG_NUMA=n in Topology crate Some users are running with NUMA disabled, which makes sense given that it's useless in a lot of contexts. Let's make the Topology crate assume a default node with ID 0 in such cases. Signed-off-by: David Vernet <void@manifault.com>	2024-05-10 16:46:15 -05:00
Andrea Righi	63feba9c2b	topology: TopologyMap: add nr_cpus_online() Add a method to TopologyMap to get the amount of online CPUs. Considering that most of the schedulers are not handling CPU hotplugging it can be useful to expose also this metric in addition to the amount of available CPUs in the system. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-10 17:24:20 +02:00
Andrea Righi	f052493005	scx_rustland_core: implement effective time slice on a per-task basis Drop the global effective time-slice and use the more fine-grained per-task time-slice to implement the dynamic time-slice capability. This allows to reduce the scheduler's overhead (dropping the global time slice volatile variable shared between user-space and BPF) and it provides a more fine-grained control on the per-task time slice. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-10 17:24:20 +02:00
Andrea Righi	887812197c	scx_utils: report an explicit error when another scheduler is running If another scheduler is already running, the Rust schedulers based on scx_utils are reporting an error like the following, that can be a bit difficult to understand: Error: Failed to attach struct ops Caused by: bpf call "libbpf_rs::map::Map::attach_struct_ops::{{closure}}" returned NULL Change the scx_ops_attach macro to check if another sched_ext scheduler is running and in that case report a more explicit error. With this applied: $ sudo scx_rustland Error: another sched_ext scheduler is already running Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-10 11:20:19 +02:00
Andrea Righi	5da4602ad7	scx_rustland_core: use a BPF_MAP_TYPE_USER_RINGBUF to dispatch tasks Replace the BPF_MAP_TYPE_QUEUE with a BPF_MAP_TYPE_USER_RINGBUF to store the tasks dispatched from the user-space scheduler to the BPF component. This eliminates the need for bpf() syscalls, significantly reducing the overhead of the user-space->kernel communication and delivering a notable performance boost in the overall system throughput. Based on experimental results, this change reduces the scheduling overhead by approximately 30-35% when the system is overcommitted. This improvement has the potential to make user-space schedulers based on scx_rustland_core viable options for real production systems. Link: https://github.com/libbpf/libbpf-rs/pull/776 Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-05-08 22:16:53 +02:00
Andrea Righi	fd68ce13a7	scx_rustland_core: bump up version to 0.4.0 Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-04-30 18:09:09 +02:00
Andrea Righi	be415d4c06	scx_rustland_core: relax compact unevictable memory constraint In order to prevent deadlock conditions user-space schedulers need to perform memory allocations from a pool of pre-allocated memory that will never be unmapped/reclaimed by the kernel (unevictable memory). However, there is a special kernel sysctl setting (vm.compact_unevictable_allowed) that allows kcompactd to reclaim unevictable memory. This behavior should be prevented by setting vm.compact_unevictable_allowed = 0, that is what scx_rustland_core does transparently when a scheduler is started, restoring the previous value when the scheduler is stopped. Unfortunately, this is not always doable, especially when running a scheduler inside a containerized environment or under certain security/privilege restrictions (e.g., AppArmor confinement). Therefore, just report a WARNING if we are unable to change this parameter, instead of considering it a hard-failure, to allow running scx_rustland_core schedulers also inside such limited environments. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-04-30 18:09:09 +02:00
Tejun Heo	cf66e58118	Sync from kernel (670bdab6073) And fix build breakage in scx_utils due to an enum type rename.	2024-04-29 09:58:19 -10:00
Andrea Righi	cabde30736	scx_utils: bump up version to 0.8.0 Bump up scx-utils version to provide the new scx_utils::TopologyMap. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-04-28 21:01:16 +02:00
Andrea Righi	fb2f5c240e	scx_rustland_core: bump up version to 0.3 Given that rustland_core now supports task preemption and it has been tested successfully, it's worthwhile to cut a new version of the crate. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-04-28 12:01:38 +02:00
Andrea Righi	cdf3729355	scx_rustland_core: make crate version accessible Export the version of scx_rustland_core to the users of the crate. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-04-28 12:01:38 +02:00
Andrea Righi	973aded5a8	Merge pull request #238 from sched-ext/rustland-reduce-topology-overhead scx_rustland: reduce overhead by caching host topology	2024-04-24 22:24:23 +02:00
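
The FIFO-mode switching described in commits 9e4bea4a1c and d25675ff44 can be illustrated with a small sketch. This is not the code from scx_rustland_core: the `FifoSwitch` type, its field names, and the halving moving average are assumptions made for illustration. Only the two conditions come from the commit message: leave FIFO mode when nr_running >= num_cpus - 1, and return to it when a moving average reports no tasks waiting.

```rust
/// Hypothetical tracker for the FIFO <-> user-space scheduling switch.
/// All names here are illustrative and not part of scx_rustland_core.
pub struct FifoSwitch {
    num_cpus: u64,
    fifo_mode: bool,
    /// Moving average of the number of waiting tasks (assumed formula).
    avg_waiting: u64,
}

impl FifoSwitch {
    /// The commit message states that the scheduler starts in FIFO mode.
    pub fn new(num_cpus: u64) -> Self {
        Self { num_cpus, fifo_mode: true, avg_waiting: 0 }
    }

    /// Feed one periodic sample and return whether FIFO mode is active.
    pub fn update(&mut self, nr_running: u64, nr_waiting: u64) -> bool {
        // Assumed averaging: give the old average and the new sample
        // equal weight. Integer division lets the average decay to zero.
        self.avg_waiting = self.avg_waiting.saturating_add(nr_waiting) / 2;

        if self.fifo_mode {
            // Most CPUs are busy: leave FIFO mode immediately.
            if nr_running >= self.num_cpus.saturating_sub(1) {
                self.fifo_mode = false;
            }
        } else if self.avg_waiting == 0 {
            // No tasks waiting on average: fall back to FIFO mode.
            self.fifo_mode = true;
        }
        self.fifo_mode
    }
}
```

The asymmetry is deliberate in this sketch: leaving FIFO mode reacts to a single sample, while returning to it waits for the average to decay, so a short lull does not flip the mode back.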


234 Commits