scx-upstream

mirror of https://github.com/sched-ext/scx.git synced 2024-10-30 03:12:19 +00:00

Author	SHA1	Message	Date
Andrea Righi	6d2aac1591	scx_rustland_core: introduce dispatch flags Reserve some bits of the `cpu` attribute of a task to store special dispatch flags. Initially, let's introduce just RL_CPU_ANY to replace the special value NO_CPU, indicating that the task can be dispatched on any CPU, specifically the first CPU that becomes available. This allows to keep the CPU value assigned by the builtin idle selection logic, that can potentially be used later for further optimizations. Moreover, having the possibility to specify dispatch flags gives more flexibility and it allows to map new scheduling features to such flags. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-04-23 07:13:30 +02:00
takase1121	3e12676ca2	scheds-rust: add explanation for chaining schedulers	2024-04-23 08:30:38 +08:00
takase1121	5d20f89a87	scheds-rust: build rust schedulers in sequence	2024-04-23 08:06:27 +08:00
David Vernet	5f1eac85ff	layered: Fix init_task When I transitioned layered to using task local storage, I messed up initializing the task ctx, not realizing we previously had a separate variable that was initializing the hashmap entry. We need to initialize the task's layer to -11, and also set refresh_layer to 1. Signed-off-by: David Vernet <void@manifault.com>	2024-04-18 09:44:32 -05:00
David Vernet	45589cd0f7	lavd: Fix a few typos Noticed a few typos. Let's fix em up Signed-off-by: David Vernet <void@manifault.com>	2024-04-17 08:17:52 -05:00
David Vernet	eed338ef25	simple: Invoke __COMPAT_scx_bpf_switch_all(); scx_simple no longer supports running in "partial" mode, with only certain tasks using scx_simple. When this option was removed, we also removed the call to scx_bpf_switch_all(); While switching-all is the default behavior for newer kernels, let's add __COMPAT_scx_bpf_switch_all() so that scx_simple can work on older kernels as well. Signed-off-by: David Vernet <void@manifault.com>	2024-04-16 11:09:44 -05:00
David Vernet	ffced1f615	rusty: Remove explicit padding As of libbpf-rs 0.23.0 (which contains commit `9d9e979fcf`), libbpf-rs now generates rust structs that honor padding. We can therefore remove the custom padding in scx_rusty's struct pcpu_ctx. For example, here is the generated pub struct pcpu_ctx: pub struct pcpu_ctx { pub dom_rr_cur: u32, pub dom_id: u32, pub nr_node_doms: u32, pub node_doms: [u32; 64], pub __pad_268: [u8; 52], } And here is the matching struct in the BPF object file: struct pcpu_ctx { u32 dom_rr_cur; /* 0 4 */ u32 dom_id; /* 4 4 */ u32 nr_node_doms; /* 8 4 */ u32 node_doms[64]; /* 12 256 */ /* size: 320, cachelines: 5, members: 4 */ /* padding: 52 */ } __attribute__((__aligned__(64))); Signed-off-by: David Vernet <void@manifault.com>	2024-04-12 13:52:13 -05:00
David Vernet	e032ee7cc0	rusty: Add lookup_pcpu_ctx() helper Getting rid of more boilerplate Signed-off-by: David Vernet <void@manifault.com>	2024-04-11 19:30:23 -05:00
David Vernet	885a9fd7da	rusty: Make lookup_task_ctx() static It doesn't need to be a global prog. Let's make it static. Signed-off-by: David Vernet <void@manifault.com>	2024-04-11 19:30:23 -05:00
David Vernet	0ff73754cf	rusty: Add create_save_cpumask() helper We have a lot of boilerplate code where we create a cpumask, initialize it, and then bpf_kptr_xchg() it into the map. In an effort to slightly reduce the amount of boilerplate, let's create a helper that can alleviate some of it. Signed-off-by: David Vernet <void@manifault.com>	2024-04-11 19:30:21 -05:00
David Vernet	e27d5b4e67	rusty: Fix a few random issues There are some random issues in the code, like unused variables, and bad print formatters. I'm not sure why the compiler isn't consistently complaining, but let's fix them. Signed-off-by: David Vernet <void@manifault.com>	2024-04-11 19:21:02 -05:00
David Vernet	31cc2dccb9	rusty: Allocate DSQ on appropriate NUMA node In scx_rusty, now that we have a complete view of the host's topology thanks to the Topology crate, we can update our calls to scx_bpf_create_dsq() to create the DSQ on the NUMA node of the domain. It's unclear how much this will end up mattering for performance in the typical case, but we might as well do the right thing given that host topology is static, and we have the information. Signed-off-by: David Vernet <void@manifault.com>	2024-04-11 00:01:25 -05:00
Dan Schatzberg	6eefc8c27f	Fix error typo ENONET means "Machine is not on the network" - this was supposed to be ENOENT "No such file or directory"	2024-04-10 15:28:05 -04:00
Changwoo Min	f53c29759e	scx_lavd: support preemption (in some scenarios) (#224) * scx-lavd: preemption of a lower-priority task using kick cpu When a task is enqueued to the global queue, the scheduler checks if there is a lower priority task than the enqueued task. If so, it kicks out the lower-priority task, hoping the newly enqueued task or another higher-priority task runs on the kicked CPU. Kicking another CPU is expensive as an IPI is involved, so the scheduler judiciously kicks the CPU when its benefit (i.e., priority gap) is clear enough. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-04-09 14:25:53 +09:00
David Vernet	9a8ed8ab44	Merge pull request #218 from sched-ext/rusty_hotplug Gracefully handle hotplug in scx_rusty	2024-04-04 16:03:59 -05:00
Andrea Righi	17a30bddc9	scx_rustland_core: bump up version to 0.2 Bump up the version of the crate and update dependencies. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-04-04 22:44:55 +02:00
David Vernet	622b61dd2f	rusty: Support restarting rusty on hotplug events The scx_rusty scheduler does not support hotplug, and expects a static host topology throughout its runtime. Though the kernel does have support for detecting hotplug events, we currently don't detect this in the kernel, nor surface it to user space when it happens. Now that we have scx_bpf_exit(), we can gracefully exit the kernel in the event of a hotplug, and communicate to user space that it should restart the scheduler. This patch adds that support to scx_rusty. Note that this assumes that we're running on a recent enough kernel that has scx_bpf_exit(). If it doesn't, then we instead just error out of the kernel scheduler and exit the application. Signed-off-by: David Vernet <void@manifault.com>	2024-04-04 14:52:48 -05:00
Tejun Heo	ba52cc131b	scx_lavd: Add .gitignore	2024-04-04 07:15:37 -10:00
Andrea Righi	eca7ecd24e	build: introduce kernel_headers build option If we try to cross-build scx on builders with older versions of system's linux headers (such as those provided by linux-libc-headers in older releases of Ubuntu), we may hit build failures, due to the different kernel ABI, such as: error: invalid use of undefined type ‘struct btf_enum64’ To address this, introduce a new build option called "kernel_headers" that allows to specify a custom path for the kernel headers required during the build process. Signed-off-by: Andrea Righi <andrea.righi@canonical.com>	2024-04-04 10:53:36 +02:00
Tejun Heo	a60737a6bf	Merge pull request #207 from sched-ext/api-updates scx: Apply API updates from sched_ext	2024-04-02 14:26:42 -10:00
Tejun Heo	348fe53256	Sync from kernel Synchronize stragglers. - Bug fix in __COMPAT_read_enum(). - A cosmetic difference in scx_qmap.bpf.c. - Stray 'p' when calling getopt() in scx_simple.c. After this the kernel tree and scx repo are in sync.	2024-04-02 11:29:50 -10:00
Tejun Heo	b925bdf94d	Cargo.toml: Update libbpf-rs/cargo dependencies to 0.23 and drop patch.crates-io sections New versions of libbpf-rs and libbpf-cargo are now available with all the needed features. Update the dependencies and drop the patch sections.	2024-04-02 11:19:39 -10:00
Tejun Heo	6f81409df4	Bump versions - scx_utils bumped from 0.6.0 to 0.7.0. - Repo and rust schedulers get a PATCH level bump.	2024-04-02 10:58:50 -10:00
Tejun Heo	f3e20ae9b3	scx_rustland: Apply API updates and add --exit-dump-len option to scx_rustland	2024-04-02 10:30:56 -10:00
David Vernet	5088328f9e	rusty: Check LOCAL_DSQ length for WAKE_SYNC In rusty_select_cpu(), if a task is WAKE_SYNC, we'll currently migrate the task to that CPU if there are any idle cores on the system. As in [0], this condition is insufficient, as there could be idle cores elsewhere on the system, but still tasks piled up on a single local DSQ. Let's add a condition that the local DSQ has to be empty in order to apply the WAKE_SYNC migration. Before patch: [void@maniforge src]$ hackbench Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks) Each sender will pass 100 messages of 100 bytes Time: 0.433 With patch: [void@maniforge src]$ hackbench Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks) Each sender will pass 100 messages of 100 bytes Time: 0.035 Signed-off-by: David Vernet <void@manifault.com>	2024-04-02 15:17:32 -05:00
Tejun Heo	06fdae177f	vmlinux: Update to 5dc95302301fb7e51cd4d218008b9dad10110069	2024-04-02 10:08:18 -10:00
Tejun Heo	98e586ce63	vmlinux: Drop unused old vmlinux headers No need to keep them around.	2024-04-02 10:08:09 -10:00
Tejun Heo	dfa978d166	scx_lavd: Apply API updates	2024-04-02 10:08:02 -10:00
Tejun Heo	0c07f382b1	scx_rusty: Apply API updates	2024-04-02 10:07:54 -10:00
Tejun Heo	59bbd800c1	compat: Implement scx_utils::compat and fix up scx_layered Implement scx_utils::compat to match C's scx/compat.h and update scx_layered. Other rust scheds are still broken.	2024-04-02 07:08:56 -10:00
Changwoo Min	3a3bd2a750	scx_lavd: increase the upper bound of ineligible duration Change the upper bound of ineligible duration (LAVD_ELIGIBLE_TIME_MAX). The updated (2x increased) upper bound reflects the distribution of tasks' eligible_delta_ns better. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-03-30 22:59:06 +09:00
Changwoo Min	8efaf0c4c2	scx_lavd: improve the accuracy of task's run_freq Change the calculation of the run_frequence using the wait_period from the last time the task yielded CPU to this time when the task is running. The old implementation measures the time interval between the last stopping and the current running and increases run_freq without reason. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-03-30 22:55:17 +09:00
Changwoo Min	fe3efb8ce2	scx_lavd: rename last_{start/stop/wait/wake}_clk for consistency Change the last_{start/stop/wait/wake}_clk in task_ctx to last_{running/stopping/quiescent/runnable}_clk, matching with state transition names. In addition, add comments and reorder fields in task_ctx for readability. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-03-30 10:13:20 +09:00
Tejun Heo	891df57b98	scheds/c: Fix up C schedulers Fix up the remaining C schedulers after the recent API and header updates. Also drop stray -p from usage help message from some schedulers.	2024-03-29 12:15:45 -10:00
Tejun Heo	9447cb27b2	scx: Sync from kernel, some schedulers are broken Sync from kernel to receive new vmlinux.h and the updates to common headers. This includes the following updates: - scx_bpf_switch_all() is replaced by SCX_OPS_SWITCH_PARTIAL flag. - sched_ext_ops.exit_dump_len added to allow customizing dump buffer size. - scx_bpf_exit() added. - Common headers updated to provide backward compatibility in a way which hides most complexities from scheduler implementations. scx_simple, qmap, central and flatcg are updated accordingly. Other schedulers are broken for the moment.	2024-03-29 12:14:26 -10:00
Changwoo Min	3ba10a8d4f	scx_lavd: accumulate consecutive runnings When a task runs more than once (running <->stopping) within one runnable-quiescent transition, accumulate runtime of multiple runnings for statistics. This helps to get the task's runtime per schedule when supposing that a huge time slice is given, which is what we want to collect for scheduling decisions. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-03-29 17:19:30 +09:00
Changwoo Min	7b99ed9c5c	scx_lavd: drop runtime_boost using slice_boost_prio Remove runtime_boost using slice_boost_prio. Without slice_boost_prio, the scheduler collects the exact time slice. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-03-29 16:31:03 +09:00
Changwoo Min	5629189527	scx_lavd: change update_stat_for_() for consistency Let's change the function names of update_stat_for_() as follow their callers for consistency and less confusion. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-03-29 14:49:06 +09:00
Changwoo Min	04c9e7fe9d	Merge pull request #201 from multics69/perf-vdeadline01 scx_lavd: fix merge conflicts between PR 197 and 199	2024-03-28 14:15:00 +09:00
Changwoo Min	0ea1aab070	scx_lavd: fix merge conflicts Merge branch 'perf-vdeadline01' of github.com:sched-ext/scx into perf-vdeadline01	2024-03-28 13:49:19 +09:00
Tejun Heo	340938025f	Merge pull request #200 from sched-ext/layered_delete layered: Use TLS map instead of hash map	2024-03-27 17:09:20 -10:00
Changwoo Min	60472db845	Merge pull request #197 from multics69/perf-vdeadline01 scx_lavd: improve virtual deadline calculation	2024-03-28 11:44:54 +09:00
Changwoo Min	67f41c7d83	scx_lavd: bug fix: slice_boost should be update before adjusted runtime The run_time_boosted_ns calculation requires updated slice_boost_prio, so updating slice_boost_prio should be done before updating run_time_boosted_ns. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-03-28 11:21:42 +09:00
David Vernet	e857dd90ab	layered: Use TLS map instead of hash map In scx_layered, we're using a BPF_MAP_TYPE_HASH map (indexed by pid) rather than a BPF_MAP_TYPE_TASK_STORAGE, to track local storage for a task. As far as I can tell, there's no reason we need to be doing this. We never access the map from user space, and we're even passing a struct task_struct * to a helper subprog to look up the task context rather than only doing it by pid. Using a hashmap is error prone for this because we end up having to manually track lifecycles for entries in the map rather than relying on BPF to do it for us. For example, BPF will automatically free a task's entry from the map when it exits. Let's just use TLS here rather than a hashmap to avoid issues from this (e.g. we've observed the scheduler getting evicted because we're accessing a stale map entry after a task has been destroyed). Reported-by: Valentin Andrei <vandrei@meta.com> Signed-off-by: David Vernet <void@manifault.com>	2024-03-27 20:14:27 -05:00
Changwoo Min	31157ebc81	scx-lavd: make the comments in update_sys_cpu_load() clear The current description is a bit confusing, so update the comments for clarity. Signed-off-by: Changwoo Min <changwoo@igalia.com>	2024-03-28 06:45:57 +09:00
Tejun Heo	129d99f542	scx_lavd: Remove custom task state tracking transit_task_stat() is now tracking the same runnable, running, stopping, quiescent transitions that sched_ext core already tracks and always returns %true. Let's remove it.	2024-03-26 12:23:19 -10:00
Tejun Heo	d7ec05e017	scx_lavd: Call update_stat_for_enq() from lavd_runnable() LAVD_TASK_STAT_ENQ is tracking a subset of runnable task state transitions - the ones which end up calling ops.enqueue(). However, what it is trying to track is a task becoming runnable so that its load can be added to the cpu's load sum. Move the LAVD_TASK_STAT_ENQ state transition and update_stat_for_enq() invocation to ops.runnable() which is called for all runnable transitions. Note that when all the methods are invoked, the invocation order would be ops.select_cpu(), runnable() and then enqueue(). So, this change moves update_stat_for_enq() invocation before calc_when_to_run() for put_global_rq(). update_stat_for_enq() updates taskc->load_actual which is consumed by calc_greedy_ratio() and thus affects calc_when_to_run(). Before this patch, calc_greedy_ratio() would use load_actual which doesn't reflect the last running period. After this patch, the latest running period will be reflected when the task gets queued to the global queue. The difference is unlikely to matter but it'd probably make sense to make it more consistent (e.g. do it at the end of quiescent transition). After this change, transit_task_stat() doesn't detect any invalid transitions.	2024-03-26 12:23:19 -10:00
Tejun Heo	625bb84bc4	scx_lavd: Move load subtraction to quiescent state transition scx_lavd tracks task state transitions and updates statistics on each valid transition. However, there's an asymmetry between the runnable/running and stopping/quiescent transitions. In the former, the runnable and running transitions are accounted separately in update_stat_for_enq() and update_stat_for_run(), respectively. However, in the latter, the two transitions are combined together in update_stat_for_stop(). This asymmetry leads to incorrect accounting. For example, a task's load should be added to the cpu's load sum when the task gets enqueued and subtracted when the task is no longer runnable (quiescent). The former is accounted correctly from update_stat_for_enq() but the latter is done whenever the task stops. A task can transit between running and stopping multiple times before becoming quiescent, so the asymmetry can end up subtracting the load of a task which is still running from the cpu's load sum. This patch: - introduces LAVD_TASK_STAT_QUIESCENT and updates transit_task_stat() so that it can handle all valid state transitions including the multiple back and forth transitions between two pairs - QUIESCENT <-> ENQ and RUNNING <-> STOPPING. - restores the symmetry by moving load adjustments part from update_stat_for_stop() to new update_stat_for_quiescent(). This removes a good chunk of ignored transitions. The next patch will take care of the rest.	2024-03-26 12:23:19 -10:00
Tejun Heo	dd40377f03	scx_lavd: Drop unnecessary `extern crate`s Since https://doc.rust-lang.org/edition-guide/rust-2018/path-changes.html, extern crate declarations aren't necessary. Let's drop them.	2024-03-26 12:23:19 -10:00
David Vernet	602ec5ada3	layered: Make helper functions static lookup_task_ctx(), lookup_task_ctx_may_fail(), and lookup_layer() currently don't have the static keyword, so BPF may treat them as a global function. We don't actually want these to be global, so let's make them static to avoid confusing the verifier. Signed-off-by: David Vernet <void@manifault.com>	2024-03-26 15:08:32 -05:00

339 Commits