Commit Graph

159 Commits

Author SHA1 Message Date
Changwoo Min
22d4b13e8e scx_lavd: classify CPUs into BIG and little ones based on their average capacity
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-31 19:00:35 +09:00
Changwoo Min
0ad2f30fa8
Merge pull request #460 from multics69/lavd-misc
scx_lavd: misc updates
2024-07-31 08:55:04 +09:00
Changwoo Min
9b455cf010
Merge pull request #458 from sched-ext/lavd-fix-cpu-ctx-size
scx_lavd: set correct size for cpu_ctx_stor
2024-07-31 00:39:13 +09:00
Changwoo Min
6136cbee65 scx_lavd: tuning the time slice and preemption margins
Tuning the time slice under high load and change the kick/tick margins
for preemption more conservative. Especially, aggressive IPI-based
preemption (kick) causes performance unstability.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-31 00:30:59 +09:00
Changwoo Min
35b0d9f3c2 scx_lavd: improve starvation factor equation
Instead of using coarse-grained log(), let's directly use the ratio of
task's service time. Also, the virtual dealine equation is also updated
to reflect this change.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-31 00:27:17 +09:00
Changwoo Min
f9657a549f scx_lavd: fix bpf verification error in old kernel versions
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-31 00:22:43 +09:00
Changwoo Min
d2615b4975 scx_lavd: fix warnings from the rust code
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-31 00:21:32 +09:00
Andrea Righi
2015faa745 scx_lavd: set correct size for cpu_ctx_stor
The max_entries parameter in BPF_MAP_TYPE_PERCPU_ARRAY defines the
number of values per CPU and for cpu_ctx_stor we only need one item: the
CPU context.

Set max_entries to 1 to avoid allocating unnecessary memory and slightly
reduce the memory footprint.

Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
2024-07-30 09:32:55 +02:00
Changwoo Min
643edb5431
Merge pull request #457 from multics69/lavd-amp-v2
scx_lavd: support two-level scheduling for heavy-loaded cases (like bpfland)
2024-07-30 10:39:06 +09:00
Changwoo Min
b91c1e4759 scx_lavd: add more comments on no_2_level_scheduling implementation
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-29 12:22:28 +09:00
Changwoo Min
f71fff9bbe scx_lavd: print a warning message when system does not provide a proper freq info
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-28 15:53:02 +09:00
Changwoo Min
4449d8e31c scx_lavd: incorporate a task's static priority in calculating its latency criticality
That's because static (nice) priority is a strong hint to distinguish
latency-critical tasks.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-28 15:41:43 +09:00
Changwoo Min
221f1fe12a scx_lavd: further prioritize producers over consumers
That is because many latency-critical tasks are producers.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-28 15:38:54 +09:00
Changwoo Min
7106e8cdca scx_lavd: support two-level scheduling for heavy-loaded cases
We introduce two-level scheduling similar to scx_bpfland. The two-level
scheduling consists of two DSQs: 1) latency-critical run queue and 2)
regular run queue. The scheduler prioritizes scheduling tasks on the
latency-critical queue but makes its best effort to schedule tasks on
the regular queue. The scheduler could be more resilient under heavy
load by segregating regular, non-latency-critical tasks from
latency-critical tasks.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-28 15:33:17 +09:00
Changwoo Min
9236c3e57c scx_lavd: increase the targeted latency for heavy loaded cases
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-28 15:30:01 +09:00
Changwoo Min
230512208d scx_lavd: fix div by zero error in some installations
The max frequency information from topology (from sysfs) seems not
always true. In some installations, it returns zero for all CPUs. In
this case, let's just consider all CPUs have the same capacity (1024),
hoping the kernel can give more preceise information.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-28 12:47:00 +09:00
Changwoo Min
59e54f4972 scx_lavd: print how to disable logging
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-28 12:31:51 +09:00
Changwoo Min
df1108ec6c scx_lavd: segregate starvation factor from the latency criticality (refactoring)
Latency criticality is a task's inherent property, but the starvation
factor is its dynamic status for the urgency of scheduling. Hence, we
segregate the starvation factor out. Also, cleaned up unnecessary
arguments and struct fields related.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-27 17:25:39 +09:00
Changwoo Min
eeea847697 scx_lavd: adjust time slice based on CPU's capacity
When a task is running on more performant core, the scheduler will give
a longer time slice. On the other hand, on a less performant core, a
shorter time slice will be assigned. The longer time slice helps
boosting clock frequency on a performant core. Also, the shorter time
slice gives more chance the performant core being utilized.

Regarding the CPU capacity, we first check if kernel-provided capacitiy values
are trustworthy or not. If not (i.e., all the same values), we rely on
the user-provided value, based on each CPU's maximum clock frequency.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-26 18:46:21 +09:00
Changwoo Min
e7b6ed1838 scx_lavd: add --prefer-smt-core option
With the --prefer-smt-core option is on, the core compaction prefers to
 utilizae hyper-twin first before utilizing the other physical CPUs. By
 default, the option is off.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-26 18:46:21 +09:00
Changwoo Min
19e337cd9b scx_lavd: make the core compaction AMP-aware
Previously, the core compaction assumed that each core's capacity was
the same. Now, we additionally consider each core's max clock frequency.
So, it always tries to use the higher-frequency cores first.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-26 18:46:21 +09:00
Changwoo Min
dbb3957eb1 scx_lavd: add a missing no_freq_scaling option check
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-26 18:46:21 +09:00
Changwoo Min
90b57a3fd7 scx_lavd: put a pinned kernel task to an overflow set
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-26 18:46:21 +09:00
Changwoo Min
e76bf999df scx_lavd: clean up constants (no functional changes)
Remove unused constants and rename outdated constants to proper names
(LAVD_TC_* to LAVC_CC_* and LAVD_ELIGIBLE_DSQ to LAVD_GLOBAL_DSQ).

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-26 18:46:21 +09:00
Changwoo Min
a9aab6b229 scx_lavd: fix typo
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-21 17:58:44 +09:00
Changwoo Min
add96f0e18 scx_lavd: do not maintain ineligible runnable tasks separately
With all the other optimizations and tunings, it turns out that maintaining
two runqueues has more harm than good.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-20 17:49:12 +09:00
Changwoo Min
827187d213 scx_lavd: adjust ineligible duration according to task's lat_cri
Further depenalize above-average latency-critical tasks and penalize
further below-avergage latency-critical tasks in ineligibility duration.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-20 17:37:27 +09:00
Changwoo Min
c653622ed9 scx_lavd: add LAVD_VDL_LOOSENESS_FT in calculating virtual deadline
LAVD_VDL_LOOSENESS_FT represents how loose the deadline is. The smaller
value means the deadline is tighter. While it is unlikely to be tuned,
let's keep it as a tunable for now.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-20 12:00:50 +09:00
Changwoo Min
e94070d5ca scx_lavd: remove LAVD_BOOST_*
These are no longer necessary after directly using latency criticality.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-20 11:53:20 +09:00
Changwoo Min
43f0fcb87c scx_lavd: removed unused LAVD_LOAD_FACTOR_*
These are no longer necessary after remnoving load factor calculation.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-20 11:51:12 +09:00
Changwoo Min
3924ebaa4d scx_lavd: properly synchronize taskc->vdeadline_log_clk
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-20 01:41:29 +09:00
Changwoo Min
02ad43d116 scx_lavd: directly use p->scx.weight instead load_ideal
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-20 00:25:11 +09:00
Changwoo Min
c955caefd8 scx_lavd: drop sys_load_factor
In theory, sys_load_factor should not be necessary since we do not
stretch the time space anymore.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-20 00:10:29 +09:00
Changwoo Min
67a6deb983 scx_lavd: use lat_cri instead of lat_prio universally
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-19 23:56:51 +09:00
Changwoo Min
6f10d6907c scx_lavd: drop sched_prio_to_slice_weight[] table
Use p->scx.weight instead.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-19 22:39:01 +09:00
Changwoo Min
034303f00f scx_lavd: consider starvation factor in determining latency criticality
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-19 22:17:50 +09:00
Changwoo Min
99e0d21c3c scx_lavd: drop the runtime factor in calculating latency criticality
That is okay since the runtime is considered in calculating a virtual
deadline. A shorter runtime will result in a tighter deadline linearly.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-19 17:28:40 +09:00
Changwoo Min
b90599e967 scx_lavd: do not inherit parent's properties
If inheriting the parent's properties, a new fork task tends to be too
prioritized. That is, many parent processes, such as `make,` are a bit
more latency-critical than average.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-19 15:29:13 +09:00
Changwoo Min
78d96a6fb6 scx_lavd: advance clock by reverse proportional to the system load
Advancing the clock slower when overloaded gives more opportunities for
latency-critical tasks to cut in the run queue. Controlling the clock
better reflects the actual load than the prior approach of stretching
the time-space when overloaded.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-18 15:53:38 +09:00
Changwoo Min
9bc20f9160 scx_lavd: maintain ineligible runnable tasks separately
We now maintain two run queues—an eligible run queue (DSQ) and an
ineligible run queue (rbtree)—sorted by the task's virtual deadline.
When the eligible run queue is empty, or the ineligible run queue has
not been consumed for too long (e.g., 15 msec), a task in the ineligible
run queue is moved to the eligible run queue for execution. With these
two queues, we have a better admission control.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-17 23:46:11 +09:00
Changwoo Min
55e19ea5df scx_lavd: do not prioritize a wake-up task in ops.select_cpu()
This is a prep for adding an ineligible DSQ.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-17 11:16:02 +09:00
Changwoo Min
c84b73e971 scx_lavd: rename LAVD_GLOBAL_DSQ to LAVD_ELIGIBLE_DSQ
This is a prep to add a global ineligible dsq.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-17 10:34:34 +09:00
Changwoo Min
971bb2e024 scx_lavd: pretty formatting for ineligible duration
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-16 23:54:15 +09:00
Changwoo Min
adfbf3934c scx_lavd: tuning the max ineligible duration
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-16 23:52:23 +09:00
Changwoo Min
eff444516f scx_lavd: directly measure service time for eligibility enforcement
Estimating the service time from run time and frequency is not
incorrect. However, it reacts slowly to sudden changes since it relies
on the moving average. Hence, we directly measure the service time to
enforce fairness.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-16 23:48:26 +09:00
Tejun Heo
51334b5c4d Bump versions for 1.0.1 release 2024-07-15 13:21:52 -10:00
Tejun Heo
3ae76acd12
Merge pull request #424 from sched-ext/sync-upstream-kernel-and-bump-to-1.0
Sync to upstream kernel and bump to 1.0
2024-07-14 07:00:38 -10:00
Tejun Heo
761ec142ce Bump most versions to 1.0.0
sched_ext is about to be merged upstream. There are some compatibility
breaking changes and we're making the current sched_ext/for-6.11
1edab907b57d ("sched_ext/scx_qmap: Pick idle CPU for direct dispatch on
!wakeup enqueues") the baseline.

Tag everything except scx_mitosis as 1.0.0. As scx_mitosis is still in early
development and is currently temporarily disabled, only the patchlevel is
bumped.
2024-07-12 11:34:14 -10:00
Tejun Heo
f261d0f037 Sync from kernel - 1edab907b57d
Sync from sched_ext/for-6.11 1edab907b57d ("sched_ext/scx_qmap: Pick idle
CPU for direct dispatch on !wakeup enqueues")

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git for-6.11

- cgroup support hasn't landed in the upstream kernel yet. This most likely
  will happen in a few weeks. For the time being, disable scx_flatcg,
  scx_pair and scx_mitosis.

- Compat macro for DSQ task iterator dropped. This is now a part of
  the baseline.

- scx_bpf_consume() isn't upstream yet. BPF interfacing side is still being
  discussed. Dropped example usage from tools/sched_ext. None of the
  practical schedulers use it, so this should be fine for now.

- scx_bpf_cpu_rq() added.

- AUTOATTACH workaround for newer libbpf versions added.
2024-07-12 11:08:41 -10:00
Changwoo Min
512bd143a5 scx_lavd: count only related tasks in calculating waker_freq
A task can become a runnable on any task's context not only its waker
task.  Thus, we should not count wake-up on unrelated task's context.
With this commit, the scheduler can (much more) accurately detect
waker-wakee relationsships.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-12 22:51:09 +09:00