Commit Graph

2208 Commits

Author SHA1 Message Date
Emil Tsalapatis
2b0909f9d7 scx_layered: point costc to global struct when initializing budgets 2024-11-04 11:40:07 -08:00
Tejun Heo
fae5396fd6
Merge pull request #886 from sirlucjan/sync-flags
scx: sync default flags with CachyOS Kernel Manager
2024-11-04 16:58:20 +00:00
Tejun Heo
7a912f0891
Merge pull request #889 from abrehman94/main
[bug-fix] fixes #810
2024-11-04 16:35:12 +00:00
Abdul Rehman
3b0b5e6f38 [bug-fix]
* increasing the number of bits for cpumask
2024-11-04 10:54:29 -05:00
Daniel Hodges
ee1df8572b
Merge pull request #888 from hodgesds/layered-cost-refactor
scx_layered: Refactor cost naming
2024-11-04 15:15:28 +00:00
Daniel Hodges
5dcdcfc50f scx_layered: Refactor cost naming
Refactor cost struct usage so that it is always referred to as `costc`
for clarity.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-11-04 07:07:13 -08:00
Changwoo Min
517ef89444 scx_lavd: drop all kernel lock tracing
The combination of kernel versions and kernel configs generates
different kernel symbols. For example, in an old kernel version,
__mutex_lock() is not generated. Also, there is no workaround
from the fentry/fexit/kprobe side currently. Let's entirely drop
the kernel locking for now and revisit it later.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-11-04 21:36:32 +09:00
Changwoo Min
796d324555 scx_lavd: improve readability
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-11-04 21:18:43 +09:00
Piotr Gorski
7efe286af0
scx: sync default flags with CachyOS Kernel Manager
Signed-off-by: Piotr Gorski <lucjan.lucjanov@gmail.com>
2024-11-04 11:03:16 +01:00
Changwoo Min
060a1662fa
Merge pull request #885 from multics69/ci-fix-v2
scx_lavd: fix CI error for missing kernel symbols
2024-11-04 09:31:34 +00:00
Changwoo Min
882212574a scx_lavd: fix a variable name to kill a warning
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-11-04 16:20:53 +09:00
Changwoo Min
e1b880f7c3 scx_lavd: fix CI error for missing kernel symbols
Revised the lock tracking code, relying on stable symbols with various
kernel configurations. There are two changes:

- Entirely drop tracing rt_mutex, which can be on and off with kconfig

- Replace the mutex_lock() family with __mutex_lock(), which is stable
  across kernel configs. The downside of this change is that it is no
  longer possible to trace the lock fast path, so lock tracing is a bit
  less accurate. But let's live with it for now until a better solution is found.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-11-04 16:05:53 +09:00
Andrea Righi
20c0372624
Merge pull request #871 from sched-ext/bpfland-minor-cleanups
scx_bpfland: minor cleanups
2024-11-02 16:20:03 +01:00
Andrea Righi
f6f5481081
Merge branch 'main' into bpfland-minor-cleanups 2024-11-02 15:28:37 +01:00
Daniel Hodges
ade6c6ab41
Merge pull request #875 from hodgesds/layered-format-fixes
scx_layered Fix trace format
2024-11-01 17:17:45 -04:00
Daniel Hodges
7e0f2cd3f3 scx_layered Fix trace format
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-11-01 10:16:14 -07:00
Daniel Hodges
9bba51a485
Merge pull request #874 from hodgesds/layered-fallback-stall
scx_layered: Add additional drain to fallback DSQs
2024-11-01 10:43:55 -04:00
Daniel Hodges
cb5b1961b8
Merge branch 'main' into layered-fallback-stall 2024-11-01 10:24:00 -04:00
Daniel Hodges
defbf385cd
Merge pull request #873 from hodgesds/layered-dump-fixes
scx_layered: Fix dump output format
2024-11-01 10:23:24 -04:00
Daniel Hodges
0a518e9f9e scx_layered: Add additional drain to fallback DSQs
Fallback DSQs are not accounted with costs. If a layer is saturating the
machine it is possible to not consume from the fallback DSQ and stall
the task. This introduces an additional consumption from the fallback
DSQ when a layer runs out of budget. In addition, tasks that use partial
CPU affinities should be placed into the fallback DSQ. This change was
tested with stress-ng --cacheline `nproc` for several minutes without
causing stalls (which would stall on main).

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-11-01 06:04:55 -07:00
Daniel Hodges
a8109d3341 scx_layered: Fix dump output format
Flip the order of layer id vs layer name so that the output makes sense.
Example output:

LO_FALLBACK nr_queued=0 -0ms
COST GLOBAL[0][random] budget=22000000000 capacity=22000000000
COST GLOBAL[1][hodgesd] budget=0 capacity=0
COST GLOBAL[2][stress-ng] budget=0 capacity=0
COST GLOBAL[3][normal] budget=0 capacity=0
COST CPU[0][0][random] budget=62500000000000 capacity=62500000000000
COST CPU[0][1][random] budget=100000000000000 capacity=100000000000000
COST CPU[0][2][random] budget=124911500964411 capacity=125000000000000

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-11-01 04:06:33 -07:00
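The `COST` lines in the example output above have a regular shape, so they can be picked apart with a small script when reading a dump. The following is a minimal sketch, not part of scx_layered: the names `CostLine` and `parse_cost_line` are hypothetical, and the numeric bracket groups are kept as an uninterpreted tuple because the commit message only states that the layer id comes before the layer name.

```python
import re
from typing import NamedTuple, Optional, Tuple


class CostLine(NamedTuple):
    scope: str                 # "GLOBAL" or "CPU", as printed in the dump
    indices: Tuple[int, ...]   # numeric bracket groups, left uninterpreted
    layer_name: str            # last bracket group, e.g. "random"
    budget: int
    capacity: int


# One or more numeric groups, then a name group, then budget and capacity.
_COST_RE = re.compile(
    r"^COST (GLOBAL|CPU)((?:\[\d+\])+)\[([^\]]+)\] budget=(\d+) capacity=(\d+)$"
)


def parse_cost_line(line: str) -> Optional[CostLine]:
    """Parse one COST line; return None for any other kind of line."""
    match = _COST_RE.match(line.strip())
    if match is None:
        return None
    scope, raw_indices, name, budget, capacity = match.groups()
    indices = tuple(int(i) for i in re.findall(r"\[(\d+)\]", raw_indices))
    return CostLine(scope, indices, name, int(budget), int(capacity))
```

Lines that are not cost lines, such as `LO_FALLBACK nr_queued=0 -0ms`, yield `None`, so the function can be mapped over a whole dump and filtered.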
Changwoo Min
cf6aa5d63b
Merge pull request #872 from multics69/lavd-opt-dispatch-v2
scx_lavd: optimize ops.dispatch() and DSQ placement
2024-11-01 19:52:58 +09:00
Changwoo Min
18a80977bb scx_lavd: create DSQs on their associated NUMA nodes
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-11-01 11:18:22 +09:00
Changwoo Min
ce31d3c59e scx_lavd: optimize consume_starving_task()
Loop up to the total number of DSQs in use, not the theoretical maximum number of DSQs.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-11-01 10:43:34 +09:00
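The idea behind the consume_starving_task() change above can be illustrated with a toy scan. This is only a sketch of the loop-bound idea in Python, not the BPF code: `MAX_DSQS`, `first_nonempty_dsq`, and the list of queue lengths are all hypothetical stand-ins.

```python
MAX_DSQS = 64  # hypothetical compile-time upper bound, not taken from scx_lavd


def first_nonempty_dsq(queue_lengths, nr_dsqs):
    """Scan only the DSQs that exist.

    Returns (index of the first non-empty DSQ or None, slots visited).
    """
    visited = 0
    # Bound the loop by the number of DSQs actually created,
    # rather than by the size of the statically sized array.
    for dsq_id in range(min(nr_dsqs, MAX_DSQS)):
        visited += 1
        if queue_lengths[dsq_id] > 0:
            return dsq_id, visited
    return None, visited
```

With four DSQs in use, an unsuccessful scan visits four slots instead of `MAX_DSQS`.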
Changwoo Min
673f80d3f7 scx_lavd: fix warnings
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-11-01 10:13:28 +09:00
Changwoo Min
4c3f1fd61c
Merge pull request #867 from vax-r/lavd_typo
scx_lavd: Fix typos
2024-11-01 09:40:50 +09:00
Changwoo Min
93ec656916
Merge pull request #866 from vax-r/lavd_fix_type
scx_lavd: Correct the type of taskc within lavd_dispatch()
2024-11-01 09:39:58 +09:00
Andrea Righi
628605cdee scx_bpfland: get rid of the global dynamic nvcsw threshold
The dynamic nvcsw threshold is not used anymore in the scheduler and it
doesn't make sense to report it in the scheduler's statistics, so let's
just drop it.

Signed-off-by: Andrea Righi <arighi@nvidia.com>
2024-10-31 21:48:44 +01:00
Andrea Righi
827f6c6147 scx_bpfland: get rid of MAX_LATENCY_WEIGHT
Get rid of the static MAX_LATENCY_WEIGHT and always rely on the value
specified by --nvcsw-max-thresh.

This makes it possible to tune the maximum latency weight when running
in lowlatency mode (via --nvcsw-max-thresh) and it also restores the
maximum nvcsw limit in non-lowlatency mode, which was incorrectly changed
during the lowlatency refactoring.

Fixes: 4d68133 ("scx_bpfland: rework lowlatency mode to adjust tasks priority")
Signed-off-by: Andrea Righi <arighi@nvidia.com>
2024-10-31 21:48:44 +01:00
Andrea Righi
72e9451c4a scx_bpfland: evaluate nvcsw without using kernel metrics
Evaluate the amount of voluntary context switches directly in the BPF
code, without relying on the kernel p->nvcsw metric.

Signed-off-by: Andrea Righi <arighi@nvidia.com>
2024-10-31 21:48:44 +01:00
Daniel Hodges
9a47677904
Merge pull request #868 from hodgesds/layered-cost-dump
scx_layered: Add layer CPU cost to dump
2024-10-31 14:57:35 -04:00
Daniel Hodges
6839a84926 scx_layered: Add layer CPU cost to dump
Add the layer CPU cost when dumping. This is useful for understanding
the per layer cost accounting when layered is stalled.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-10-31 10:45:33 -07:00
Daniel Hodges
ea4013e2d4
Merge pull request #865 from hodgesds/layered-dump-cleanup
scx_layered: Add layer name to bpf
2024-10-31 09:59:20 -04:00
Daniel Hodges
9e42480a62 scx_layered: Add layer name to bpf
Add the layer name to the bpf representation of a layer. When printing
debug output print the layer name as well as the layer index.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-10-31 05:12:12 -07:00
I Hsin Cheng
3eed58cd26 scx_lavd: Fix typos
Fix "infomation" to "information"

Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
2024-10-31 18:55:55 +08:00
I Hsin Cheng
f55cc965ac scx_lavd: Correct the type of taskc within lavd_dispatch()
The type of "taskc" within "lavd_dispatch()" was "struct task_struct *",
while it should be "struct task_ctx *".

Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
2024-10-31 18:37:41 +08:00
Changwoo Min
83b5f4eb23
Merge pull request #861 from multics69/lavd-opt-dispatch
scx_lavd: tuning and optimizing latency criticality calculation
2024-10-31 08:10:21 +09:00
Daniel Hodges
d56650c2e9
Merge pull request #862 from hodgesds/layered-dispatch-refactor
scx_layered: Refactor dispatch
2024-10-30 15:38:30 +00:00
Daniel Hodges
a8d245b164 scx_layered: Refactor dispatch
Refactor dispatch to use a separate set of global helpers for topo aware
dispatch. This change only refactors dispatch to make it more
maintainable, without any functional changes.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-10-30 07:27:05 -07:00
Changwoo Min
82b25a94f4 scx_lavd: boost task's latency criticality when pinned to a cpu
Pinning a task to a single CPU is a widely-used optimization to
improve latency by reusing cache. So when a task is pinned to
a single CPU, let's boost its latency criticality.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-10-30 15:28:18 +09:00
Changwoo Min
fe5554b83d scx_lavd: move reset_lock_futex_boost() to ops.running()
Calling reset_lock_futex_boost() at ops.enqueue() is not accurate,
so move the call to ops.running(). This way, we prevent lock holder
preemption only when a lock is acquired between ops.running() and
ops.stopping().

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-10-30 13:52:52 +09:00
Changwoo Min
0f58a9cd39 scx_lavd: calculate task's latency criticality in the direct dispatch path
Even in the direct dispatch path, calculating the task's latency
criticality is still necessary since the latency criticality is
used for the preemptibility test. This addresses the following
GitHub issue:

https://github.com/sched-ext/scx/issues/856

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-10-30 12:58:06 +09:00
Changwoo Min
3dcaefcb2f
Merge pull request #854 from multics69/lavd-preempt
scx_lavd: optimize preemption
2024-10-29 00:32:12 +00:00
Daniel Hodges
7c5e2355ba
Merge pull request #855 from hodgesds/layered-wall-token
scx_layered: Add cost accounting
2024-10-28 20:22:34 +00:00
Daniel Hodges
ad1bfee885 scx_layered: Add cost accounting
Add cost accounting for layers to make weights work on the BPF side.
This is done at both the CPU level as well as globally. When a CPU
runs out of budget it acquires budget from the global context. If a
layer runs out of global budgets then all budgets are reset. Weight
handling is done by iterating over layers by their available budget.
Layer budgets are proportional to their weights.
2024-10-28 13:09:04 -07:00
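The budget scheme described in the cost accounting commit above can be pictured with a small user-space model. This is a rough sketch under stated assumptions, not the scx_layered BPF implementation: the class `CostModel`, its method names, the `total_budget` units, and the choice to give each CPU an equal slice of a layer's capacity are all hypothetical.

```python
class CostModel:
    """Toy model of per-CPU and global layer budgets (all names hypothetical)."""

    def __init__(self, weights, nr_cpus, total_budget=1_000_000):
        self.weights = dict(weights)      # layer name -> weight
        self.nr_cpus = nr_cpus
        total_weight = sum(self.weights.values())
        # Layer budgets are proportional to their weights.
        self.capacity = {
            layer: total_budget * w // total_weight
            for layer, w in self.weights.items()
        }
        self.reset()

    def reset(self):
        """Restore every global and per-CPU budget to its starting value."""
        self.global_budget = dict(self.capacity)
        # Assumption: each CPU starts with an equal slice per layer.
        self.cpu_budget = [
            {layer: cap // self.nr_cpus for layer, cap in self.capacity.items()}
            for _ in range(self.nr_cpus)
        ]

    def pick_layer(self, cpu):
        """Prefer the layer with the most budget left on this CPU."""
        local = self.cpu_budget[cpu]
        return max(local, key=local.get)

    def charge(self, cpu, layer, cost):
        """Account `cost` to `layer` on `cpu`, refilling or resetting as needed."""
        local = self.cpu_budget[cpu]
        local[layer] -= cost
        if local[layer] > 0:
            return
        # The CPU ran out of budget: acquire more from the global context.
        take = min(self.capacity[layer] // self.nr_cpus, self.global_budget[layer])
        if take > 0:
            self.global_budget[layer] -= take
            local[layer] = take
        else:
            # The layer ran out of global budget: reset all budgets.
            self.reset()
```

For example, with weights `{"a": 3, "b": 1}`, two CPUs, and a total budget of 800, layer `a` gets a capacity of 600 and layer `b` gets 200, so `pick_layer(0)` initially returns `"a"`.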
Changwoo Min
5b91a525bb scx_lavd: kick CPU explicitly at the ops.enqueue() path
When it is decided that the current task should yield, we should explicitly
call scx_bpf_kick_cpu(_, SCX_KICK_PREEMPT). Setting the current task's time
slice to zero is not sufficient in this case because the sched_ext core
does not call resched_curr() at the ops.enqueue() path.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-10-28 17:31:34 +09:00
Changwoo Min
f56b79b19c scx_lavd: yield for preemption only when a task is ineligible
An eligible task is unlikely to be preemptible. In other words, an ineligible
task is more likely to be preemptible because of the greedy ratio penalty in
its virtual deadline calculation. Hence, we skip the preemptibility test
for an eligible task.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-10-28 12:45:50 +09:00
David Vernet
6e35208e4b
Merge pull request #853 from sched-ext/ci-enable-lockdep
ci: enable lockdep in the virtme-ng kernel config
2024-10-26 12:12:36 +00:00
Andrea Righi
addae505df ci: enable lockdep in the virtme-ng kernel config
Set CONFIG_PROVE_LOCKING=y in the default virtme-ng config, to enable
lockdep and additional lock debugging features, which can help catch
lock-related kernel issues in advance.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-10-26 13:57:37 +02:00
a618ebf9bc
Merge pull request #852 from JakeHillion/pr852
layered/timers: support verifying on older kernels and fix logic
2024-10-25 10:36:17 +00:00