Commit Graph

1429 Commits

Author SHA1 Message Date
Changwoo Min
3924ebaa4d scx_lavd: properly synchronize taskc->vdeadline_log_clk
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-20 01:41:29 +09:00
Tejun Heo
2e22a53560
Merge pull request #441 from hodgesds/layered-metrics-refactor
scx_layered: Add separate module for metrics
2024-07-19 06:37:51 -10:00
Changwoo Min
02ad43d116 scx_lavd: directly use p->scx.weight instead of load_ideal
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-20 00:25:11 +09:00
Changwoo Min
c955caefd8 scx_lavd: drop sys_load_factor
In theory, sys_load_factor should not be necessary since we do not
stretch the time space anymore.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-20 00:10:29 +09:00
Changwoo Min
67a6deb983 scx_lavd: use lat_cri instead of lat_prio universally
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-19 23:56:51 +09:00
Daniel Hodges
b98a9f56a8 scx_layered: Add separate module for metrics
Refactor the main module for scx_layered to move metrics into a separate
module. This change makes no functional difference; it only restructures the code.
This will make it a little easier to navigate the logic in the main
scheduler code.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-07-19 07:40:24 -07:00
Changwoo Min
6f10d6907c scx_lavd: drop sched_prio_to_slice_weight[] table
Use p->scx.weight instead.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-19 22:39:01 +09:00
Changwoo Min
034303f00f scx_lavd: consider starvation factor in determining latency criticality
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-19 22:17:50 +09:00
Daniel Hodges
d974690b5d
Merge pull request #435 from vax-r/remove_skip_while
scx_rusty: Remove skip_while in find_first_candidate
2024-07-19 08:38:58 -04:00
Changwoo Min
99e0d21c3c scx_lavd: drop the runtime factor in calculating latency criticality
That is okay since the runtime is considered in calculating a virtual
deadline. A shorter runtime will result in a tighter deadline linearly.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-19 17:28:40 +09:00
Changwoo Min
b90599e967 scx_lavd: do not inherit parent's properties
If a newly forked task inherits its parent's properties, it tends to be
over-prioritized, because many parent processes, such as `make`, are a
bit more latency-critical than average.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-19 15:29:13 +09:00
Andrea Righi
cb86768ed9
Merge pull request #439 from sched-ext/bpfland-dynamic-nvcsw-thresh
scx_bpfland: introduce dynamic nvcsw threshold
2024-07-18 20:38:23 +02:00
Andrea Righi
c4eb3ce7b4 scx_bpfland: introduce dynamic nvcsw threshold
Instead of using a static value to classify tasks based on their average
amount of voluntary context switches, try to periodically evaluate an
optimal threshold, based on a global average of voluntary context
switches among all the running tasks.

Tasks with an average amount of voluntary context switches greater than
the global average will be classified as interactive.

The global average is evaluated as an exponentially weighted moving
average (EWMA), as:

  avg(t) = avg(t - 1) * 0.75 + task_avg(t) * 0.25

This approach is more efficient than iterating through all tasks and it
helps to prevent rapid fluctuations that may be caused by bursts of
voluntary context switch events.

The dynamic nvcsw threshold enables a more precise adjustment of
the classification criteria to swiftly respond to global system changes:
tasks can be quickly classified as interactive, but if the system
experiences too many interactive events, the criteria for maintaining
interactive status become stricter. This creates a natural selection
process where only the most deserving tasks remain interactive.

Additionally, introduce the new option `--nvcsw-max-thresh N`, which
extends or restricts the fluctuation range of the global average
threshold for voluntary context switches.

Tested-by: Piotr Gorski <piotrgorski@cachyos.org>
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
2024-07-18 19:03:25 +02:00
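
Note on c4eb3ce7b4: a minimal Rust sketch of an exponentially weighted moving average with the 0.75 / 0.25 split, plus the "above the global average" classification rule. This is not the scx_bpfland code. The function names `ewma_update` and `is_interactive`, and the integer arithmetic, are assumptions made for illustration.

```rust
/// One EWMA step: avg(t) = avg(t - 1) * 0.75 + sample * 0.25.
/// Hypothetical helper; assumes values far below u64::MAX / 3.
fn ewma_update(prev_avg: u64, sample: u64) -> u64 {
    // Integer form of the 3/4 and 1/4 weights, rounding down.
    (prev_avg * 3 + sample) / 4
}

/// A task is treated as interactive when its own average number of
/// voluntary context switches is greater than the global average.
fn is_interactive(task_avg_nvcsw: u64, global_avg_nvcsw: u64) -> bool {
    task_avg_nvcsw > global_avg_nvcsw
}
```

Each update costs one multiply, one add, and one divide, so no walk over the task list is needed, and a single bursty sample moves the average by at most a quarter of its distance from the previous value.
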
Changwoo Min
78d96a6fb6 scx_lavd: advance the clock in inverse proportion to the system load
Advancing the clock slower when overloaded gives more opportunities for
latency-critical tasks to cut in the run queue. Controlling the clock
better reflects the actual load than the prior approach of stretching
the time-space when overloaded.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-18 15:53:38 +09:00
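
Note on 78d96a6fb6: a rough sketch of a clock that advances more slowly as load grows. The commit text does not say how load is measured or scaled, so `load_pct` (100 meaning fully loaded) and the rule that the clock never runs faster than wall-clock time are assumptions of this example, not the scx_lavd behavior.

```rust
/// Advance a logical clock by `elapsed_ns` scaled by the inverse of the load.
/// `load_pct` is a hypothetical load measure: 100 = fully loaded, 200 = 2x overloaded.
fn advance_clock(clock_ns: u64, elapsed_ns: u64, load_pct: u64) -> u64 {
    // Assumption: at or below 100% load the clock tracks wall-clock time.
    let load = load_pct.max(100);
    clock_ns + elapsed_ns * 100 / load
}
```

With this scaling, a system at 200% load advances the clock at half speed, which leaves a waiting latency-critical task more chances to be placed ahead of tasks already queued.
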
Changwoo Min
9bc20f9160 scx_lavd: maintain ineligible runnable tasks separately
We now maintain two run queues—an eligible run queue (DSQ) and an
ineligible run queue (rbtree)—sorted by the task's virtual deadline.
When the eligible run queue is empty, or the ineligible run queue has
not been consumed for too long (e.g., 15 msec), a task in the ineligible
run queue is moved to the eligible run queue for execution. With these
two queues, we have a better admission control.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-17 23:46:11 +09:00
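
Note on 9bc20f9160: a simplified model of the two-queue scheme, written with ordinary Rust collections rather than a BPF DSQ and rbtree. The type `RunQueues`, its methods, and the constant name are hypothetical; the 15 msec value is the example given in the commit message.

```rust
use std::collections::BTreeSet;

/// Starvation limit for the ineligible queue (the "15 msec" example).
const MAX_INELIGIBLE_WAIT_NS: u64 = 15_000_000;

/// Two queues ordered by (virtual deadline, task id).
struct RunQueues {
    eligible: BTreeSet<(u64, u32)>,
    ineligible: BTreeSet<(u64, u32)>,
    last_promote_ns: u64,
}

impl RunQueues {
    fn new() -> Self {
        RunQueues {
            eligible: BTreeSet::new(),
            ineligible: BTreeSet::new(),
            last_promote_ns: 0,
        }
    }

    fn enqueue(&mut self, vdeadline: u64, tid: u32, is_eligible: bool) {
        if is_eligible {
            self.eligible.insert((vdeadline, tid));
        } else {
            self.ineligible.insert((vdeadline, tid));
        }
    }

    /// Pick the next task id to run at time `now_ns`.
    fn pick_next(&mut self, now_ns: u64) -> Option<u32> {
        let starved =
            now_ns.saturating_sub(self.last_promote_ns) >= MAX_INELIGIBLE_WAIT_NS;
        if self.eligible.is_empty() || starved {
            // Move the earliest-deadline ineligible task to the eligible queue.
            if let Some(task) = self.ineligible.pop_first() {
                self.eligible.insert(task);
                self.last_promote_ns = now_ns;
            }
        }
        // Always run the earliest-deadline task in the eligible queue.
        self.eligible.pop_first().map(|(_, tid)| tid)
    }
}
```

In this model a promoted task still competes by deadline inside the eligible queue, so promotion bounds how long the ineligible queue can be ignored without bypassing deadline ordering.
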
I Hsin Cheng
2525b94af4 scx_rusty: Remove unused variable
Remove unused variable "has_preferred_dom".

Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
2024-07-17 20:30:17 +08:00
I Hsin Cheng
bf2f0fbf35 scx_rusty: Remove skip_while in find_first_candidate
Following commit 1c3b563, move the check of task.migrated.get() into
the vector filter. This way, we can remove the skip_while() call in
find_first_candidate().

Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
2024-07-17 20:27:12 +08:00
Changwoo Min
55e19ea5df scx_lavd: do not prioritize a wake-up task in ops.select_cpu()
This is a prep for adding an ineligible DSQ.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-17 11:16:02 +09:00
Changwoo Min
c84b73e971 scx_lavd: rename LAVD_GLOBAL_DSQ to LAVD_ELIGIBLE_DSQ
This is a prep to add a global ineligible dsq.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-17 10:34:34 +09:00
Daniel Müller
565aec3662 rust: Update libbpf-rs & libbpf-cargo to 0.24
Update libbpf-rs & libbpf-cargo to 0.24. Among other things, generated
skeletons now contain directly accessible map and program objects, no
longer necessitating the use of accessor methods. As a result, the risk
for mutability conflicts is reduced greatly.

Signed-off-by: Daniel Müller <deso@posteo.net>
2024-07-16 11:48:52 -07:00
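
Note on 565aec3662: the borrow-checker effect behind "mutability conflicts" can be shown with plain Rust structs. The types `Skel`, `Map`, and `Prog` below are hypothetical stand-ins and are NOT the libbpf-rs API; they only contrast accessor methods with directly accessible fields.

```rust
// Hypothetical stand-ins; these are not libbpf-rs types.
struct Map {
    entries: Vec<u32>,
}

struct Prog {
    attached: bool,
}

struct Skel {
    map: Map,
    prog: Prog,
}

impl Skel {
    // Accessor style: each call mutably borrows the whole skeleton.
    fn map_mut(&mut self) -> &mut Map {
        &mut self.map
    }

    fn prog_mut(&mut self) -> &mut Prog {
        &mut self.prog
    }
}

fn via_fields(skel: &mut Skel) {
    // Borrows of distinct fields are disjoint and may be held together.
    let map = &mut skel.map;
    let prog = &mut skel.prog;
    map.entries.push(1);
    prog.attached = true;
}

fn via_accessors(skel: &mut Skel) {
    // Each accessor result must be dropped before the next call; keeping
    // both returned references alive at once is rejected (error E0499).
    skel.map_mut().entries.push(1);
    skel.prog_mut().attached = true;
}
```
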
David Vernet
e4be1ec9fe
Merge pull request #364 from hodgesds/rusty-mempolicy
scx_rusty: Add mempolicy checks to rusty
2024-07-16 13:40:37 -05:00
Daniel Hodges
27122a8a00 scx_rusty: refactor mempolicy handling bpf code and load balancing
This change refactors some of the helper methods for getting the
preferred node for tasks using mempolicy. The load balancing logic in
try_find_move_task is updated to allow for a filter, which is used to
filter for tasks with a preferred mempolicy.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-07-16 09:40:00 -07:00
Tejun Heo
47ee332196
Merge pull request #434 from hodgesds/cpu-topo-cache-id
scx_utils: Add LLC id to CPU
2024-07-16 06:20:17 -10:00
Daniel Hodges
43a263aa75 scx_rusty: Use preferred node mask with balancer
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-07-16 08:11:19 -07:00
Daniel Hodges
bab6e9523c scx_rusty: Add mempolicy checks to rusty
This change makes scx_rusty mempolicy aware. When a process uses
set_mempolicy it can change NUMA memory preferences and cause
performance issues when tasks are scheduled on remote NUMA nodes. This
change modifies task_pick_domain to use the new helper method that
returns the preferred node id.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-07-16 08:11:19 -07:00
Changwoo Min
971bb2e024 scx_lavd: pretty formatting for ineligible duration
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-16 23:54:15 +09:00
Changwoo Min
adfbf3934c scx_lavd: tuning the max ineligible duration
Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-16 23:52:23 +09:00
Changwoo Min
eff444516f scx_lavd: directly measure service time for eligibility enforcement
Estimating the service time from run time and frequency is not
incorrect. However, it reacts slowly to sudden changes since it relies
on the moving average. Hence, we directly measure the service time to
enforce fairness.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-07-16 23:48:26 +09:00
Daniel Hodges
97fb0fa0c8 scx_utils: Add LLC id to CPU
Add the LLC id to the Cpu struct, which makes it easier to reference
the LLC when doing operations on a Cpu.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-07-16 07:46:55 -07:00
David Vernet
a4d9bcd453
Merge pull request #425 from vax-r/dom_mask_precheck
scx_rusty: Pre-check task domain mask with pull domain mask
2024-07-16 09:30:27 -05:00
I Hsin Cheng
1c3b563caf scx_rusty: Pre-check task domain mask with pull domain mask
Instead of checking the domain mask inside "find_first_candidate()"
every time, check whether the tasks in the push domain are able to run
on the pull domain when the vector is generated.

This also avoids repeating the same check for each (task, pull_dom)
pair, i.e., whether the pull domain is in the task's domain mask.

Also, since whether a task is a kworker won't change over time, we can
perform that check earlier and put it in the filter, too.

Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
2024-07-16 21:48:06 +08:00
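
Note on 1c3b563caf: a small sketch of filtering once at vector-generation time instead of re-checking on every search. The `Task` struct, its fields, and `candidates_for` are hypothetical and do not mirror the scx_rusty types; the sketch also assumes fewer than 64 domains so that a `u64` mask is enough.

```rust
/// Hypothetical task record; field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
struct Task {
    pid: i32,
    /// Bit N set means the task may run in domain N (assumes N < 64).
    dom_mask: u64,
    is_kworker: bool,
}

/// Build the candidate vector for one pull domain, filtering up front.
fn candidates_for(tasks: &[Task], pull_dom: u32) -> Vec<Task> {
    tasks
        .iter()
        // Both conditions are fixed for a given (task, pull_dom) pair,
        // so they only need to be evaluated once, here.
        .filter(|t| !t.is_kworker && (t.dom_mask & (1u64 << pull_dom)) != 0)
        .cloned()
        .collect()
}
```

Any later search over the returned vector can then skip the mask and kworker checks entirely.
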
Tejun Heo
757cf6ff54
Merge pull request #433 from sched-ext/htejun/bump-versions
Bump versions for 1.0.1 release
2024-07-15 13:33:33 -10:00
Tejun Heo
51334b5c4d Bump versions for 1.0.1 release
2024-07-15 13:21:52 -10:00
Tejun Heo
99afc66251
Merge pull request #432 from jfernandez/fix-sub-counters
scx_utils: Display the tag value for nested counters in log_recorder
2024-07-15 12:14:49 -10:00
Jose Fernandez
a63bf77d68
scx_utils: Display the tag value for nested counters in log_recorder
The log_recorder was incorrectly displaying the tag name instead of the
value for nested counters.

Signed-off-by: Jose Fernandez <josef@netflix.com>
2024-07-15 15:20:44 -06:00
Andrea Righi
e268c589ca
Merge pull request #429 from sched-ext/bpfland-small-improvements
scx_bpfland: small improvements
2024-07-15 20:08:46 +02:00
Andrea Righi
8e7a526356 scx_bpfland: use nr_cpu_ids for consistency
We always use nr_cpu_ids to represent the maximum CPU id returned by
scx_bpf_nr_cpu_ids().

Replace cpu_max with nr_cpu_ids to be more consistent with the rest of
the code.

Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
2024-07-15 08:44:35 +02:00
Tejun Heo
8aae6917e6
Merge pull request #431 from sched-ext/ci-add-kernel-config
ci: Use latest upstream kernel and exclude scx_flatcg and scx_pair from testing
2024-07-14 19:40:55 -10:00
Andrea Righi
f72e7567bf ci: include kernel config chunk
The kernel from kernel.org lacks the necessary .config chunk required
by the build via virtme-ng.

To fix this, include the kernel .config chunk directly in the scx
repository and use it from here.

Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
2024-07-15 07:10:07 +02:00
Tejun Heo
759be0e406 ci: Use latest upstream kernel and exclude scx_flatcg and scx_pair from testing
2024-07-14 13:27:51 -10:00
Tejun Heo
e580ddec3c
Merge pull request #427 from ptr1337/fix-packaging
install_user_scheds: Skip packaging of scx_mitosis
2024-07-14 13:19:49 -10:00
Tejun Heo
4141082a26
Merge pull request #428 from ptr1337/systemd
systemd: Drop temporarily disabled schedulers from service
2024-07-14 13:16:26 -10:00
Andrea Righi
33d06f653b scx_bpfland: get rid of the MAX_CPUS hard-coded limit
We can rely on scx_bpf_nr_cpu_ids() to create all the possible per-CPU
DSQs, eliminating the need for the hard-coded limit MAX_CPUS.

In this way, scx_bpfland can support the same number of CPUs that the
kernel can handle.

Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
2024-07-15 00:17:30 +02:00
Andrea Righi
b80ef7d8eb scx_bpfland: optimize offline CPU handling
Instead of constantly checking the need to drain tasks from the DSQs of
the offline CPUs, provide an atomic flag to notify when there are tasks
to be drained from the offline CPUs.

Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
2024-07-15 00:17:23 +02:00
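
Note on b80ef7d8eb: the "flag instead of constant checking" idea, sketched in user-space Rust with `std::sync::atomic`. The type `DrainFlag` and its methods are hypothetical; the actual change lives in BPF code and may use different primitives and memory ordering.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Signals that tasks are waiting in the DSQ of an offline CPU.
struct DrainFlag(AtomicBool);

impl DrainFlag {
    const fn new() -> Self {
        DrainFlag(AtomicBool::new(false))
    }

    /// Called by whoever queues a task on an offline CPU's DSQ.
    fn notify(&self) {
        self.0.store(true, Ordering::Release);
    }

    /// Returns true only if work was pending, clearing the flag atomically.
    fn take(&self) -> bool {
        self.0.swap(false, Ordering::AcqRel)
    }
}
```

The dispatch path then pays for a single atomic swap in the common case and only scans the offline CPUs' queues when `take()` returns true.
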
Andrea Righi
0530706710 scx_bpfland: properly initialize the nvcsw metrics
Initialize the number of voluntary context switches metrics in the local
task storage.

Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
2024-07-15 00:16:10 +02:00
Andrea Righi
bf4ad23599 scx_bpfland: refine interactive tasks flood safeguard
Refine the safeguard mechanism to avoid generating too many interactive
tasks in the system, which could nullify the effect of the
interactive/regular task classification.

The safeguard mechanism operates by pausing the promotion of new tasks
to interactive status during the task wake-up process, whenever the
number of interactive tasks in the priority queue exceeds a specific
limit (set to 4x the number of online CPUs).

Halting the promotion of additional interactive tasks prioritizes those
already classified as interactive, thereby preventing potential
"bursts" of excessive interactive tasks in the system.

This refines the mitigation already provided by commit 640bd562
("scx_bpfland: prevent tasks from abusing interactive priority boost").

Fixes: 640bd562 ("scx_bpfland: prevent tasks from abusing interactive priority boost")
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
2024-07-15 00:11:34 +02:00
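
Note on bf4ad23599: the safeguard condition reduces to one comparison. The function and constant names below are hypothetical; only the "4x the number of online CPUs" limit comes from the commit message.

```rust
/// Limit multiplier from the commit text: 4x the number of online CPUs.
const INTERACTIVE_LIMIT_PER_CPU: u64 = 4;

/// Whether a waking task may still be promoted to interactive status.
/// Promotion pauses once the queued interactive tasks exceed the limit.
fn may_promote(nr_interactive_queued: u64, nr_online_cpus: u64) -> bool {
    nr_interactive_queued <= INTERACTIVE_LIMIT_PER_CPU * nr_online_cpus
}
```

For example, on an 8-CPU system promotion continues while at most 32 interactive tasks are queued and pauses at 33.
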
Andrea Righi
eb1cf0e670 scx_bpfland: improve task time slice evaluation
Always assign the maximum time slice if there are idle CPUs in the
system.

Otherwise, double the task's unused time slice to reward tasks that use
less CPU time and at the same time refill the time slice of the tasks
every time they're dispatched.

Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
2024-07-14 23:24:24 +02:00
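
Note on eb1cf0e670: one possible reading of the time slice rule, as a sketch. The constants `SLICE_MAX_NS` and `SLICE_MIN_NS`, their values, and the clamping are assumptions; the commit message states only "maximum slice when CPUs are idle, otherwise double the unused slice".

```rust
/// Hypothetical bounds; the real values are not given in the log.
const SLICE_MAX_NS: u64 = 5_000_000;
const SLICE_MIN_NS: u64 = 500_000;

/// Compute the slice to assign when a task is dispatched.
fn next_slice_ns(unused_ns: u64, has_idle_cpus: bool) -> u64 {
    if has_idle_cpus {
        // No contention: hand out the maximum slice.
        return SLICE_MAX_NS;
    }
    // Doubling rewards tasks that left part of their slice unused.
    // Assumption: the result is clamped so it is never zero or unbounded.
    unused_ns.saturating_mul(2).clamp(SLICE_MIN_NS, SLICE_MAX_NS)
}
```
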
Peter Jung
9e2caa74c0
systemd: Drop temporarily disabled schedulers from service
Signed-off-by: Peter Jung <admin@ptr1337.dev>
2024-07-14 20:26:04 +02:00
Peter Jung
a7fa651bfc
install_user_scheds: Skip packaging of scx_mitosis
Signed-off-by: Peter Jung <admin@ptr1337.dev>
2024-07-14 20:21:14 +02:00
Tejun Heo
3ae76acd12
Merge pull request #424 from sched-ext/sync-upstream-kernel-and-bump-to-1.0
Sync to upstream kernel and bump to 1.0
2024-07-14 07:00:38 -10:00