Commit Graph

109 Commits

Author SHA1 Message Date
Tejun Heo
1912e05f0b
Merge pull request #499 from sched-ext/htejun/scx_stats
scx_stats: Misc changes to sync dep versions and publish on crates.io
2024-08-15 12:32:44 -10:00
Tejun Heo
0b9c8b5cbd scx_stats: Update versions to 0.2.0 to republish 2024-08-15 12:29:27 -10:00
Daniel Hodges
0319afc88e scx_layered: Update nr_cpus when resizing layers
After updating scx_layered to be topology aware the nr_cpus field on the
layer was not being updated properly. Update layer growing/shrinking
logic to correctly update the nr_cpus count.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-15 13:22:26 -07:00
Tejun Heo
b614cf848f scx_layered: Make monitor time based iterations dumber
This makes ctrl-c a bit more responsive without complicating code.
2024-08-15 09:23:29 -10:00
Tejun Heo
45fb724ee2 scx_layered: Restore cpumask reporting 2024-08-15 09:12:29 -10:00
Tejun Heo
751a38e34e scx_layered: Refactor stats printing code 2024-08-15 08:53:19 -10:00
Tejun Heo
a4f424056e scx_layered: Move stats server launching to stats.rs 2024-08-15 06:30:42 -10:00
Tejun Heo
17afc72479 scx_stats: Rename cleanups
- s/stat/stats/ on several stragglers.

- Rename traits so that they are more distinctive from struct and other
  names and follow the convention.
2024-08-15 06:24:56 -10:00
Tejun Heo
a091d5ea7d scx_layered: s/monitor.rs/stats.rs/ and make stats refresh code struct ops 2024-08-15 06:13:05 -10:00
Tejun Heo
8aae9a5de2 scx_stats: s/scx_stat/scx_stats/
Use plural form which is more widespread and also used in scheduler
implementations. No functional changes.
2024-08-15 05:31:34 -10:00
Tejun Heo
6e466d18df scx_layered: Initial switch to scx_stat
- This makes the scheduler side simpler and allows on-demand monitoring.

- OpenMetrics support is dropped for now. Will add a generic tool for it.

- This is a naive conversion. Will be further refined.

scx_layered no longer prints statistics by default. To watch statistics, run
`scx_layered --monitor` while the scheduler is running.
2024-08-14 13:48:41 -10:00
Tejun Heo
7820ec9b46 scx_stat, scx_layered: cargo fmt 2024-08-14 11:47:37 -10:00
Daniel Hodges
646cefd46d
Merge pull request #477 from hodgesds/layered-global-match
scx_rusty: Make layer matching a global function
2024-08-12 09:14:58 -04:00
Daniel Hodges
be5213e129 scx_rusty: Make layer matching a global function
Layer matching currently takes a large number of bpf instructions.
Moving layer matching to a global function will reduce the overall
instruction count and allow for other layer matching methods such as
glob.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-12 05:44:34 -07:00
Tejun Heo
63c4a0191f
Merge branch 'main' into topic/inlined-skeleton-members 2024-08-08 14:23:37 -10:00
Tejun Heo
cd6a4d72c7 Bump versions for 1.0.2 release 2024-08-08 14:10:16 -10:00
Tejun Heo
7c3ffe96e1 Unify crate dependency versions
Different sub-projects are using different versions for the same crates.
Synchronize them to the latest.
2024-08-08 13:26:47 -10:00
Daniel Hodges
d5efcd3245 scx_layered: Fix cred declaration
The use of the cred struct should be const.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-06 05:22:12 -07:00
Daniel Hodges
1f922b9a73 scx_layered: Add support for disabling topology awareness
Add a parameter to disable topology awareness. This is useful when
trying to compare the scheduling performance of topology aware
scheduling compared to the previous scheduling strategy.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-08-02 08:07:19 -07:00
Daniel Hodges
de7b5fe190 scx_layered: Fix dispatch fallback CPU selection
When the previous CPU for a task is not known do not fall back to
dispatching to CPU 0, use the current CPU.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-07-31 12:35:22 -07:00
Daniel Hodges
4f12bebaa5 scx_layered: Add per cpu layer iterator offset
Add a per cpu counter offset to round robin when iterating on layers.
This is to make selection from different layers more fair.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-07-30 10:44:41 -07:00
Daniel Hodges
4c3fd6cd9b scx_layered: Rename UserId and GroupId
TLDR; rename UserId and GroupId to UIDEquals and GIDEquals.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-07-24 15:09:08 -07:00
Daniel Hodges
55f6d68eef scx_layered: Add user and group layers
Add a layer match based on either the effective user id or the effective
group id. This allows for creating layers for individual users or
groups.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-07-24 15:09:08 -07:00
Daniel Hodges
2803f9c127 scx_layered: Fix formatting issues
Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-07-24 14:39:02 -07:00
Daniel Hodges
0814abf0b8 scx_layered: Add node topology awareness
Add NUMA node topology awareness for scx_layared. This borrows some of
the NUMA handling from scx_rusty and allows layers to set a node mask.
Different layer kinds will use the node mask differently.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-07-24 09:53:48 -07:00
David Vernet
4f11e2abe2
layered: Don't dispatch to LO_FALLBACK_DSQ
Non-kthreads with custom affinities in non-open layers are dispatched into a
LO_FALLBACK_DSQ, with the idea being that they're penalized for their custom
affinities. When a host is fully utilized, these tasks can end up being starved
due to LO_FALLBACK_DSQ being consumed only when there are no other layers to
consume from. In internal workloads at Meta, we've observed that this can
happen in practice.

Longer term, we can probably address this by implementing layer weights and
applying that to fallback DSQs to avoid starvation. For now, let's just
dispatch them to HI_FALLBACK_DSQ to avoid this starvation issue.

Signed-off-by: David Vernet <void@manifault.com>
2024-07-19 19:14:18 -05:00
Daniel Hodges
b98a9f56a8 scx_layered: Add separate module for metrics
Refactor the main module for scx_layered to move metrics into a separate
module. This change does no functional differences, only code structure.
This will make it a little easier to navigate the logic in the main
scheduler code.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
2024-07-19 07:40:24 -07:00
Daniel Müller
565aec3662 rust: Update libbpf-rs & libbpf-cargo to 0.24
Update libbpf-rs & libbpf-cargo to 0.24. Among other things, generated
skeletons now contain directly accessible map and program objects, no
longer necessitating the use of accessor methods. As a result, the risk
for mutability conflicts is reduced greatly.

Signed-off-by: Daniel Müller <deso@posteo.net>
2024-07-16 11:48:52 -07:00
Tejun Heo
51334b5c4d Bump versions for 1.0.1 release 2024-07-15 13:21:52 -10:00
Tejun Heo
761ec142ce Bump most versions to 1.0.0
sched_ext is about to be merged upstream. There are some compatibility
breaking changes and we're making the current sched_ext/for-6.11
1edab907b57d ("sched_ext/scx_qmap: Pick idle CPU for direct dispatch on
!wakeup enqueues") the baseline.

Tag everything except scx_mitosis as 1.0.0. As scx_mitosis is still in early
development and is currently temporarily disabled, only the patchlevel is
bumped.
2024-07-12 11:34:14 -10:00
Tejun Heo
f261d0f037 Sync from kernel - 1edab907b57d
Sync from sched_ext/for-6.11 1edab907b57d ("sched_ext/scx_qmap: Pick idle
CPU for direct dispatch on !wakeup enqueues")

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git for-6.11

- cgroup support hasn't landed in the upstream kernel yet. This most likely
  will happen in a few weeks. For the time being, disable scx_flatcg,
  scx_pair and scx_mitosis.

- Compat macro for DSQ task iterator dropped. This is now a part of
  the baseline.

- scx_bpf_consume() isn't upstream yet. BPF interfacing side is still being
  discussed. Dropped example usage from tools/sched_ext. None of the
  practical schedulers use it, so this should be fine for now.

- scx_bpf_cpu_rq() added.

- AUTOATTACH workaround for newer libbpf versions added.
2024-07-12 11:08:41 -10:00
Andrea Righi
cf4883fbf8 meson: introduce serialize build option
With commit 5d20f89a ("scheds-rust: build rust schedulers in sequence"),
schedulers are now built serially one after the other to prevent meson
and cargo from forking NxN parallel tasks.

However, this change has made building a single scheduler much more
cumbersome, due to the chain of dependencies.

For example, building scx_rusty using the specific meson target would
still result in all schedulers being built, because they all depend on
each other.

To address this issue, introduce the new meson build option
`serialize=true|false` (default is false).

This option allows to disable the schedulers' build chain, restoring the
old behavior.

With this option enabled, it is now possible to build just a single
scheduler, parallelizing the cargo build properly, without triggering
the build of the others. Example:

  $ meson setup build -Dbuildtype=release -Dserialize=false
  $ meson compile -C build scx_rusty

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-28 10:17:37 +02:00
Tejun Heo
dde2942125 compat: Drop __COMPAT_scx_bpf_cpuperf_*()
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop __COMPAT_scx_bpf_cpuperf_*(). The open helper
macros now check the existence of scx_bpf_cpuperf_cap() and abort if not.
2024-06-16 06:16:53 -10:00
Tejun Heo
13e8388e1e compat: Drop __COMPAT_HAS_CPUMASKS
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop __COMPAT_HAS_CPUMASKS(). The open helper macros
now check the existence of scx_bpf_nr_cpu_ids() and abort if not.
2024-06-16 06:12:06 -10:00
Tejun Heo
5b5e5be906 compat: Drop __COMPAT_SCX_KICK_IDLE
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop __COMPAT_SCX_KICK_IDLE. The open helper macros
now check the existence of SCX_KICK_IDLE and abort if not.
2024-06-15 20:24:15 -10:00
Tejun Heo
7c9aedaefe compat: Drop __COMPAT_scx_bpf_switch_all()
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop __COMPAT_scx_bpf_switch_call(). The open helper
macros now check the existence of SCX_OPS_SWITCH_PARTIAL and abort if not.
2024-06-15 20:03:37 -10:00
Tejun Heo
9ec3594b4f scx_layered: Several fixes to address David's review
- pick_idle_cpu() was putting idle_smtmask that it didn't acquire.

- layered_enqueue() was unnecessarily entering preemption path after finding
  an idle CPU.

- No need to test whether scx_bpf_get_idle_cpu/smtmask() return NULL. They
  never do.

- Relocate cctx->yielding test into keep_runinng() from its caller.
2024-06-10 11:23:37 -10:00
Tejun Heo
92317aa2f9 Use __always_inline uniformly
Instead of using __attribute__((always_inline)) use the __always_inline
macro provided by BPF.
2024-06-10 11:23:26 -10:00
Tejun Heo
a165970ab9 scx_layered: Add migration statistic
Keep track of how frequent migrations are.
2024-06-07 11:49:39 -10:00
Tejun Heo
5b31d96c3d scx_layered: Implement "preempt_first" layer property
If set, tasks in the layer will try to preempt tasks in their previous CPUs
before trying to find idle CPUs.
2024-06-07 11:49:39 -10:00
Tejun Heo
ece3638664 scx_layered: Allow confined layers to preempt
There's no reason to restrict confined layers from preempting on the CPUs
that they are entitled to. Allow preemption for confined layers.
2024-06-07 11:49:39 -10:00
Tejun Heo
7c48814ed0 scx_layered: Prefer preempting the CPU the task was previously on
Currently, when preempting, searching for the candidate CPU always starts
from the RR preemption cursor. Let's first try the previous CPU the
preempting task was on as that may have some locality benefits.
2024-06-07 11:49:38 -10:00
Tejun Heo
3db3257911 scx_layered: Find and kick an idle CPU from enqueue path
When a task is being enqueued outside wakeup path, ops.select_cpu() isn't
called, so we can end up in a situation where a newly enqueued task keeps
waiting in one of the DSQs while there are idle CPUs. Factor out idle CPU
selection path into pick_idle_cpu() and call it from the enqueue path in
such cases. This problem is shared across schedulers and likely needs a more
generic solution in the future.
2024-06-07 11:49:38 -10:00
Tejun Heo
0f2d1ad2fa scx_layered: Implement a new layer parameter "yield_ignore"
yield(2) currently gives up the entire slice. Add "yield_ignore" layer
parameter which can modulate the magnitude of yiedling. When 1.0, yields are
completely ignored. 0.5, only half worth of the full slice is given up and
so on.
2024-06-07 11:49:38 -10:00
Tejun Heo
4aa8124b9c scx_layered: Add explicit yield() support
Currently, a task which yields is treated the same as a task which has run
out its slice. As the budget charged to a task is calculated from wall clock
time, a repeatedly yielding task can stay at the top of the queue for quite
a while hogging the CPU and spiking the number of scheduling events.

Let's add explicit yield support. An yielding task is now always charged the
full slice and not allowed to keep running on the same CPU.
2024-06-07 11:49:38 -10:00
Tejun Heo
436cd7ba9e scx_layered: Make enqueue path comprehensive and handle CPU preemptions
The keep_running path relies on the implicit last task enqueue which makes
the statistics a bit difficult to track. Let's make the enqueue path
comprehensive:

- Set SCX_OPS_ENQ_LAST and handle the last runnable task enqueue explicitly.

- Implement layered_cpu_release() to re-enqueue tasks from a CPU preempted
  by a higher pri sched class and handle the re-enqueued tasks explicitly in
  layered_enqueue().

- Add more statistics to track all enqueue operations.
2024-06-07 11:49:38 -10:00
Tejun Heo
4a0993ceab scx_layered: Allow long-running tasks to keep running on the same CPU
When a task exhausts its slice, layered currently doesn't make any effort to
keep it on the same CPU. It dispatches the next task to run and then
enqueues the running one. This leads to suboptimal behaviors. e.g. When this
happens to a task in a preempting layer, the task will most likely find an
idle CPU or a task to preempt and then migrate there causing a completely
unnecessary migration.

This patch layered_dispatch() test whether the current task should keep
running on the CPU and then skip dispatching to keep the task running. This
behavior depends on the implicit local DSQ enqueue mechanism which triggers
when there are no other tasks to run.
2024-06-07 11:49:38 -10:00
Tejun Heo
200af60f2a scx_layered: Fix load failure due to scheduler_tick() -> sched_tick() rename
- scx_utils: Replace kfunc_exists() with ksym_exists() which doesn't care
  about the type of the symbol.

- scx_layered: Fix load failure on kernels >= v6.10-rc due to
  scheduler_tick() -> sched_tick rename. Attach the tick fentry function to
  either scheduler_tick() or sched_tick().
2024-06-06 12:54:59 -10:00
Tejun Heo
e556dd375d scx: Unify loading and running boilerplate across rust schedulers
Make restart handling with user_exit_info simpler and consistently use the
load and report macros consistently across the rust schedulers. This makes
all schedulers automatically handle auto restarts from CPU hotplug events.
Note that this is necessary even for scx_lavd which has CPU hotplug
operations as CPU hotplug operations which took place between skel open and
scheduler init can still trigger restart.
2024-06-03 12:25:41 -10:00
Tejun Heo
a2d5310cb6 Bump versions for a release 2024-06-03 08:35:21 -10:00