Commit Graph

1167 Commits

Author SHA1 Message Date
David Vernet
9919b71fd4
Merge pull request #379 from sched-ext/topo_nr_cpu_ids
Add topo.nr_cpu_ids() to Topology crate
2024-06-21 13:35:05 -05:00
David Vernet
772ac03311
Merge pull request #375 from sched-ext/htejun/revert-flatcg-refcnt
Revert "scx_flatcg: Keep cgroup rb nodes stashed"
2024-06-21 13:06:02 -05:00
David Vernet
3bd15be840
rlfifo: Use topo.nr_cpu_ids() instead of topo.nr_cpus_possible()
In scx_rlfifo, we're currently using topo.nr_cpus_possible() to
determine how many possible CPU IDs we could have on the system. To
properly support systems whose disabled CPUs may be in the middle of the
range of possible CPU IDs, let's instead use topo.nr_cpu_ids() so that
we don't accidentally dispatch to an invalid DSQ.

Signed-off-by: David Vernet <void@manifault.com>
2024-06-21 12:57:20 -05:00
David Vernet
263e02f644
rusty: Use nr_cpu_ids instead of nr_cpus_possible
In scx_rusty, we're currently using topo.nr_cpus_possible() to determine
how many possible CPU IDs we could have on the system. scx_rusty already
accounts for offlined CPUs, so to properly support systems whose
disabled CPUs may be in the middle of the range of possible CPU IDs,
let's instead use topo.nr_cpu_ids().

Signed-off-by: David Vernet <void@manifault.com>
2024-06-21 12:57:19 -05:00
David Vernet
bdbf4b9c05
topo: Return nr_cpu_ids from host Topology
In some cases, a host may have an odd topology where there are gaps in
CPU IDs (including between possible CPUs). A common pattern in
schedulers is to perform allocations for every possible CPU ID, such as
creating a per-cpu DSQ. In order to avoid confusing schedulers, let's
track the maximum CPU ID on a system so that we can return the number of
CPU IDs on the system which is inclusive of gaps.

We also update scx_rustland in this change to accommodate the fact that
we no longer export nr_cpus_possible() from TopologyMap.

Signed-off-by: David Vernet <void@manifault.com>
2024-06-21 12:57:13 -05:00
David Vernet
68116302d8
Merge pull request #378 from sirlucjan/hooks-update
Simplifying pacman-hooks
2024-06-21 12:30:12 -05:00
David Vernet
3219d15e3d
Merge pull request #292 from hodgesds/stress-ng-ci
Add stress-ng to scheduler tests
2024-06-21 11:35:56 -05:00
Jose Fernandez
83373b1f4e
rusty: Integrate stats with the metrics framework
We need a layer of indirection between the stats collection and their
output destinations. Currently, stats are only printed to stdout. Our
goal is to integrate with various telemetry systems such as Prometheus,
StatsD, and custom metric backends like those used by Meta and Netflix.
Importantly, adding a new backend should not require changes to the
existing stats code.

This patch introduces the `metrics` [1] crate, which provides a
framework for defining metrics and publishing them to different
backends.

The initial implementation includes the `dispatched_tasks_count`
metric, tagged with `type`. This metric increments every time a task is
dispatched, emitting the raw count instead of a percentage. A monotonic
counter is the most suitable metric type for this use case, as
percentages can be calculated at query time if needed. Existing logged
metrics continue to print percentages and remain unchanged.

A new flag, `--enable-prometheus`, has been added. When enabled, it
starts a Prometheus endpoint on port 9000 (default is false). This
endpoint allows metrics to be charted in Prometheus or Grafana
dashboards.

Future changes will migrate additional stats to this framework and add
support for other backends.

[1] https://metrics.rs/

Signed-off-by: Jose Fernandez <josef@netflix.com>
2024-06-21 10:18:44 -06:00
Piotr Gorski
3684b1601c
Simplifying pacman-hooks
Signed-off-by: Piotr Gorski <lucjan.lucjanov@gmail.com>
2024-06-21 12:18:33 +02:00
Tejun Heo
7a40059b55 Revert "scx_flatcg: Keep cgroup rb nodes stashed"
This reverts commit 3b7f33ea1b.

I haven't root caused it yet but it's easy to reproduce stall and trigger
the watchdog after the commit - just running stress in multiple cgroups
easily triggers stalls after a couple tens of seconds. Let's revert it for
now.
2024-06-19 14:44:26 -10:00
Andrea Righi
92ca7f385c
Merge pull request #374 from sched-ext/rustland-alloc-refactoring
scx_rustland_core: include buddy-alloc and refactor allocator code
2024-06-19 19:15:32 +02:00
Andrea Righi
b04e82b5eb scx_rustland_core: include buddy-alloc and refactor allocator code
The dependency of the buddy-alloc crate [1] seems to cause some troubles
with packaging, mostly because the selftests for the crate are failing
when it's compiled in release mode.

For example:

 $ cargo test --release -- --nocapture
 thread 'tests::fast_alloc::test_basic_malloc' panicked at src/tests/fast_alloc.rs:25:13:
 assertion `left == right` failed
   left: 0
  right: 42

Some of these failures with BuddyAlloc can be fixed by using a memory
arena buffer aligned to page size.

However, some test failures with FastAlloc persist that cannot be
resolved merely by aligning the pre-allocated memory arena to the page
size, as mentioned in [2].

The concern is that this may potentially lead to actual memory bugs.

Therefore, it seems safer to refactor the custom allocator code to
simply use BuddyAlloc, dropping FastAlloc completely.

To achieve this, the entire BuddyAlloc code has been directly included
in scx_rustland_core, referencing the original project and its MIT
licensing information (with the entire code still distributed under the
GPLv2 license).

Then the code has been slightly modified to remove FastAlloc and the
external dependency on the buddy-alloc crate has been dropped.

From a performance perspective this change doesn't seem to introduce any
measurable regression.

[1] https://github.com/jjyr/buddy-alloc
[2] https://github.com/jjyr/buddy-alloc/issues/16

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-19 14:44:04 +02:00
Changwoo Min
9c21ace276
Merge pull request #373 from vax-r/lavd_reuse
scx_lavd: Reuse can_task1_kick_task2
2024-06-19 15:29:05 +09:00
David Vernet
b1b43fdbd8
Merge pull request #372 from vax-r/util_entry
scx_utils: Utilize Entry API for BTreeMap insertion
2024-06-18 22:40:44 -05:00
I Hsin Cheng
99960ad960 scx_lavd: Reuse can_task1_kick_task2
Use the function can_task1_kick_task2() to replace places which also
checking the comp_preemption_info between two cpus for better
consistency.

Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
2024-06-19 11:01:31 +08:00
I Hsin Cheng
1334a4df5d scx_utils: Utilize Entry API for BTreeMap insertion
Take advantages of BTreeMap's Entry API working with or_insert() to do
the conditional insertion. Insert only when the entry doesn't exist.
Doing so can reduce the amount of code and provide better readability
and perform in-place manipulation.

Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
2024-06-19 10:27:10 +08:00
Changwoo Min
691869e83f
Merge pull request #369 from sched-ext/lavd-fix-pick-cpu
scx_lavd: properly check for idle CPUs in pick_cpu()
2024-06-19 09:23:17 +09:00
Changwoo Min
dad25f1b5d
Merge pull request #368 from multics69/lavd-perf-misc
scx_lavd: misc performance tuning and code clean up
2024-06-19 07:26:52 +09:00
David Vernet
2c8cb23186
Merge pull request #362 from vax-r/Refactor
scx_rusty: Refactor lookup operation for new_domc in task_set_domain
2024-06-18 13:48:14 -05:00
Andrea Righi
bad9ed13ef scx_lavd: properly check for idle CPUs in pick_cpu()
It seems that we are not updating `is_idle` when we find an idle CPU
with pick_cpu(), causing unnecessary rescheduling events when
select_cpu() is called.

To resolve this, ensure that the is_idle state is correctly set.
Additionally, always ensure that the task is dispatched to the local DSQ
immediately upon finding (and reserving) an idle CPU.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-18 17:36:39 +02:00
Changwoo Min
632fa9e4f2 scx_lavd: misc code clean up
- clean up u63 and u32 usages in structures to reduce struct size
- refactoring pick_cpu() for readability

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-06-18 18:11:49 +09:00
Changwoo Min
5165bf5a03 scx_lavd: tuning CPU frequency scaling
The required CPU performance (cpuperf) was set to 1024 (100%) when the
CPU utilization was 100%. When a sudden load spike happens, it makes the
system adapt slowly in the next interval.

The new scheme always reserves some headroom in advance, so it sets
cpuperf to 1024 when the CPU utilization reaches to 85%. This gives some
room to adapt in advance.

Signed-off-by: Changwoo Min <changwoo@igalia.com>
2024-06-18 18:11:49 +09:00
I Hsin Cheng
94e3616c02 scx_rusty: Refactor lookup operation for new_domc in task_set_domain
Modify the execution sequence before lookup operation for new_domc. If
new_dom_id == NO_DOM_FOUND, lookup operation for new_domc is definitely
going to fail so we don't have to wait until we found that new_domc is
NULL, clearing of cpumask and return operation should be done directly
in that case.

Plus we should avoid using try_lookup_dom_ctx outside the context of
lookup_dom_ctx, as it can keep the interface's consistency.

Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
2024-06-18 12:58:17 +08:00
Tejun Heo
819ffd527f
Merge pull request #367 from sched-ext/htejun/dsq-iter-fix
scx/compat.bpf.h: Fix __COMPAT_scx_bpf_consume_task() and improve scx_qmap example
2024-06-17 10:29:38 -10:00
Tejun Heo
1012e3a6db scx/compat.bpf.h: Fix __COMPAT_scx_bpf_consume_task() and improve scx_qmap example
__COMPAT_scx_bpf_consume_task() wasn't calling scx_bpf_consume_task() at all
and was always returning false. Fix it.

Also, update scx_qmap usage example so that it matches cgroup ID rather than
comm prefix. This should make testing with multiple processes a bit easier.
2024-06-17 10:11:06 -10:00
David Vernet
0184444285
Merge pull request #366 from sched-ext/task_set_domain_global
rusty: Make dom_xfer_task() a global prog
2024-06-17 14:43:45 -05:00
David Vernet
dfe0ffb312
Merge pull request #347 from sched-ext/rusty_cleanup
rusty: Clean up some logic in rusty
2024-06-17 14:26:53 -05:00
David Vernet
7985ee556e
rusty: Clean up dispatch logic
The rusty dispatch logic is a bit unnecessarily convoluted. Let's clean it up
so that we're just comparing dom ids rather than iterating over arrays nested
inside of pcpu context.

Signed-off-by: David Vernet <void@manifault.com>
2024-06-17 14:24:30 -05:00
David Vernet
87aa86845d
rusty: Refactor + slightly improve wake_sync
Right now, the SCX_WAKE_SYNC logic in rusty is very primitive. We only check to
see if the waker CPU's runqueue is empty, and then migrate the wakee there if
so. We'll want to expand this to be more thorough, such as:

- Checking to see if prev_cpu and waker_cpu share the same LLC when determining
  where to migrate
- Check for whether SCX_WAKE_SYNC migration helps load imbalance between cores
- ...

Right now all of that code is just a big blob in the middle of
rusty_select_cpu(). Let's pull it into its own function to improve readability,
and also add some logic to stay on prev_cpu if it shares an LLC with the waker.

Signed-off-by: David Vernet <void@manifault.com>
2024-06-17 14:24:29 -05:00
David Vernet
fed66fa571
rusty: Make dom_xfer_task() a global prog
It seems that task_set_domain() is nearly at the point where it can
cause the verifier to get confused and think that it's exceeding the
number of available instructions per program. I've seen this a number of
times when making small changes to task_set_domain(), and it's once
again happened @vax-r (I-Hsin Cheng) made a small cleanup change to
rusty in https://github.com/sched-ext/scx/pull/362.

To avoid this, let's just make dom_xfer_task() a separate global program
so that the verifier doens't have to worry about branch pruning, etc
depending on what the caller does. This should hopefully make
task_set_domain() (and its callers) much less brittle.

Signed-off-by: David Vernet <void@manifault.com>
2024-06-17 14:22:26 -05:00
Tejun Heo
53fe2b8737
Merge pull request #365 from vax-r/Typos
scx_utils: Fix typos
2024-06-17 08:37:24 -10:00
I Hsin Cheng
f13697a755 scx_utils: Fix typos
Correct "Thie" to "This".

Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
2024-06-18 00:35:43 +08:00
Tejun Heo
d18bb4831a
Merge pull request #363 from sched-ext/htejun/compat-strip
Strip compat support
2024-06-16 20:04:50 -10:00
Tejun Heo
b6ebdc635a compat: Compact min requirement checks
Let's check only the latest one.
2024-06-16 06:53:58 -10:00
Tejun Heo
6319b25cf1 scx_utils: compat.rs: Follow-up clean-ups 2024-06-16 06:45:14 -10:00
Tejun Heo
aeb805a93e compat: Drop support for missing sched_ext_ops.dump*()
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop support for missing sched_ext_ops.dump*(). The
open helper macros now check the existence of the fields and abort if
missing.
2024-06-16 06:43:43 -10:00
Tejun Heo
4cca1e9acf compat: Drop support for missing sched_ext_ops.tick()
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop support for missing sched_ext_ops.tick(). The
open helper macros now check the existence of the field and abort if
missing.
2024-06-16 06:40:28 -10:00
Tejun Heo
970c04b43a compat: Drop support for missing sched_ext_ops.exit_dump_len
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop support for missing sched_ext_ops.exit_dump_len.
The open helper macros now check the existence of the field and abort if
missing.
2024-06-16 06:37:34 -10:00
Tejun Heo
046bdfd5e0 compat: Drop support for missing sched_ext_ops.hotplug_seq
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop support for missing sched_ext_ops.hotplug_seq.
The open helper macros now check the existence of the field and abort if
missing.
2024-06-16 06:34:59 -10:00
Tejun Heo
dde2942125 compat: Drop __COMPAT_scx_bpf_cpuperf_*()
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop __COMPAT_scx_bpf_cpuperf_*(). The open helper
macros now check the existence of scx_bpf_cpuperf_cap() and abort if not.
2024-06-16 06:16:53 -10:00
Tejun Heo
13e8388e1e compat: Drop __COMPAT_HAS_CPUMASKS
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop __COMPAT_HAS_CPUMASKS(). The open helper macros
now check the existence of scx_bpf_nr_cpu_ids() and abort if not.
2024-06-16 06:12:06 -10:00
Tejun Heo
66901e2b44 compat: Drop __COMPAT_scx_bpf_dump()
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop __COMPAT_scx_bpf_dump(). The open helper macros
now check the existence of scx_bpf_dump_bstr() and abort if not.

While at it, reorder the min requirement checks so that newly added ones are
up top to make testing easier.
2024-06-16 06:02:47 -10:00
Andrea Righi
38141905eb
Merge pull request #361 from sched-ext/rustland-fix-stall
scx_rustland_core: fix potential race in dispatch_task()
2024-06-16 17:40:17 +02:00
Tejun Heo
0d8adf2260 compat: Drop __COMPAT_scx_bpf_exit()
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop __COMPAT_scx_bpf_exit(). The open helper macros
now check the existence of scx_bpf_exit_bstr() and abort if not.
2024-06-15 20:36:17 -10:00
Andrea Righi
812aeb3b81 scx_rustland_core: fix potential race in dispatch_task()
Fix a potential race condition that might lead to a task being
dispatched without kicking the target CPU, which could result in a
potential stall.

With this applied, scx_rustland has been running without any stall for
about 18 hours on a system where the issue was previously quite easy to
reproduce.

Moreover, clarify a couple of comments in the dispatch path.

This fixes issue #353.

Tested-by: SoulHarsh007 <harsh.peshwani@outlook.com>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
2024-06-16 08:31:48 +02:00
Tejun Heo
5b5e5be906 compat: Drop __COMPAT_SCX_KICK_IDLE
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop __COMPAT_SCX_KICK_IDLE. The open helper macros
now check the existence of SCX_KICK_IDLE and abort if not.
2024-06-15 20:24:15 -10:00
Tejun Heo
b730f35e68 scx/common.h: Improve SCX_BUG() macro
There's no guarantee that errno is set or contains relevant information when
SCX_BUG() is invoked. This sometimes leads to "task failed successfully"
messages:

  # ./scx_simple
  ../scheds/c/scx_simple.c:72 [scx panic]: Success
  SCX_OPS_SWITCH_PARTIAL missing, kernel too old?

While not critical, it's not great. Let's update it so that errno is printed
in parentheses when non-zero and match the tag to the macro name so that
what's printed is the following:

  # ./scx_simple
  [SCX_BUG] ../scheds/c/scx_simple.c:72
  SCX_OPS_SWITCH_PARTIAL missing, kernel too old?
2024-06-15 20:17:32 -10:00
Tejun Heo
7c9aedaefe compat: Drop __COMPAT_scx_bpf_switch_all()
In preparation of upstreaming, let's set the min version requirement at the
released v6.9 kernels. Drop __COMPAT_scx_bpf_switch_call(). The open helper
macros now check the existence of SCX_OPS_SWITCH_PARTIAL and abort if not.
2024-06-15 20:03:37 -10:00
Tejun Heo
fb2c70de84 scx_utils: compat.rs: Helper macros shouldn't return the calling function
scx_ops_open!() and scx_ops_attach!() could return the calling function
after an error, which can be surprising. Forutnately, as all the current
callers are either unwrapping or returning on error, the surprising behavior
is currently not very noticeable.

Fix it by breaking out of the macro block on errors.
2024-06-15 18:27:39 -10:00
Tejun Heo
dd6255a601
Merge pull request #359 from sched-ext/htejun/cosmetic
common.bpf.h: Cosmetic changes
2024-06-15 06:42:00 -10:00