If a task is performance-critical, pick_idle_cpu() checks if the
previous core is a big core or not. If not, don't try to run on previous
core since a performance-critical task is better to run on a big core.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
A single threshold for a low watermark does not work well across systems
with various numbers of cores and core types. Instead of using a single
low watermark value, we dynamically decide the low watermark: 1) until
one little core is fully utilized or 2) until two big cores are fully
utilized. This works better across systems.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
When the no_freq_scaling changes during runtime in the autopilot mode,
the last target freq set would not be 1024. So the performance mode
enabled by the autopilot mode would not run in the best profile. Hence,
we set the target freq to 1024 always when no_freq_scaling is set.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Add "--autopilot" option and mode. In the autopilot mode, the scheduler
dynamically changes its power mode according to system's load (cpu
utilization). When the cpu utilization is low enough (say <=5%), it
switches to the powersave mode since there is nothing to process fast so
powersaving is the primary goal. When the utilization is moderate (say
>5%, <=30%), it runs in balanced mode. When the utilization is high
enough (say >30%), it runs in performance mode.
Note that it only changes scheduler's power mode but it does not change
system's energy profile.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
When a cpu is idle for a whole interval, its idle time does not
correctlyh adds up so the utilization of such cpu tends to be higher
than the actual utilization. Now it is fixedk, so cpu utilization
becomes more accurate.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
When the power mode changes back to performance mode, we should
active/overflow cpumask to its initial state -- all big cores are in
active cpumask and all little cores are in overflow cpumask. Otherwise,
the active/overflow cpumasks will be used in the perfformance mode.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
If a task can be run only on a single cpu, we don't need to go through
all the steps in ops.select_cpu(). Instread, we simply check if a task
is still pinned on the prev_cpu and go.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
It checkes the EPP (energy performance preference) peirodically and sets
the power profile of the scheduler during runtiime as a user changes its
EPP profile (from her desktop UI).
Signed-off-by: Changwoo Min <changwoo@igalia.com>
When iterating neighbors, the existing code unnecessarily iterates all
the neighbors to the maximum even if there is no neighors. So the fix
escapes early when there is no neighbors.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Ctrl-c wasn't properly handled in the monitoring mode
(`--monitor-sched-samples`), so the scheduler could not be terminated by
pressing ctrl-c. The missing ctrl-c handling is added to the monitor
thread.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
With an ill combination of old kernel and old LLVM, the BPF verifier
incorrectly detects an infinite loop. After changing the loop with a
constant end, the old verifier can pass the code.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Now it checks an active cpumask within a previous core's compute domain
before checking the full active CPUs.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
The BPF verifier in the old kernel gives up to analysis the nested loop
in the consume_task(). We reduce the loop less complex by reducing
LAVD_CPDOM_MAX_DIST from 6 to 4 in order to make the verifier happy.
Note that the theoretical maximum distance is 6 (numa > llc > core type)
but there is no such hardware today, hence reducing it to 6 should be
okay in next few years, when hopefully the verifier becomes smarter.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Updating nr_queue_task every runqueue operation is expensive and
unnecessary. So we do update every system state update interval and use
moving average, which is accurate enough.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
meson build script was building each rust sub-project under rust/ and
scheds/rust/ separately. This means that each rust project is built
independently which leads to a couple problems - 1. There are a lot of
shared dependencies but they have to be built over and over again for each
proejct. 2. Concurrency management becomes sad - we either have to unleash
multiple cargo builds at the same time possibly thrashing the system or
build one by one.
We've been trying to solve this from meson side in vain. Thankfully, in
issue #546, @vimproved suggested using cargo workspace which makes the
sub-projects share the same target directory and built together by the same
cargo instance while still allowing each project to behave independently for
development and publishing purposes.
Make the following changes:
- Create two cargo workspaces - one under rust/, the other under
scheds/rust/. Each contains all rust projects underneath it.
- Don't let meson descend into rust/. These are libraries used by the rust
schedulers. No need to build them from meson. Cargo will build them as
needed.
- Change the rust_scheds build target to invoke `cargo build` in
scheds/rust/ and let cargo do its thing.
- Remove per-scheduler meson.build files and instead generate custom_targets
in scheds/rust/meson.build which invokes `cargo build -p $SCHED`.
- This changes rust binary directory. Update README and
meson-scripts/install_rust_user_scheds accordingly.
- Remove per-scheduler Cargo.lock as scheds/rust/Cargo.lock is shared by all
schedulers now.
- Unify .gitignore handling.
The followings are build times on Ryzen 3975W:
Before:
________________________________________________________
Executed in 165.93 secs fish external
usr time 40.55 mins 2.71 millis 40.55 mins
sys time 3.34 mins 36.40 millis 3.34 mins
After:
________________________________________________________
Executed in 36.04 secs fish external
usr time 336.42 secs 0.00 millis 336.42 secs
sys time 36.65 secs 43.95 millis 36.61 secs
Wallclock time is reduced 5x and CPU time 7x.
Let's make it a bit easier to use:
- Shorten exported names by changing the prefix from ScxStats to Stats. This
should be distinctive enough and more inline with how most libraries name
their exports.
- Importing the right set of traits can be tricky. Introduce prelude module
so that importing is a bit less painful.
Scheduling sample reporting is switched to use scx_stats. This makes the
scheduler run without making too much noise while still allowing monitoring
on demand. It can also make introspection more dynamic - e.g. it shouldn't
be difficult to add other monitoring commands which take scheduling samples
based on different criteria or add other types of staisitcs.
--nr_sched-samples is replaced with --monitor-nr-samples.
A lot of scx_lavd's options do not clearly explain what they do. Add
some short explanations, clean up the existing ones, and direct the user
to read the in-code documentation for more info.