When the power mode changes back to performance mode, we should
active/overflow cpumask to its initial state -- all big cores are in
active cpumask and all little cores are in overflow cpumask. Otherwise,
the active/overflow cpumasks will be used in the perfformance mode.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
If a task can be run only on a single cpu, we don't need to go through
all the steps in ops.select_cpu(). Instread, we simply check if a task
is still pinned on the prev_cpu and go.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
It checkes the EPP (energy performance preference) peirodically and sets
the power profile of the scheduler during runtiime as a user changes its
EPP profile (from her desktop UI).
Signed-off-by: Changwoo Min <changwoo@igalia.com>
When iterating neighbors, the existing code unnecessarily iterates all
the neighbors to the maximum even if there is no neighors. So the fix
escapes early when there is no neighbors.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Ctrl-c wasn't properly handled in the monitoring mode
(`--monitor-sched-samples`), so the scheduler could not be terminated by
pressing ctrl-c. The missing ctrl-c handling is added to the monitor
thread.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
With an ill combination of old kernel and old LLVM, the BPF verifier
incorrectly detects an infinite loop. After changing the loop with a
constant end, the old verifier can pass the code.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Now it checks an active cpumask within a previous core's compute domain
before checking the full active CPUs.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
The BPF verifier in the old kernel gives up to analysis the nested loop
in the consume_task(). We reduce the loop less complex by reducing
LAVD_CPDOM_MAX_DIST from 6 to 4 in order to make the verifier happy.
Note that the theoretical maximum distance is 6 (numa > llc > core type)
but there is no such hardware today, hence reducing it to 6 should be
okay in next few years, when hopefully the verifier becomes smarter.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Updating nr_queue_task every runqueue operation is expensive and
unnecessary. So we do update every system state update interval and use
moving average, which is accurate enough.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
meson build script was building each rust sub-project under rust/ and
scheds/rust/ separately. This means that each rust project is built
independently which leads to a couple problems - 1. There are a lot of
shared dependencies but they have to be built over and over again for each
proejct. 2. Concurrency management becomes sad - we either have to unleash
multiple cargo builds at the same time possibly thrashing the system or
build one by one.
We've been trying to solve this from meson side in vain. Thankfully, in
issue #546, @vimproved suggested using cargo workspace which makes the
sub-projects share the same target directory and built together by the same
cargo instance while still allowing each project to behave independently for
development and publishing purposes.
Make the following changes:
- Create two cargo workspaces - one under rust/, the other under
scheds/rust/. Each contains all rust projects underneath it.
- Don't let meson descend into rust/. These are libraries used by the rust
schedulers. No need to build them from meson. Cargo will build them as
needed.
- Change the rust_scheds build target to invoke `cargo build` in
scheds/rust/ and let cargo do its thing.
- Remove per-scheduler meson.build files and instead generate custom_targets
in scheds/rust/meson.build which invokes `cargo build -p $SCHED`.
- This changes rust binary directory. Update README and
meson-scripts/install_rust_user_scheds accordingly.
- Remove per-scheduler Cargo.lock as scheds/rust/Cargo.lock is shared by all
schedulers now.
- Unify .gitignore handling.
The followings are build times on Ryzen 3975W:
Before:
________________________________________________________
Executed in 165.93 secs fish external
usr time 40.55 mins 2.71 millis 40.55 mins
sys time 3.34 mins 36.40 millis 3.34 mins
After:
________________________________________________________
Executed in 36.04 secs fish external
usr time 336.42 secs 0.00 millis 336.42 secs
sys time 36.65 secs 43.95 millis 36.61 secs
Wallclock time is reduced 5x and CPU time 7x.
Let's make it a bit easier to use:
- Shorten exported names by changing the prefix from ScxStats to Stats. This
should be distinctive enough and more inline with how most libraries name
their exports.
- Importing the right set of traits can be tricky. Introduce prelude module
so that importing is a bit less painful.
Scheduling sample reporting is switched to use scx_stats. This makes the
scheduler run without making too much noise while still allowing monitoring
on demand. It can also make introspection more dynamic - e.g. it shouldn't
be difficult to add other monitoring commands which take scheduling samples
based on different criteria or add other types of staisitcs.
--nr_sched-samples is replaced with --monitor-nr-samples.
A lot of scx_lavd's options do not clearly explain what they do. Add
some short explanations, clean up the existing ones, and direct the user
to read the in-code documentation for more info.
This option chooses little (effiency) cores over big (performance) cores
to save power consumption for core compaction.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
The changes include 1) chopping down a big function into smaller ones
for readability and maintainability and 2) using the interior mutability
pattern (Cell and RefCell) to avoid unnecessary clone() calls. There
are no functional changes.
Signed-off-by: Changwoo Min <changwoo@igalia.com>