Commit Graph

241 Commits

Author SHA1 Message Date
Andrea Righi
cb16a11342 scx_rustland_core: get rid of the global scheduler's slice_us
Since scx_rustland_core enables setting a time slice on a per-task basis
during task dispatch, there's no need to maintain a global time slice in
the BPF component. Instead, a global time slice can simply be managed in
user-space, achieving the same outcome.

Therefore, drop the global slice_us property from BpfScheduler to
simplify the API.

NOTE: if a time slice is not specified for a task, SCX_SLICE_DFL will be
used by default.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-25 09:45:18 +02:00
Andrea Righi
0aa23481de scx_rustland_core: drop update_tasks() and introduce notify_complete()
The update_tasks() API is somewhat confusing, so replace it with a
clearer API, notify_complete().

This new API will return control to the BPF component and inform it
about the number of tasks still pending in the user-space scheduler.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-25 00:45:23 +02:00
Andrea Righi
cef8ff8757 scx_rustland_core: get rid of the low_power API
The low-power API is a bit of a hack implemented purely in the BPF
layer, this should be better re-implemented with some concepts of
topology awareness.

Therefore, get rid of this API for now.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-24 21:29:10 +02:00
Andrea Righi
568e292a24 scx_rustland_core: get rid of the exiting task API
The current API used to notify the user-space scheduler when a task
exits is really confusing (setting a negative value in
queued_task_ctx.cpu), and it's also possible to detect task exiting
events from user-space (or check in procfs, even if it's slower).

In any case, a better API should be provided for this, so drop the
current one for now.

NOTE: this will cause additional memory usage for scx_rustland, but it
can be fixed/addressed later in a separate commit (i.e., providing a
periodic garbage collector for the unused task entries).

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-24 21:29:10 +02:00
Andrea Righi
eec395f16a scx_rustland_core: better time slice control
Instead of determining the task time slice in ops.enqueue(), refresh the
time slice immediately before the task is started on its assigned CPU in
ops.running().

This ensures to apply the exact time slice specified by the user-space
scheduler and the sched_ext core will never implicitly dispatch tasks
using SCX_SLICE_DFL.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-24 21:29:10 +02:00
Andrea Righi
c0a2cfb481 scx_rustland_core: always schedule per-CPU kthreads to user-space
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-24 21:29:10 +02:00
Andrea Righi
5d544ea264 scx_rustland_core: move CPU idle selection logic in user-space
Allow user-space scheduler to pick an idle CPU via
self.bpf.select_cpu(pid, prev_task, flags), mimicking the BPF's
select_cpu() iterface.

Also remove the full_user option and always rely on the idle selection
logic from user-space.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-24 21:28:13 +02:00
Andrea Righi
e72676ede3
Merge pull request #540 from sched-ext/bpfland-cpufreq-awareness
scx_bpfland: cpu frequency and energy awareness
2024-08-23 21:17:34 +02:00
Tejun Heo
8c8912ccea Merge branch 'main' into htejun/scx_rusty 2024-08-23 07:50:23 -10:00
Andrea Righi
bb7248ce61 scx_utils::cpumask: introduce is_empty() and is_full()
Introduce new methods to CpuMask to check if no bits are set or if all
bits are set.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-23 19:48:53 +02:00
Tejun Heo
44a0f1b124 scx_utils: Factor out monitor_stats() from scx_rusty and scx_layered 2024-08-23 06:46:19 -10:00
Tejun Heo
70ba4cb9ef scx_stats: Fix multiple labels handling in scxstats_to_openmetrics.py
OM labels() was called with an array which is then incorrectly interpreted
as a single label. Unpack it to list of arguments. While at it, make error
reporting a bit more robust.
2024-08-23 04:56:30 -10:00
Tejun Heo
4678476ca7 scx_stats_derive: Each _AssScxStastMeta assertion should have an unique ID 2024-08-22 13:53:27 -10:00
Tejun Heo
13fa48a871 scx_rusty: Separate out stats generation and formatting
to prepare for scx_stats conversion.
2024-08-22 10:03:10 -10:00
Tejun Heo
4d1f0639d8 Version: v1.0.3 2024-08-21 06:42:11 -10:00
Tejun Heo
6a2faf2e17
Merge pull request #530 from sched-ext/htejun/misc
scx_utils::topology: Use lazy_static instead of LazyLock
2024-08-21 06:04:14 -10:00
Andrea Righi
9f7a11bba6
Merge pull request #528 from sched-ext/bpfland-turbo-boost
scx_bpfland: properly classify Intel Turbo Boost CPUs
2024-08-21 17:40:25 +02:00
Tejun Heo
081b999361 scx_utils::topology: Use lazy_static instead of LazyLock
LazyLock is stable but has become so only very recently and can trigger
build errors on not-too-old stable rustc's which are still in wide use.
Let's use lazy_static instead for now.

Signed-off-by: Tejun Heo <tj@kernel.org>
2024-08-21 05:34:39 -10:00
Andrea Righi
bbe388e3bc scx_utils: topology: add base_freq() method to Cpu
With Intel Turbo Boost enabled, some CPUs might show a higher maximum
frequency than others, even if they are not actually faster cores. This
can potentially confuse some auto-detection logic for distinguishing
between fast and slow cores in certain schedulers.

The base CPU frequency reported in
/sys/devices/system/cpu/cpuN/cpufreq/base_frequency represents a more
reliable indicator for identifying truly fast and slow cores.

To address this, provide a new base_freq() method in the struct Cpu,
which will return the base operational frequency of a CPU when Turbo
Boost is present. If Turbo Boost is not available, base_freq() will
return the maximum frequency, functioning the same as max_freq().

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-21 10:07:57 +02:00
Tejun Heo
1da249f063 scx_utils::topology: Always use NR_CPU_IDS and NR_CPUS_POSSIBLE
Always use the LazyLock versions and drop the counterparts from Topology.
2024-08-20 21:57:56 -10:00
Tejun Heo
1ae4655b3c scx_utils::cpumask: Default to displaying in hex
There isn't much to gain by displaying cpumasks in binary. Drop separate
Display implementation just default to 'x' formatting.
2024-08-20 21:50:23 -10:00
Tejun Heo
3ca2f0b6f9 scx_utils/cpumask: Use nr_cpu_ids instead of num_possible_cpus
- Add static NR_CPU_IDS and NR_CPUS_POSSIBLE to topology.

- Fix comment for Topology::nr_cpu_ids(). Was missing a negation.

- cpumaks should be sized by nr_cpus_ids, not num_possible_cpus and the
  number can't change while the system is running. Drop cpumask.nr_cpus and
  use *NR_CPU_IDS everywhere.
2024-08-20 21:25:40 -10:00
Tejun Heo
0cc59a5243 scx_utils: cargo fmt 2024-08-20 21:25:40 -10:00
Tejun Heo
f7c193e528 scx_utils, scx_rusty: Minor updates to version handling
- Update scx_utils/build.rs so that 12 char SHA1 is generated instead of
  full one.

- Add --version to scx_rusty. Use custom one as we don't want to use the
  default cargo version one.
2024-08-20 21:03:05 -10:00
Andrea Righi
235f19fdf1 cpumask: implement hex string formatter
Allow to format a Cpumask as an hex string, implementing the proper
formatter LowerHex / UpperHex traits.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-21 07:21:22 +02:00
Andrea Righi
33b6ada98e
Merge pull request #509 from sched-ext/bpfland-topology
scx_bpfland: topology awareness
2024-08-20 14:37:23 +02:00
Andrea Righi
0b2dc6b9fc scx_utils: Add L2 / L3 cache id to CPU
Add the L2 / L3 cache id to the Cpu struct, to quickly determine the
cache nodes associated to each CPU.

Signed-off-by: Andrea Righi <andrea.righi@linux.dev>
2024-08-20 10:16:02 +02:00
Tejun Heo
4e859d067e scx_stats/scripts/scxstats_to_openmetrics: Retry connection
It now retries until told to exit. This is a bit easier to use and matches
`scx_layered --monitor`.
2024-08-19 12:52:57 -10:00
Tejun Heo
6cba8d786a scx_stats: server: open_ops must be kept throughout a client session
open_ops tracks which ops have been opened by the client session; however,
it was being created on each handle_request() making every request to open
each time. Fix it by moving it to the caller.
2024-08-19 08:38:13 -10:00
Tejun Heo
17a460c179 scx_stats: ScxStatsOps fields must be public 2024-08-19 07:51:05 -10:00
Tejun Heo
27c530e17e scx_stats: Add missing trait exports 2024-08-19 07:16:43 -10:00
Tejun Heo
1e89184ba7 scx_stats: server: s/Tx/Req/ and s/Rx/Res/ for clarity 2024-08-19 07:11:26 -10:00
Tejun Heo
4d88c9aec7 scx_stats: Add channel arguments to open and close ops too 2024-08-19 06:56:14 -10:00
Tejun Heo
a77fe372d6 scx_stats: Make server shutdown when connection is dropped and add communication channel
This will make implementing connection sessions easier where each stats
client connection maintains a set of states.
2024-08-19 06:23:16 -10:00
Tejun Heo
689d380db1 scx_stats: Implement ScxStatsServer::add_stats_ops()
This allows stats reader to maintain persistent states per connection.
2024-08-16 13:24:17 -10:00
Tejun Heo
da2e014d15 scx_stats: Add more documentation on user attributes in README 2024-08-16 09:11:23 -10:00
Tejun Heo
3a688cfde7 scx_stats: Add support for no-value user attributes and a bunch of other changes
- Allow no-value user attributes which are automatically assigned "true"
  when specified.

- Make "top" attribute string "true" instead of bool true for consistency.
  Testing for existence is always enough for value-less attributes.

- Don't drop leading "_" from user attribute names when storing in dicts.
  Dropping makes things more confusing.

- Add "_om_skip" to scx_layered fields which don't jive well with OM.
  scxstats_to_openmetrics.py is updated accordignly and no longer generates
  warnings on those fields.

- Examples and README updated accordingly.
2024-08-16 07:52:02 -10:00
Tejun Heo
870a262713 scx_stats: Add scripts/scxstats_to_openmetrics.py
This is a generic tool to pipe from scx_stats to OpenMetrics. This is a
barebone implmentation and the current output may not match what scx_layered
was outputting before. Will be updated later.
2024-08-15 22:51:22 -10:00
Tejun Heo
ea453e51d3 scx_stats: Rename "all" attribute to "top" and clean up examples a bit 2024-08-15 21:24:55 -10:00
Tejun Heo
6a5d6f7c27 scx_stats: Replace field_prefix attribute with '_' prefixed user attributes 2024-08-15 19:09:59 -10:00
Tejun Heo
f7c5a598bc scx_stats: Store ScxStatsMeta in BTreeMap instead of Vec
This makes the metadata easier to use.
2024-08-15 18:32:53 -10:00
Tejun Heo
834ce62b95 scx_stats: Fields ScxStatsMeta should be a BTreeMap not vec
Also simplify trait bound assertion.
2024-08-15 18:21:19 -10:00
Tejun Heo
f345e83c05 scx_stats: Replace om_prefix field attribute with field_prefix struct attribute
And strongly distinguish between field and struct attributes while parsing.
2024-08-15 18:05:24 -10:00
Tejun Heo
a9922deaa2 scx_stats: Add "all" attribute and rename metadata type strings 2024-08-15 14:50:00 -10:00
Tejun Heo
ebc1a89c34 scx_stats: s/stat/stats/ stragglers 2024-08-15 14:00:00 -10:00
Tejun Heo
bafd67b568 scx_stats: Fix parsing for multiple stat attributes
The code was assuming single attribute per #[stat()] block. Update it so
that there can be multiple comma separated attributes in a single block.
2024-08-15 13:46:20 -10:00
Tejun Heo
db0ddbe249 scx_utils: Add support for om_prefix stat attribute
This will be used to build generic bridge to openmetrics.
2024-08-15 13:08:44 -10:00
Tejun Heo
ba2e0be899 scx_stats: Add .gitignore 2024-08-15 12:31:04 -10:00
Tejun Heo
0b9c8b5cbd scx_stats: Update versions to 0.2.0 to republish 2024-08-15 12:29:27 -10:00
Tejun Heo
d6ef2c2a1f scx_stats: Synchronize crate dependency versions 2024-08-15 12:28:29 -10:00