Prior to this patch, we only bump LSTAT_AFFN_BIOL when the target cpu
was idle, but in both cases it should be counted as AFFN_VIOL.
Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
Currently scx_layered outputs statistics periodically as info! logs. The
format of this is largely unstructured and mostly suitable for running
scx_layered interactively (e.g. observing its behavior on the command
line or via logs after the fact).
In order to run scx_layered at larger scale, it's desireable to have
statistics output in some format that is amenable to being ingested into
monitoring databases (e.g. Prometheseus). This allows collection of
stats across many machines.
This commit adds a command line flag (-o) that outputs statistics to
stdout in OpenMetrics format instead of the normal log mechanism.
OpenMetrics has a public format
specification (https://github.com/OpenObservability/OpenMetrics) and is
in use by many projects.
The library for producing OpenMetrics metrics is lightweight but does
induce some changes. Primarily, metrics need to be pre-registered (see
OpenMetricsStats::new()).
Without -o, the output looks as before, for example:
```
19:39:54 [INFO] CPUs: online/possible=52/52 nr_cores=26
19:39:54 [INFO] Layered Scheduler Attached
19:39:56 [INFO] tot= 9912 local=76.71 open_idle= 0.00 affn_viol= 2.63 tctx_err=0 proc=21ms
19:39:56 [INFO] busy= 1.3 util= 65.2 load= 263.4 fallback_cpu= 1
19:39:56 [INFO] batch : util/frac= 49.7/ 76.3 load/frac= 252.0: 95.7 tasks= 458
19:39:56 [INFO] tot= 2842 local=45.04 open_idle= 0.00 preempt= 0.00 affn_viol= 0.00
19:39:56 [INFO] cpus= 2 [ 0, 2] 04000001 00000000
19:39:56 [INFO] immediate: util/frac= 0.0/ 0.0 load/frac= 0.0: 0.0 tasks= 0
19:39:56 [INFO] tot= 0 local= 0.00 open_idle= 0.00 preempt= 0.00 affn_viol= 0.00
19:39:56 [INFO] cpus= 50 [ 0, 50] fbfffffe 000fffff
19:39:56 [INFO] normal : util/frac= 15.4/ 23.7 load/frac= 11.4: 4.3 tasks= 556
19:39:56 [INFO] tot= 7070 local=89.43 open_idle= 0.00 preempt= 0.00 affn_viol= 3.69
19:39:56 [INFO] cpus= 50 [ 0, 50] fbfffffe 000fffff
19:39:58 [INFO] tot= 7091 local=84.91 open_idle= 0.00 affn_viol= 2.64 tctx_err=0 proc=21ms
19:39:58 [INFO] busy= 0.6 util= 31.2 load= 107.1 fallback_cpu= 1
19:39:58 [INFO] batch : util/frac= 18.3/ 58.5 load/frac= 93.9: 87.7 tasks= 589
19:39:58 [INFO] tot= 2011 local=60.67 open_idle= 0.00 preempt= 0.00 affn_viol= 0.00
19:39:58 [INFO] cpus= 2 [ 2, 2] 04000001 00000000
19:39:58 [INFO] immediate: util/frac= 0.0/ 0.0 load/frac= 0.0: 0.0 tasks= 0
19:39:58 [INFO] tot= 0 local= 0.00 open_idle= 0.00 preempt= 0.00 affn_viol= 0.00
19:39:58 [INFO] cpus= 50 [ 50, 50] fbfffffe 000fffff
19:39:58 [INFO] normal : util/frac= 13.0/ 41.5 load/frac= 13.2: 12.3 tasks= 650
19:39:58 [INFO] tot= 5080 local=94.51 open_idle= 0.00 preempt= 0.00 affn_viol= 3.68
19:39:58 [INFO] cpus= 50 [ 50, 50] fbfffffe 000fffff
^C19:39:59 [INFO] EXIT: BPF scheduler unregistered
```
With -o passed, the output is in OpenMetrics format:
```
19:40:08 [INFO] CPUs: online/possible=52/52 nr_cores=26
19:40:08 [INFO] Layered Scheduler Attached
# HELP total Total scheduling events in the period.
# TYPE total gauge
total 8489
# HELP local % that got scheduled directly into an idle CPU.
# TYPE local gauge
local 86.45305689716104
# HELP open_idle % of open layer tasks scheduled into occupied idle CPUs.
# TYPE open_idle gauge
open_idle 0.0
# HELP affn_viol % which violated configured policies due to CPU affinity restrictions.
# TYPE affn_viol gauge
affn_viol 2.332430203793144
# HELP tctx_err Failures to free task contexts.
# TYPE tctx_err gauge
tctx_err 0
# HELP proc_ms CPU time this binary has consumed during the period.
# TYPE proc_ms gauge
proc_ms 20
# HELP busy CPU busy % (100% means all CPUs were fully occupied).
# TYPE busy gauge
busy 0.5294061026085283
# HELP util CPU utilization % (100% means one CPU was fully occupied).
# TYPE util gauge
util 27.37195512782239
# HELP load Sum of weight * duty_cycle for all tasks.
# TYPE load gauge
load 81.55024768702126
# HELP layer_util CPU utilization of the layer (100% means one CPU was fully occupied).
# TYPE layer_util gauge
layer_util{layer_name="immediate"} 0.0
layer_util{layer_name="normal"} 19.340849995024997
layer_util{layer_name="batch"} 8.031105132797393
# HELP layer_util_frac Fraction of total CPU utilization consumed by the layer.
# TYPE layer_util_frac gauge
layer_util_frac{layer_name="batch"} 29.34063385422595
layer_util_frac{layer_name="immediate"} 0.0
layer_util_frac{layer_name="normal"} 70.65936614577405
# HELP layer_load Sum of weight * duty_cycle for tasks in the layer.
# TYPE layer_load gauge
layer_load{layer_name="immediate"} 0.0
layer_load{layer_name="normal"} 11.14363313258934
layer_load{layer_name="batch"} 70.40661455443191
# HELP layer_load_frac Fraction of total load consumed by the layer.
# TYPE layer_load_frac gauge
layer_load_frac{layer_name="normal"} 13.664744680306903
layer_load_frac{layer_name="immediate"} 0.0
layer_load_frac{layer_name="batch"} 86.33525531969309
# HELP layer_tasks Number of tasks in the layer.
# TYPE layer_tasks gauge
layer_tasks{layer_name="immediate"} 0
layer_tasks{layer_name="normal"} 490
layer_tasks{layer_name="batch"} 343
# HELP layer_total Number of scheduling events in the layer.
# TYPE layer_total gauge
layer_total{layer_name="normal"} 6711
layer_total{layer_name="batch"} 1778
layer_total{layer_name="immediate"} 0
# HELP layer_local % of scheduling events directly into an idle CPU.
# TYPE layer_local gauge
layer_local{layer_name="batch"} 69.79752530933632
layer_local{layer_name="immediate"} 0.0
layer_local{layer_name="normal"} 90.86574281031143
# HELP layer_open_idle % of scheduling events into idle CPUs occupied by other layers.
# TYPE layer_open_idle gauge
layer_open_idle{layer_name="immediate"} 0.0
layer_open_idle{layer_name="batch"} 0.0
layer_open_idle{layer_name="normal"} 0.0
# HELP layer_preempt % of scheduling events that preempted other tasks. #
# TYPE layer_preempt gauge
layer_preempt{layer_name="normal"} 0.0
layer_preempt{layer_name="batch"} 0.0
layer_preempt{layer_name="immediate"} 0.0
# HELP layer_affn_viol % of scheduling events that violated configured policies due to CPU affinity restrictions.
# TYPE layer_affn_viol gauge
layer_affn_viol{layer_name="normal"} 2.950379973178364
layer_affn_viol{layer_name="batch"} 0.0
layer_affn_viol{layer_name="immediate"} 0.0
# HELP layer_cur_nr_cpus Current # of CPUs assigned to the layer.
# TYPE layer_cur_nr_cpus gauge
layer_cur_nr_cpus{layer_name="normal"} 50
layer_cur_nr_cpus{layer_name="batch"} 2
layer_cur_nr_cpus{layer_name="immediate"} 50
# HELP layer_min_nr_cpus Minimum # of CPUs assigned to the layer.
# TYPE layer_min_nr_cpus gauge
layer_min_nr_cpus{layer_name="normal"} 0
layer_min_nr_cpus{layer_name="batch"} 0
layer_min_nr_cpus{layer_name="immediate"} 0
# HELP layer_max_nr_cpus Maximum # of CPUs assigned to the layer.
# TYPE layer_max_nr_cpus gauge
layer_max_nr_cpus{layer_name="immediate"} 50
layer_max_nr_cpus{layer_name="normal"} 50
layer_max_nr_cpus{layer_name="batch"} 2
# EOF
^C19:40:11 [INFO] EXIT: BPF scheduler unregistered
```
Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
After updates to reflect the updated init and direct dispatch API, the
schedulers aren't compatible with older kernels. Bump versions and publish
releases.
In the latest kernel, sched_ext API has changed in two areas:
- ops.prep_enable/cancel_enable/enable/disable() replaced with
ops.init_task/enable/disable/exit_task().
- scx_bpf_dispatch() can now be called from ops.select_cpu(). Also,
SCX_ENQ_LOCAL flag is removed. Instead, users can call
scx_bpf_select_cpu_dfl() from ops.select_cpu() and use the @is_idle out
param value to determine whether to dispatch directly.
This commit updates all schedules so that they build.
- Init functions renamed / merged / split.
- ops.select_cpu() is added to several schedulers and local direct
disptching logic is moved there.
This is the minimum update which is need to make the schedulers build and
work. It needs further update to e.g. move vtime udpates to ops.enable().
This because each scheduler has it's own Rust Crate
and it's better if they had a README associated with each one.
https://crates.io/crates/scx_layered
- combine c and kernel-examples as it's confusing to have both
- rename 'rust-user' and 'c-user' to just 'rust' and 'c', which is simpler
- update and fix sync-to-kernel.sh