LAVD_VDL_LOOSENESS_FT represents how loose the deadline is. The smaller
value means the deadline is tighter. While it is unlikely to be tuned,
let's keep it as a tunable for now.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
That is okay since the runtime is considered in calculating a virtual
deadline. A shorter runtime will result in a tighter deadline linearly.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
If inheriting the parent's properties, a new fork task tends to be too
prioritized. That is, many parent processes, such as `make,` are a bit
more latency-critical than average.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Advancing the clock slower when overloaded gives more opportunities for
latency-critical tasks to cut in the run queue. Controlling the clock
better reflects the actual load than the prior approach of stretching
the time-space when overloaded.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
We now maintain two run queues—an eligible run queue (DSQ) and an
ineligible run queue (rbtree)—sorted by the task's virtual deadline.
When the eligible run queue is empty, or the ineligible run queue has
not been consumed for too long (e.g., 15 msec), a task in the ineligible
run queue is moved to the eligible run queue for execution. With these
two queues, we have a better admission control.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Estimating the service time from run time and frequency is not
incorrect. However, it reacts slowly to sudden changes since it relies
on the moving average. Hence, we directly measure the service time to
enforce fairness.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
We always use nr_cpu_ids to represent the maximum CPU id returned by
scx_bpf_nr_cpu_ids().
Replace cpu_max with nr_cpu_ids to be more consistent with the rest of
the code.
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
The kernel from kernel.org lacks the necessary .config chunk required
by the build via virtme-ng.
To fix this, include the kernel .config chunk directly in the scx
repository and use it from here.
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
We can rely on scx_bpf_nr_cpu_ids() to create all the possible per-CPU
DSQs, eliminating the need for the hard-coded limit MAX_CPUS.
In this way scx_bpfland can support the same amount of CPUs that the
kernel can handle.
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Instead of constantly checking the need to drain tasks from the DSQs of
the offline CPUs, provide an atomic flag to notify when there are tasks
to be drained from the offline CPUs.
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Refine the safeguard mechanism to avoid generating too many interactive
tasks in the system, which could nullify the effect of the
interactive/regular task classification.
The safeguard mechanism operates by pausing the promotion of new tasks
to interactive status during the task wake-up process, whenever the
number of interactive tasks in the priority queue exceeds a specific
limit (set to 4x the number of online CPUs).
Halting the promotion of additional interactive tasks allows to
prioritize those already classified as interactive, thereby preventing
potential "bursts" of excessive interactive tasks in the system.
This refines the mitigation already provided by commit 640bd562
("scx_bpfland: prevent tasks from abusing interactive priority boost").
Fixes: 640bd562 ("scx_bpfland: prevent tasks from abusing interactive priority boost")
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Always assign the maximum time slice if there are idle CPUs in the
system.
Otherwise, double the task's unused time slice to reward tasks that use
less CPU time and at the same time refill the time slice of the tasks
every time they're dispatched.
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
sched_ext is about to be merged upstream. There are some compatibility
breaking changes and we're making the current sched_ext/for-6.11
1edab907b57d ("sched_ext/scx_qmap: Pick idle CPU for direct dispatch on
!wakeup enqueues") the baseline.
Tag everything except scx_mitosis as 1.0.0. As scx_mitosis is still in early
development and is currently temporarily disabled, only the patchlevel is
bumped.
Sync to vmlinux.h from sched_ext/for-6.11 1edab907b57d ("sched_ext/scx_qmap:
Pick idle CPU for direct dispatch on !wakeup enqueues"). This most likely
will be the commit which will be merged during the upcoming kernel v6.11
merge window.
Unfortunately, this is a compatibility breaking change. As the size of
bpf_iter_scx_dsq is reduced, schedulers that use the iterator - scx_lavd and
scx_layered - won't be able to run on older kernels. Likewise, older
binaries from before this commit won't be able to run on newer kernels.
Sync from sched_ext/for-6.11 1edab907b57d ("sched_ext/scx_qmap: Pick idle
CPU for direct dispatch on !wakeup enqueues")
git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git for-6.11
- cgroup support hasn't landed in the upstream kernel yet. This most likely
will happen in a few weeks. For the time being, disable scx_flatcg,
scx_pair and scx_mitosis.
- Compat macro for DSQ task iterator dropped. This is now a part of
the baseline.
- scx_bpf_consume() isn't upstream yet. BPF interfacing side is still being
discussed. Dropped example usage from tools/sched_ext. None of the
practical schedulers use it, so this should be fine for now.
- scx_bpf_cpu_rq() added.
- AUTOATTACH workaround for newer libbpf versions added.
Sync to the latest. Right now, it's in an awkward place where receiving
AUTOATTACH compat updatees from kernel breaks build as the libbpf version is
marked as 1.5 but bpf_map__set_autoattach() is not available yet. Sync to
the latest.
A task can become a runnable on any task's context not only its waker
task. Thus, we should not count wake-up on unrelated task's context.
With this commit, the scheduler can (much more) accurately detect
waker-wakee relationsships.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
The prior approach using the sum of weights gives too much penalty to
nice tasks with large nice values. With this commit, the time slice is
determined by the number of runnable tasks regardless of nice priority.
Note that the fairness will still be enforced based on tasks' nice
priorities (weights).
Signed-off-by: Changwoo Min <changwoo@igalia.com>
To easily distinguish, let's initialize the current logical clock to
zero (not the current physical time). Also, avoid the deadline
calculation being zero by adding +1 here and there.
Signed-off-by: Changwoo Min <changwoo@igalia.com>