scx_mitosis relied on the implicit assumption that after a sched tick,
all outstanding scheduling events had completed, but this might not
actually be correct. This feels like a natural use case for RCU, but
there is no way to directly make use of RCU in BPF. Instead, this commit
implements an RCU-like synchronization mechanism.
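As a rough illustration (not the actual scx_mitosis code; names and the
quiescent-point choice are hypothetical), such a mechanism can be
sketched as an epoch counter that each CPU acknowledges at a known
quiescent point, with a writer waiting until every CPU has caught up:

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Writers bump global_epoch; each CPU copies it into its own slot at
 * a quiescent point (e.g. a sched tick). A grace period for epoch e
 * is over once every CPU has published an epoch >= e. */
static atomic_ulong global_epoch = 1;
static atomic_ulong cpu_epoch[NR_CPUS];

/* Called on each CPU at a known quiescent point. */
static void note_quiescent(int cpu)
{
	atomic_store(&cpu_epoch[cpu], atomic_load(&global_epoch));
}

/* Start a new grace period; returns the epoch to wait for. */
static unsigned long start_grace_period(void)
{
	return atomic_fetch_add(&global_epoch, 1) + 1;
}

/* True once every CPU has observed epoch e. */
static bool grace_period_done(unsigned long e)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (atomic_load(&cpu_epoch[cpu]) < e)
			return false;
	return true;
}
```

Once grace_period_done() returns true, no CPU can still be acting on
state observed before the epoch bump.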
Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
Schedule all tasks using a single global DSQ. This gives better
control to prevent potential starvation conditions.
With this change, scx_bpfland adopts a logic similar to scx_rusty and
scx_lavd, prioritizing tasks based on the frequency of their wait and
wake-up events, rather than relying exclusively on the average number
of voluntary context switches.
Tasks are still classified as interactive / non-interactive based on
the number of voluntary context switches, but this now only affects
the cpufreq logic.
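As a hypothetical sketch (struct, field names, and the averaging scheme
are illustrative, not the actual scx_bpfland code), per-task wait or
wake-up frequency can be tracked as events per second from the interval
between consecutive events:

```c
#include <assert.h>

#define NSEC_PER_SEC 1000000000UL

/* Keep a running average of a task's wait (or wake-up) frequency,
 * updated from the interval since the previous event. */
struct freq_stat {
	unsigned long last_ns;	/* timestamp of the previous event */
	unsigned long freq;	/* averaged events per second */
};

static void update_freq(struct freq_stat *fs, unsigned long now_ns)
{
	unsigned long delta = now_ns - fs->last_ns;
	/* Instantaneous frequency of this event pair. */
	unsigned long inst = delta ? NSEC_PER_SEC / delta : 0;

	/* Simple exponential moving average: (old + new) / 2. */
	fs->freq = (fs->freq + inst) / 2;
	fs->last_ns = now_ns;
}
```

A task that blocks and wakes often accumulates a high frequency and can
be prioritized accordingly, independent of its voluntary context switch
count.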
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Since tasks' average runtimes show a skewed distribution, directly
using the runtime in the deadline calculation causes several
performance regressions. Instead, let's use a constant factor
and further prioritize the frequency factors to deprioritize
long-runtime tasks.
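A hypothetical sketch of the idea (the constant, names, and scaling are
illustrative only, not the actual scx_lavd deadline formula): replace
the skewed per-task runtime with a constant, and let the wait/wake-up
frequency factors drive the deadline.

```c
#include <assert.h>

/* Illustrative constant replacing the per-task average runtime. */
#define SLICE_CONST_NS 3000000UL

/* Tasks with higher wait/wake-up frequency get an earlier deadline;
 * long-runtime (low-frequency) tasks are naturally deprioritized. */
static unsigned long calc_deadline(unsigned long vtime,
				   unsigned long wait_freq,
				   unsigned long wake_freq)
{
	unsigned long freq = wait_freq + wake_freq;

	/* Guard against division by zero for CPU-bound tasks. */
	return vtime + SLICE_CONST_NS / (freq ? freq : 1);
}
```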
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Revert the change that sent a self-IPI at preemption when the victim
CPU is the current CPU. The cost of the self-IPI is prohibitively
expensive in some workloads (e.g., perf bench). Instead, reset the
task's time slice to zero.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Rather than always migrating tasks across LLC domains when no idle CPU
is available in their current LLC domain, allow migration but attempt
to bring tasks back to their original LLC domain whenever possible.
To do so, define the task's scheduling domain upon task creation or when
its affinity changes, and ensure the task remains within this domain
throughout its lifetime.
In the future we will add proper load-balancing logic, but for now
this change seems to provide a consistent performance improvement in
certain server workloads.
For example, simple CUDA benchmarks show a performance boost of about
+10-20% with this change applied (on multi-LLC / NUMA machines).
Signed-off-by: Andrea Righi <arighi@nvidia.com>
This helps prevent excessive starvation of regular tasks in the
presence of a large number of interactive tasks (e.g., when running
stress tests, such as hackbench).
Signed-off-by: Andrea Righi <arighi@nvidia.com>
This can lead to stalls when a high number of interactive tasks are
running in the system (e.g., hackbench or similar stress tests).
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Add SCX_OPS_ENQ_EXITING to the scheduler flags, since we are not using
bpf_task_from_pid() and the scheduler can handle exiting tasks.
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Ensure that task vruntime is always updated in ops.running() to maintain
consistency with other schedulers.
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Fix a task filtering logic error to avoid the possibility of migrating
the same task over again. The original logic used "||", which might
cause tasks that were already migrated to be considered again.
Change the condition to "&&" to eliminate the error.
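A minimal sketch of the corrected condition (struct and field names are
hypothetical; the real code filters on its own task state):

```c
#include <assert.h>
#include <stdbool.h>

struct task_ctx {
	bool migratable;	/* task may run elsewhere */
	bool migrated;		/* already moved this round */
};

/* A task is a migration candidate only if it is migratable AND has
 * not already been migrated. The previous "migratable || !migrated"
 * let already-migrated tasks through whenever they were migratable. */
static bool task_filter(const struct task_ctx *t)
{
	return t->migratable && !t->migrated;
}
```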
Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
The function "try_find_move_task()" returns directly when no task is
found to be moved. If the cause is that no task could satisfy the
condition checked by "task_filter()", the load balancer will try to
find a movable task again, replacing "task_filter()" with a function
that always returns true.
However, in this fallback case, the task lists within the domains will
be empty. Swapping the tasks back into the domains vector before
returning solves the issue.
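A rough sketch of the fix (data structures and the even-id filter are
hypothetical; the actual code is Rust operating on the domains vector):
tasks popped during the search must be swapped back before returning,
so the fallback retry still sees them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TASKS 8

struct dom {
	int tasks[MAX_TASKS];
	size_t nr;
};

/* Illustrative filter: only even task ids are movable. */
static bool filter_even(int t)
{
	return t % 2 == 0;
}

/* Pop tasks while searching for a match; before returning, swap
 * every unconsumed task back into the domain, whether or not a
 * candidate was found. */
static bool try_find_move_task(struct dom *d, bool (*filter)(int),
			       int *out)
{
	int scratch[MAX_TASKS];
	size_t nr = 0;
	bool found = false;

	while (d->nr > 0) {
		int t = d->tasks[--d->nr];

		if (!found && filter(t)) {
			*out = t;
			found = true;
			continue;
		}
		scratch[nr++] = t;
	}

	/* Swap the remaining tasks back into the domain. */
	for (size_t i = 0; i < nr; i++)
		d->tasks[d->nr++] = scratch[i];

	return found;
}
```

Without the swap-back, a failed filtered pass would leave the domain
empty and the unfiltered retry would find nothing to move.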
Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
The combination of kernel versions and kernel configs generates
different kernel symbols. For example, in an old kernel version,
__mutex_lock() is not generated. Also, there is currently no
workaround on the fentry/fexit/kprobe side. Let's entirely drop
the kernel locking for now and revisit it later.
Signed-off-by: Changwoo Min <changwoo@igalia.com>