Done by setting `autopilot.min_quorum = 3`.
Techncially, this would have been required to keep the test correct since
Consul's "autopilot" "Dead Server Cleanup" was enabled by default (I believe
that was in Consul 0.8). Practically, the issue only occurred with our NixOS
test with releases >= `1.7.0-beta2` (see #90613). The setting itself is
available since Consul 1.6.2.
However, this setting was not documented clearly enough for anybody to notice,
and only the upstream issue https://github.com/hashicorp/consul/issues/8118
I filed brought that to light.
As explained there, the test could also have been made pass by applying the
more correct rolling reboot procedure
-m.wait_until_succeeds("[ $(consul members | grep -o alive | wc -l) == 5 ]")
+m.wait_until_succeeds(
+ "[ $(consul operator raft list-peers | grep true | wc -l) == 3 ]"
+)
but we also intend to test that Consul can regain consensus even if
the quorum gets temporarily broken.