drgn/.travis.yml
Omar Sandoval be85631471 travis: fix spurious VM crashes
Every few builds or so, a vmtest VM crashes after printing "x86: Booting
SMP configuration:". After some difficult debugging, I determined that
the crash happens in arch/x86/realmode/rm/trampoline_64.S (the code that
initializes secondary CPUs) at the ljmp from startup_32 to startup_64.
The real problem happens earlier in startup_32:

	movl	$pa_trampoline_pgd, %eax
	movl	%eax, %cr3

Sometimes, the store to CR3 "fails" and CR3 remains zero, which causes
the later ljmp to triple fault.

This can be reproduced by the following script:

	#!/bin/sh

	curl -L 'https://www.dropbox.com/sh/2mcf2xvg319qdaw/AABFKsISWRpndNZ1gz60O-qSa/x86_64/vmlinuz-5.8.0-rc7-vmtest1?dl=1' -o vmlinuz

	cat > commands.gdb << "EOF"
	set confirm off
	target remote :1234

	# arch/x86/realmode/rm/trampoline_64.S:startup_32 after CR3 store.
	hbreak *0x9ae09 if $cr3 == 0
	command
	info registers eax cr3
	quit 1
	end

	# kernel/smp.c:smp_init() after all CPUs have been brought up. If we get here,
	# the bug wasn't triggered.
	hbreak *0xffffffff81ed4484
	command
	kill
	quit 0
	end

	continue
	EOF

	while true; do
		qemu-system-x86_64 -cpu host -enable-kvm -smp 64 -m 128M \
			-nodefaults -display none -serial file:/dev/stdout -no-reboot \
			-kernel vmlinuz -append 'console=0,115200 panic=-1 nokaslr' \
			-s -S &

		gdb -batch -x commands.gdb || exit 1
	done

This seems to be a problem with nested virtualization that was fixed by
Linux kernel commit b4d185175bc1 ("KVM: VMX: give unrestricted guest
full control of CR3") (in v4.17). Apparently, the Google Cloud hosts
that Travis runs on are missing this fix. We obviously can't patch those
hosts, but we can work around it. Disabling unrestricted guest support
in the Travis VM causes CR3 stores in the nested vmtest VM to be
emulated, bypassing the bug.
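
One way to confirm that the workaround took effect in the Travis VM is to read the module parameter back from sysfs. The following is only a sketch: it assumes kvm_intel exposes the parameter at /sys/module/kvm_intel/parameters/unrestricted_guest, and the function name and optional path argument are made up for illustration.

```shell
#!/bin/sh

# Hypothetical helper: report the state of kvm_intel's unrestricted guest
# support. The optional argument overrides the sysfs path (assumed default
# below), which makes the function easy to try against a scratch file.
unrestricted_guest_state() {
	param="${1:-/sys/module/kvm_intel/parameters/unrestricted_guest}"
	if [ ! -r "$param" ]; then
		# kvm_intel is not loaded, or this is not an Intel host.
		echo unknown
		return 0
	fi
	case "$(cat "$param")" in
		Y|1) echo enabled ;;
		N|0) echo disabled ;;
		*) echo unknown ;;
	esac
}

unrestricted_guest_state
```

If this prints "disabled" after kvm_intel has been reloaded, CR3 stores in the nested VM should be taking the emulated path described above.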

Signed-off-by: Omar Sandoval <osandov@osandov.com>
2020-08-04 16:36:09 -07:00


dist: bionic
language: python
python:
- '3.8'
- '3.7'
- '3.6'
install:
# If the host is running a kernel without Linux kernel commit b4d185175bc1
# ("KVM: VMX: give unrestricted guest full control of CR3") (in v4.17), then
# stores to CR3 in the nested guest can spuriously fail and cause it to
# crash. We can work around this by disabling unrestricted guest support.
- |
  if grep -q '^flags\b.*\bvmx\b' /proc/cpuinfo; then
    echo "options kvm_intel unrestricted_guest=N" | sudo tee /etc/modprobe.d/kvm-cr3-workaround.conf > /dev/null
    sudo modprobe -r kvm_intel
    sudo modprobe kvm_intel
  fi
# Upstream defaults to world-read-writeable /dev/kvm. Debian/Ubuntu override
# this; see https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=892945. We want
# the upstream default.
- echo 'KERNEL=="kvm", GROUP="kvm", MODE="0666", OPTIONS+="static_node=kvm"' | sudo tee /lib/udev/rules.d/99-fix-kvm.rules > /dev/null
- sudo udevadm control --reload-rules
# On systemd >= 238 we can use udevadm trigger -w and remove udevadm settle.
- sudo udevadm trigger /dev/kvm
- sudo udevadm settle
script: python setup.py test -K
addons:
  apt:
    packages:
    - busybox-static
    - libbz2-dev
    - liblzma-dev
    - qemu-kvm
    - zlib1g-dev
    - zstd
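
The udev rule in the install step is meant to leave /dev/kvm world read-writable. A quick way to check the result is sketched below. It assumes GNU stat (as shipped on Ubuntu bionic), and kvm_node_mode is a hypothetical helper name, not part of the repository.

```shell
#!/bin/sh

# Hypothetical helper: print the octal permission bits of the KVM device
# node, or "missing" if it cannot be read. The optional argument overrides
# the default path of /dev/kvm.
kvm_node_mode() {
	stat -c '%a' "${1:-/dev/kvm}" 2>/dev/null || echo missing
}

kvm_node_mode
```

With the rule applied, the expected mode is 666, matching MODE="0666" in the rule.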