nixpkgs/nixos/modules/system
Maximilian Bosch da905d4cf9
nixos/stage-1: fix modprobe in initial ramdisk on systems w/glibc-2.34
This effectively fixes the majority of all VM tests which were broken
because `/dev/vda` (or any other block device) wasn't mountable:

      machine # mounting /dev/vda on /...
      machine # mount: mounting /dev/vda on /mnt-root/ failed: No such device[    2.820976] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
      machine # [    2.821757] CPU: 0 PID: 1 Comm: init Not tainted 5.10.72 #1-NixOS
      machine # [    2.821757] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
      machine # [    2.821757] Call Trace:
      machine # [    2.821757]  dump_stack+0x6b/0x83
      machine # [    2.821757]  panic+0x101/0x2c8
      machine # [    2.821757]  do_exit.cold+0x14/0xb3
      machine # [    2.821757]  do_group_exit+0x33/0xa0
      machine # [    2.821757]  __x64_sys_exit_group+0x14/0x20
      machine # [    2.821757]  do_syscall_64+0x33/0x40
      machine # [    2.821757]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      machine # [    2.821757] RIP: 0033:0x7f67ec2800f6
      machine # [    2.821757] Code: 00 4c 8b 0d 2c 5d 11 00 eb 19 66 2e 0f 1f 84 00 00 00 00 00 89 d7 89 f0 0f 05 48 3d 00 f0 ff ff 77 22 f4 89 d7 44 89 c0 0f 05 <48> 3d 00 f0 ff ff 76 e2 f7 d8 64 41 89 01 eb da 66 2e 0f 1f 84 00
      machine # [    2.821757] RSP: 002b:00007fff8f5a71d8 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7
      machine # [    2.821757] RAX: ffffffffffffffda RBX: 0000000000699704 RCX: 00007f67ec2800f6
      machine # [    2.821757] RDX: 0000000000000001 RSI: 000000000000003c RDI: 0000000000000001
      machine # [    2.821757] RBP: 0000000000000004 R08: 00000000000000e7 R09: ffffffffffffff80
      machine # [    2.821757] R10: 00007f67ec33f3e0 R11: 0000000000000202 R12: 000000000000000b
      machine # [    2.821757] R13: 00007fff8f5a75a8 R14: 0000000000000000 R15: 00000000004fc198
      machine # [    2.821757] Kernel Offset: 0x31e00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
      machine # [    2.821757] Rebooting in 1 seconds..

This happened because the kernel failed to load modules such as `ext4`
from `boot.initrd.availableKernelModules`[1] on e.g. a `mount(2)` syscall.

The problem is that `kmod` isn't linked against `libpthread.so.0`
anymore because it got merged into `libc.so.6` (however, the .so still
exists), but still needs it:

      machine # newfstatat(AT_FDCWD, "/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-glibc-2.34-36/lib/x86_64", 0x7ffd951114c0, 0) = -1 ENOENT (No such file or directory)
      machine # openat(AT_FDCWD, "/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-glibc-2.34-36/lib/x86_64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
      machine # newfstatat(AT_FDCWD, "/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-glibc-2.34-36/lib/x86_64", 0x7ffd951114c0, 0) = -1 ENOENT (No such file or directory)
      machine # openat(AT_FDCWD, "/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-glibc-2.34-36/lib/libpthread.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
      machine # newfstatat(AT_FDCWD, "/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-glibc-2.34-36/lib", 0x7ffd951114c0, 0) = -1 ENOENT (No such file or directory)
      machine # openat(AT_FDCWD, "/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-glibc-2.34-36/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
      machine # writev(2, [{iov_base="/nix/store/kdc9n48ksdc1a8y8w512w"..., iov_len=69}, {iov_base=": ", iov_len=2}, {iov_base="error while loading shared libra"..., iov_len=36}, {iov_base=": ", iov_len=2}, {iov_base="libpthread.so.0", iov_len=15}, {iov_base=": ", iov_len=2}, {iov_base="cy
      machine # ) = 184
      machine # exit_group(127)                         = ?
      machine # +++ exited with 127 +++
      machine # mount: mounting /dev/vda on /mnt-root/ failed: No such device
      machine # [   19.167180] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
      machine # [   19.167711] CPU: 0 PID: 1 Comm: init Not tainted 5.10.72 #1-NixOS

This is not a problem

* inside stage-1 because `LD_LIBRARY_PATH` points to `$out/lib` of
  extra-utils where `libpthread.so.6` also exists.
* on a running system because `${pkgs.glibc}/lib` is part of kmod's
  rpath.

However this is a problem inside the kernel which calls `modprobe` (in
our case `kmod`) to load modules and doesn't know about
`LD_LIBRARY_PATH`. Also, the rpath-reference was nuked.

To work around this, the kernel's `modprobe`
(i.e. `/proc/sys/kernel/modprobe`) now points to a wrapper which
explicitly declares `LD_LIBRARY_PATH`. We can't use `makeWrapper` here
because `modprobe` itself must not be renamed. Otherwise, `kmod` (which
is the link-target of `modprobe`) won't work because it expects
`argv[0] == "modprobe"` to perform modprobe's tasks.

[1] https://nixos.org/manual/nixos/stable/options.html#opt-boot.initrd.availableKernelModules
2022-02-27 10:26:51 +01:00
..
activation nixos/switch-to-configuration: Fix backslashes in unit names 2022-02-17 12:49:45 +01:00
boot nixos/stage-1: fix modprobe in initial ramdisk on systems w/glibc-2.34 2022-02-27 10:26:51 +01:00
etc nixos/etc.nix: Make independent 2022-01-30 09:01:27 +01:00
build.nix nixos: Make system.build a submodule with freeformType 2022-01-24 00:52:46 +01:00