Systemd provides an option for allocating DynamicUsers
which we want to use in NixOS to harden service configuration.
However, we discovered that the user wasn't allocated properly
for services. After some digging this turned out to be, of course,
a cache inconsistency problem.
When a DynamicUser creation is performed, Systemd check beforehand
whether the requested user already exists statically. If it does,
it bails out. If it doesn't, systemd continues with allocating the
user.
However, by checking whether the user exists, nscd will store
the fact that the user does not exist in it's negative cache.
When the service tries to lookup what user is associated to its
uid (By calling whoami, for example), it will try to consult
libnss_systemd.so However this will read from the cache and tell
report that the user doesn't exist, and thus will return that
there is no user associated with the uid. It will continue
to do so for the cache duration time. If the service
doesn't immediately looks up its username, this bug is not
triggered, as the cache will be invalidated around this time.
However, if the service is quick enough, it might end up
in a situation where it's incorrectly reported that the
user doesn't exist.
Preferably, we would not be using nscd at all. But we need to
use it because glibc reads nss modules from /etc/nsswitch.conf
by looking relative to the global LD_LIBRARY_PATH. Because LD_LIBRARY_PATH
is not set globally (as that would lead to impurities and ABI issues),
glibc will fail to find any nss modules.
Instead, as a hack, we start up nscd with LD_LIBRARY_PATH set
for only that service. Glibc will forward all nss syscalls to
nscd, which will then respect the LD_LIBRARY_PATH and only
read from locations specified in the NixOS config.
we can load nss modules in a pure fashion.
However, I think by accident, we just copied over the default
settings of nscd, which actually caches user and group lookups.
We already disable this when sssd is enabled, as this interferes
with the correct working of libnss_sss.so as it already
does its own caching of LDAP requests.
(See https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/deployment_guide/usingnscd-sssd)
Because nscd caching is now also interferring with libnss_systemd.so
and probably also with other nsss modules, lets just pre-emptively
disable caching for now for all options related to users and groups,
but keep it for caching hosts ans services lookups.
Note that we can not just put in /etc/nscd.conf:
enable-cache passwd no
As this will actually cause glibc to _not_ forward the call to nscd
at all, and thus never reach the nss modules. Instead we set
the negative and positive cache ttls to 0 seconds as a workaround.
This way, Glibc will always forward requests to nscd, but results
will never be cached.
Fixes#50273
Right now it's not at all obvious that one can override this option
using `services.logind.extraConfig`; we might as well add an option
for `killUserProcesses` directly so it's clear and documented.
They consistently fail since openjdk bump with some out-of-space errors.
That's not a problem by itself, but each test instance ties a build slot
for many hours and consequently they also delay channels as those wait
for all builds to finish.
Feel free to re-enable when fixed, of course.
The test now runs wayland, which means we can no longer use X11 style testing.
Instead we get gnome shell to execute javascript through its dbus interface.
Since 83b27f60ce, the tests were moved
into all-tests.nix and some of the tooling has changed so that
subattributes of test expressions are now recursively evaluated until a
derivation with a .test attribute has been found.
Unfortunately this isn't the case for all of the tests and the
runInMachine doesn't use the makeTest function other tests are using but
instead uses runInMachine, which doesn't generate a .test attribute.
Whener a .test attribute wasn't found by the new handleTest function, it
recurses down again until there is no value left that is an attribute
set and subsequently returns its unchanged value. This however has the
drawback that instead of getting different attributes for each
architecture we only get the last architecture in the supportedSystems
list.
In the case of the release.nix, the last architecture in
supportedSystems is "aarch64-linux", so the runInMachine test is always
built on that architecture.
In order to work around this, I changed runInMachine to emit a .test
attribute so that it looks to handleTest like it was a test created via
makeTest.
Signed-off-by: aszlig <aszlig@nix.build>
Docker images used to be, essentially, a linked list of layers. Each
layer would have a tarball and a json document pointing to its parent,
and the image pointed to the top layer:
imageA ----> layerA
|
v
layerB
|
v
layerC
The current image spec changed this format to where the Image defined
the order and set of layers:
imageA ---> layerA
|--> layerB
`--> layerC
For backwards compatibility, docker produces images which follow both
specs: layers point to parents, and images also point to the entire
list:
imageA ---> layerA
| |
| v
|--> layerB
| |
| v
`--> layerC
This is nice for tooling which supported the older version and never
updated to support the newer format.
Our `buildImage` code only supported the old version, so in order for
`buildImage` to properly generate an image based on another image
with `fromImage`, the parent image's layers must fully support the old
mechanism.
This is not a problem in general, but is a problem with
`buildLayeredImage`.
`buildLayeredImage` creates images with newer image spec, because
individual store paths don't have a guaranteed parent layer. Including
a specific parent ID in the layer's json makes the output less likely
to cache hit when published or pulled.
This means until now, `buildLayeredImage` could not be the input to
`buildImage`.
The changes in this PR change `buildImage` to only use the layer's
manifest when locating parent IDs. This does break buildImage on
extremely old Docker images, though I do wonder how many of these
exist.
This work has been sponsored by Target.
Instead of setting User/Group only when DynamicUser is disabled, the
previous version of the code set it only when it was enabled. This
caused services with DynamicUser enabled to actually run as nobody, and
services without DynamicUser enabled to run as root.
Regression from fbb7e0c82f.
GitLab 11.5.1 dropped the dependency to posix_spawn, which is broken on
32bit. (See https://gitlab.com/gitlab-org/gitlab-ce/issues/53525)
The only part missing is decreasing virtualisation.memorySize to
something that a 32 bit qemu still executes.
The maximum seems to be 2047, and tests passed with that value for me.
This cleans up the CockroachDB expression, with a few suggestions from
@aszlig.
However, it brought up the note of using systemd's StateDirectory=
directive, which is a nice feature for managing long-term data files,
especially for UID/GID assigned services. However, it can only manage
directories under /var/lib (for global services), so it has to introduce
a special path to make use of it at all in the case someone wants a path
at a different root.
While the dataDir directive at the NixOS level is _occasionally_ useful,
I've gone ahead and removed it for now, as this expression is so new,
and it makes the expression cleaner, while other kinks can be worked out
and people can test drive it.
CockroachDB's dataDir directive, instead, has been replaced with
systemd's StateDirectory management to place the data under
/var/lib/cockroachdb for all uses.
There's an included RequiresMountsFor= clause like usual though, so if
people want dependencies for any kind of mounted device at boot
time/before database startup, it's easy to specify using their own
mount/filesystems clause.
This can also be reverted if necessary, but, we can see if anyone ever
actually wants that later on before doing it -- it's a backwards
compatible change, anyway.
Signed-off-by: Austin Seipp <aseipp@pobox.com>
Currently there are two calls to curl in the reloadScript, neither which
check for errors. If something is misconfigured (like wrong authToken),
the only trace that something wrong happened is this log message:
Asking Jenkins to reload config
<h1>Bad Message 400</h1><pre>reason: Illegal character VCHAR='<'</pre>
The service isn't marked as failed, so it's easy to miss.
Fix it by passing --fail to curl.
While at it:
* Add $curl_opts and $jenkins_url variables to keep the curl command
lines DRY.
* Add --show-error to curl to show short error message explanation when
things go wrong (like HTTP 401 error).
* Lower-case the $CRUMB variable as upper case is for exported environment
variables.
The new behaviour, when having wrong accessToken:
Asking Jenkins to reload config
curl: (22) The requested URL returned error: 401
And the service is clearly marked as failed in `systemctl --failed`.
When privateNetwork is enabled, currently the container's interface name
is derived from the container name. However, there's a hard limit
on the size of interface names. To avoid conflicts and other issues,
we set a limit on the container name when privateNetwork is enabled.
Fixes#38509
This also includes a full end-to-end CockroachDB clustering test to
ensure everything basically works. However, this test is not currently
enabled by default, though it can be run manually. See the included
comments in the test for more information.
Closes#51306. Closes#38665.
Co-authored-by: Austin Seipp <aseipp@pobox.com>
Signed-off-by: Austin Seipp <aseipp@pobox.com>
The new reuse behaviour is cool and really useful but it breaks one of
my use cases. When using kexec, I have a script which will unlock the
disks in my initrd. However, do_open_passphrase will fail if the disk is
already unlocked.
According to the dbus-launch documentation [0] "--exit-with-session"
shouldn't be used: "This option is not recommended, since it will
consume input from the terminal where it was started; it is mainly
provided for backwards compatibility." And it also states: "To start a
D-Bus session within a text-mode session, do not use dbus-launch.
Instead, see dbus-run-session(1)."
The new wrapper also avoids starting an additional D-Bus session if
DBUS_SESSION_BUS_ADDRESS is already set.
Fix#51303.
[0]: https://dbus.freedesktop.org/doc/dbus-launch.1.html
[1]: https://dbus.freedesktop.org/doc/dbus-run-session.1.html
As the comment notes, restarts/exits of dhcpcd generally require
restarting the NTP service since, if name resolution fails for a pool of
servers, the service might break itself. To be on the safe side, try
restarting Chrony in these instances, too.
Signed-off-by: Austin Seipp <aseipp@pobox.com>
Setting the server list to be empty is useful e.g. for hardware-only
or virtualized reference clocks that are passed through to the system
directly. In this case, initstepslew has no effect, so don't emit it.
Signed-off-by: Austin Seipp <aseipp@pobox.com>
This is to try and squeeze more lost space from the image, so that hydra
starts building it again.
The fsck previous to the resize2fs is required so resize2fs works.
The one afterwards is a sanity check.
Using `-M` from resize2fs will not give much saved space due to a known
(in the manual) issue.
```
[samueldr@aarch64:~/nixpkgs]$ ls -lh result-*/*/*.img
-r--r--r-- 1 root root 2.2G Jan 1 1970 result-original/sd-image/nixos-sd-image-18.09.git.a7fd431-aarch64-linux.img
-r--r--r-- 1 root root 2.1G Jan 1 1970 result-M/sd-image/nixos-sd-image-18.09.git.a7fd431-aarch64-linux.img
-r--r--r-- 1 root root 1.9G Jan 1 1970 result-slimmed/sd-image/nixos-sd-image-18.09.git.a7fd431-aarch64-linux.img
```
```
[samueldr@aarch64:~/nixpkgs]$ nix path-info -S ./result-original
/nix/store/c8k9n78gylx293rjh762fr05a069kxp2-nixos-sd-image-18.09.git.a7fd431-aarch64-linux.img 3844125000
[samueldr@aarch64:~/nixpkgs]$ nix path-info -S ./result-slimmed
/nix/store/962238skj5mnzhrsmjy23dyzmxk77sp4-nixos-sd-image-18.09.git.a7fd431-aarch64-linux.img 3447473208
```