The build error has been introduced by 56dcc319cf.
Using a <simplesect/> within a <para/> is not allowed and subsequently
fails to validate while building the manual.
So instead, I moved the <simplesect/> further down and outside of the
<para/> to fix this.
Signed-off-by: aszlig <aszlig@nix.build>
Cc: @aaronjanse, @Lassulus, @danbst
The default config of i3 provides a key binding to reload, so changes
take effect immediately:
```
bindsym $mod+Shift+c reload
```
Unfortunately the current module uses the store path of the `configFile`
option. So when I change the config in NixOS, a store path will be
created, but the current i3 process will continue to use the old one,
hence a restart of i3 is required currently.
This change links the config to `/etc/i3/config` and alters the X
startup script accordingly so after each rebuild, the config can be
reloaded.
This allows configuring IP addresses on a tinc interface using
networking.interfaces."tinc.${n}".ipv[46].addresses.
Previously, this would fail with timeouts, because of the dependency
chain
tinc.${netname}.service
--after--> network.target
--after--> network-addresses-tinc.${n}.service (and network-link-…)
--after--> sys-subsystem-net-devices-tinc.${n}.device
But the network interface doesn't exist until tinc creates it! So
systemd waits in vain for the interface to appear, and by then the
network-addresses-* and network-link-* units have failed. This leads
to the network link not being brought up and the network addresses not
being assigned, which in turn stops tinc from actually working.
cross-compilation of `btrfs-tools` is broken, and this usually needless dependency of each system closure on `btrfs-tools` prevents cross-compilation of whole system closures
Ideally, private keys never leave the host they're generated on - like
SSH. Setting generatePrivateKeyFile to true causes the PK to be
generate automatically.
Some ACME clients do not generate full.pem, which is the same as
fullchain.pem + the certificate key (key.pem), which is not necessary
for verifying OCSP staples.
I have a nixops network where I deploy containers using the `container`
backend which uses `nixos-container` intenrally to deploy several
containers to a certain host.
During that time I removed and added new containers and while trying to
deploy those to a different host I realized that it isn't guaranteed
that each container gets the same IP address which is a problem as some
parts of the deployment need to know which container is using which IP
(i.e. to configure port forwarding on the host).
With this change you can specify the container's IP like this (and don't
have to use the arbitrarily used 10.233.0.0/16 subnet):
```
$ nixos-container create test --config-file test-container.nix \
--local-address 10.235.1.2 --host-address 10.235.1.1
```
This is an implementation of wireguard support using wg-quick config
generation.
This seems preferrable to the existing wireguard support because
it handles many more routing and resolvconf edge cases than the
current wireguard support.
It also includes work-arounds to make key files work.
This has one quirk:
We need to set reverse path checking in the firewall to false because
it interferes with the way wg-quick sets up its routing.
This is to make sure that we get different ETag values whenever we
switch to a different store path but with the same file contents.
I've checked this against the old behaviour without the patch and it
fails as expected.
Signed-off-by: aszlig <aszlig@nix.build>
Copy-paste from iso-image.nix
Besides the simplification, it should use `pkgs.buildPackages.squashfsTools` because it is used in `nativeBuildInputs` instead of incorrect `pkgs.squashfsTools` which was forced by `import'
This makes sure that when a user hasn't set a Prometheus option it
won't show up in the prometheus.yml configuration file. This results
in smaller and easier to understand configuration files.
We previously filtered out the `_module` attribute in a NixOS
configuration by filtering it using the option's `apply` function.
This meant that every option that had a submodule type needed to have
this apply function. Adding this function is easy to forget thus this
mechanism is error prone.
We now recursively filter out the `_module` attributes at the place we
construct the Prometheus configuration file. Since we now do the filtering
centrally we don't have to do it per option making it less prone to errors.
This results in a smaller prometheus.yml config file.
It also allows us to use the same options for both prometheus-1 and
prometheus-2 since the new options for prometheus-2 default to null
and will be filtered out if they are not set.
From gkd-capability.c:
This program needs the CAP_IPC_LOCK posix capability.
We want to allow either setuid root or file system based capabilies
to work. If file system based capabilities, this is a no-op unless
the root user is running the program. In that case we just drop
capabilities down to IPC_LOCK. If we are setuid root, then change to the
invoking user retaining just the IPC_LOCK capability. The application
is aborted if for any reason we are unable to drop privileges.
Get these from upstream tox-node package instead.
This is likely to cause less maintenance overhead over time and
following upstream bootstrap node changes is automated.
This adds the following new packages:
+ elasticsearch7
+ elasticsearch7-oss
+ logstash7
+ logstash7-oss
+ kibana7
+ kibana7-oss
+ filebeat7
+ heartbeat7
+ metricbeat7
+ packetbeat7
+ journalbeat7
The default major version of the ELK stack stays at 6. We should
probably set it to 7 in a next commit.
Same problem as described in acbadcdbba.
When using multiple interfaces for wifi with `networking.wlanInterfaces`
and the interface for `hostapd` contains a dash, this will fail as
systemd escapes dashes in its device names.
The manpage claims that the "limit" in the setting::
<name>:[<limit>:]<regex>
is optional and defaults to zero, implying no limit.
However, tests confirmed that it actually isn't optional.
Without limit, the setting ``any:.*`` places
outbound jobs on infinite hold if no particular
modem was specified on the sendfax command line.
The new default value ``any:0:.*`` from
this commit uses any available modem to
send jobs if not modem was given to sendfax.
This results in a simpler service unit which doesn't first have to
start a shell:
> cat /nix/store/s95nsr8zbkblklanqpkiap49mkwbaq45-unit-alertmanager.service/alertmanager.service
...
ExecStart=/nix/store/4g784lwcy7kp69hg0z2hfwkhjp2914lr-alertmanager-0.16.2-bin/bin/alertmanager \
--config.file /nix/store/p2c7fyi2jkkwq04z2flk84q4wyj2ggry-checked-config \
--web.listen-address [::1]:9093 \
--log.level warn
...
Commit 29d7d8f44d has introduced another
section with the ID "sec-release-19.09-incompatibilities", which
subsequently causes the build to fail.
I just merged both sections and the manual is now building again.
Signed-off-by: aszlig <aszlig@nix.build>
Documize is an open-source alternative for wiki software like Confluence
based on Go and EmberJS. This patch adds the sources for the community
edition[1], for commercial their paid-plan[2] needs to be used.
For commercial use a derivation that bundles the commercial package and
contains a `$out/bin/documize` can be passed to
`services.documize.enable`.
The package compiles the Go sources, the build process also bundles the
pre-built frontend from `gui/public` into the binary.
The NixOS module generates a simple `systemd` unit which starts the
service as a dynamic user, database and a reverse proxy won't be
configured.
[1] https://www.documize.com/get-started/
[2] https://www.documize.com/pricing/
Before this change, only passwords not containing shell metacharacters could be
used, and because the password was passed as a command-line argument, local
users could (in a very small window of time) record the password and (in an
indefinity window of time) record the length of the password.
We also use the opportunity to add a call to `exec` in the systemd start
script, so that no shell needs to hang around waiting for iodine to stop.
`phpPackage` is 7.3 by default, but `pkgs.php` is 7.2,
so this saves the need for an extra copy of php
for the purpose of running nextcloud's cron;
more importantly this fixes problems with extensions
not loading since they are built against a different php.
Since the switch to check the nginx config with gixy in
59fac1a6d7, the ACME test doesn't build
anymore, because gixy reports the following false-positive (reindented):
>> Problem: [alias_traversal] Path traversal via misconfigured alias.
Severity: MEDIUM
Description: Using alias in a prefixed location that doesn't ends with
directory separator could lead to path traversal
vulnerability.
Additional info: https://github.com/yandex/gixy/blob/master/docs/en/plugins/aliastraversal.md
Pseudo config:
server {
server_name letsencrypt.org;
location /documents/2017.11.15-LE-SA-v1.2.pdf {
alias /nix/store/y4h5ryvnvxkajkmqxyxsk7qpv7bl3vq7-2017.11.15-LE-SA-v1.2.pdf;
}
}
The reason this is a false-positive is because the destination is not a
directory, so something like "/foo.pdf../other.txt" won't work here,
because the resulting path would be ".../destfile.pdf../other.txt".
Nevertheless it's a good idea to use the exact match operator (=), to
not only shut up gixy but also gain a bit of performance in lookup (not
that it would matter in our test).
Signed-off-by: aszlig <aszlig@nix.build>
With CUPS v2.3b5, the configuration directive `SetEnv`
moved from `cupsd.conf` to `cups-files.conf`. See also
d47f6aec43 .
We have to follow up as `SetEnv` is now ignored in `cupsd.conf`.
Without this, executables called by cups
can't find other executables they depend on,
like `gs` or `perl`.
When using a different database, the evaluation fails as
`config.services.postgresql.package` is only set if `services.postgresql` is enabled.
Also, the systemd service shouldn't have a relation to postgres if a
remote database is used.
which was wrongly specified as types.lines
Prevent it from getting copied to nix store as people might use it for
credentials, and make the tests cover it.
Also add back tests, don't seem broken anymore.
This is just fine:
nix-build ./nixos/release.nix -A tests.kafka.kafka_2_1.x86_64-linux -A tests.kafka.kafka_2_2.x86_64-linux
See https://github.com/browserpass/browserpass-native/issues/31
Additionally browserpass was removed from systemPackages, because it
doesn't need to be installed, browsers will get the path to the binary
from the native messaging host JSON.
Currently if you want to properly chroot a systemd service, you could do
it using BindReadOnlyPaths=/nix/store or use a separate derivation which
gathers the runtime closure of the service you want to chroot. The
former is the easier method and there is also a method directly offered
by systemd, called ProtectSystem, which still leaves the whole store
accessible. The latter however is a bit more involved, because you need
to bind-mount each store path of the runtime closure of the service you
want to chroot.
This can be achieved using pkgs.closureInfo and a small derivation that
packs everything into a systemd unit, which later can be added to
systemd.packages.
However, this process is a bit tedious, so the changes here implement
this in a more generic way.
Now if you want to chroot a systemd service, all you need to do is:
{
systemd.services.myservice = {
description = "My Shiny Service";
wantedBy = [ "multi-user.target" ];
confinement.enable = true;
serviceConfig.ExecStart = "${pkgs.myservice}/bin/myservice";
};
}
If more than the dependencies for the ExecStart* and ExecStop* (which
btw. also includes script and {pre,post}Start) need to be in the chroot,
it can be specified using the confinement.packages option. By default
(which uses the full-apivfs confinement mode), a user namespace is set
up as well and /proc, /sys and /dev are mounted appropriately.
In addition - and by default - a /bin/sh executable is provided, which
is useful for most programs that use the system() C library call to
execute commands via shell.
Unfortunately, there are a few limitations at the moment. The first
being that DynamicUser doesn't work in conjunction with tmpfs, because
systemd seems to ignore the TemporaryFileSystem option if DynamicUser is
enabled. I started implementing a workaround to do this, but I decided
to not include it as part of this pull request, because it needs a lot
more testing to ensure it's consistent with the behaviour without
DynamicUser.
The second limitation/issue is that RootDirectoryStartOnly doesn't work
right now, because it only affects the RootDirectory option and doesn't
include/exclude the individual bind mounts or the tmpfs.
A quirk we do have right now is that systemd tries to create a /usr
directory within the chroot, which subsequently fails. Fortunately, this
is just an ugly error and not a hard failure.
The changes also come with a changelog entry for NixOS 19.03, which is
why I asked for a vote of the NixOS 19.03 stable maintainers whether to
include it (I admit it's a bit late a few days before official release,
sorry for that):
@samueldr:
Via pull request comment[1]:
+1 for backporting as this only enhances the feature set of nixos,
and does not (at a glance) change existing behaviours.
Via IRC:
new feature: -1, tests +1, we're at zero, self-contained, with no
global effects without actively using it, +1, I think it's good
@lheckemann:
Via pull request comment[2]:
I'm neutral on backporting. On the one hand, as @samueldr says,
this doesn't change any existing functionality. On the other hand,
it's a new feature and we're well past the feature freeze, which
AFAIU is intended so that new, potentially buggy features aren't
introduced in the "stabilisation period". It is a cool feature
though? :)
A few other people on IRC didn't have opposition either against late
inclusion into NixOS 19.03:
@edolstra: "I'm not against it"
@Infinisil: "+1 from me as well"
@grahamc: "IMO its up to the RMs"
So that makes +1 from @samueldr, 0 from @lheckemann, 0 from @edolstra
and +1 from @Infinisil (even though he's not a release manager) and no
opposition from anyone, which is the reason why I'm merging this right
now.
I also would like to thank @Infinisil, @edolstra and @danbst for their
reviews.
[1]: https://github.com/NixOS/nixpkgs/pull/57519#issuecomment-477322127
[2]: https://github.com/NixOS/nixpkgs/pull/57519#issuecomment-477548395
eb90d97009 broke nslcd, as /run/nslcd was
created/chowned as root user, while nslcd wants to do parts as nslcd
user.
This commit changes the nslcd to run with the proper uid/gid from the
start (through User= and Group=), so the RuntimeDirectory has proper
permissions, too.
In some cases, secrets are baked into nslcd's config file during startup
(so we don't want to provide it from the store).
This config file is normally hard-wired to /etc/nslcd.conf, but we don't
want to use PermissionsStartOnly anymore (#56265), and activation
scripts are ugly, so redirect /etc/nslcd.conf to /run/nslcd/nslcd.conf,
which now gets provisioned inside ExecStartPre=.
This change requires the files referenced to in
users.ldap.bind.passwordFile and users.ldap.daemon.rootpwmodpwFile to be
readable by the nslcd user (in the non-nslcd case, this was already the
case for users.ldap.bind.passwordFile)
fixes#57783
First of all, the reason I added this to the "highlights" section is
that we want users to be aware of these options, because in the end we
really want to decrease the attack surface of NixOS services and this is
a step towards improving that situation.
The reason why I'm adding this to the changelog of the NixOS 19.03
release instead of 19.09 is that it makes backporting services that use
these options easier. Doing the backport of the confinement module after
the official release would mean that it's not part of the release
announcement and potentially could fall under the radar of most users.
These options and the whole module also do not change anything in
existing services or affect other modules, so they're purely optional.
Adding this "last minute" to the 19.03 release doesn't hurt and is
probably a good preparation for the next months where we hopefully
confine as much services as we can :-)
I also have asked @samueldr and @lheckemann, whether they're okay with
the inclusion in 19.03. While so far only @samueldr has accepted the
change, we can still move the changelog entry to the NixOS 19.09 release
notes in case @lheckemann rejects it.
Signed-off-by: aszlig <aszlig@nix.build>
So far we had MountFlags = "private", but as @Infinisil has correctly
noticed, there is a dedicated PrivateMounts option, which does exactly
that and is better integrated than providing raw mount flags.
When checking for the reason why I used MountFlags instead of
PrivateMounts, I found that at the time I wrote the initial version of
this module (Mar 12 06:15:58 2018 +0100) the PrivateMounts option didn't
exist yet and has been added to systemd in Jun 13 08:20:18 2018 +0200.
Signed-off-by: aszlig <aszlig@nix.build>
Noted by @Infinisil on IRC:
infinisil: Question regarding the confinement PR
infinisil: On line 136 you do different things depending on
RootDirectoryStartOnly
infinisil: But on line 157 you have an assertion that disallows that
option being true
infinisil: Is there a reason behind this or am I missing something
I originally left this in so that once systemd supports that, we can
just flip a switch and remove the assertion and thus support
RootDirectoryStartOnly for our confinement module.
However, this doesn't seem to be on the roadmap for systemd in the
foreseeable future, so I'll just remove this, especially because it's
very easy to add it again, once it is supported.
Signed-off-by: aszlig <aszlig@nix.build>
These are all `mkRenamedOptionModule` ones from 2015 (there are none
from 2014). `mkAliasOptionModule` from 2015 were left in because those
don't give any warning at all.
users.ldap.daemon.rootpwmodpw -> users.ldap.daemon.rootpwmodpwFile
users.ldap.bind.password -> users.ldap.bind.passwordFile
as users.ldap.daemon.rootpwmodpw never was part of a release, no
mkRenamedOptionModule is introduced.
Regression introduced by c94005358c.
The commit introduced declarative docker containers and subsequently
enables docker whenever any declarative docker containers are defined.
This is done via an option with type "attrsOf somesubmodule" and a check
on whether the attribute set is empty.
Unfortunately, the check was whether a *list* is empty rather than
wether an attribute set is empty, so "mkIf (cfg != [])" *always*
evaluates to true and thus subsequently enables docker by default:
$ nix-instantiate --eval nixos --arg configuration {} \
-A config.virtualisation.docker.enable
true
Fixing this is simply done by changing the check to "mkIf (cfg != {})".
Tested this by running the "docker-containers" NixOS test and it still
passes.
Signed-off-by: aszlig <aszlig@nix.build>
Cc: @benley, @danbst, @Infinisil, @nlewo
As the configuration for the exporters and alertmanager is unchanged
between the two major versions this patch tries to minimize
duplication while at the same time as there's no upgrade path from 1.x
to 2.x, it allows running the two services in parallel. See also #56037
This otherwise does not eval `:tested` any more, which means no nixos
channel updates.
Regression comes from 0eb6d0735f (#57751)
which added an assertion stopping the use of `autoResize` when the
filesystem cannot be resized automatically.
* WIP: Run Docker containers as declarative systemd services
* PR feedback round 1
* docker-containers: add environment, ports, user, workdir options
* docker-containers: log-driver, string->str, line wrapping
* ExecStart instead of script wrapper, %n for container name
* PR feedback: better description and example formatting
* Fix docbook formatting (oops)
* Use a list of strings for ports, expand documentation
* docker-continers: add a simple nixos test
* waitUntilSucceeds to avoid potential weird async issues
* Don't enable docker daemon unless we actually need it
* PR feedback: leave ExecReload undefined
IPv6 container support broke a while ago and we didn't notice it. Making
them part of the (small) release test set should fix that. At this point
in time they should be granted the same amount of importance as the
legacy IP tests.
Prior to this commit an installation over serial via syslinux would
involve:
1. setting bitrate to BIOS's bitrate (typically 115200)
2. setting bitrate to syslinux's bitrate (38400)
3. setting bitrate to stty's bitrate (115200)
By changing syslinux's bitrate to 115200, an installation over serial
is a smoother experience, and consistent with the GRUB2 installation
which is also 115200 bps.
[root@nixos:~]# stty
speed 115200 baud; line = 0;
-brkint ixoff iutf8
-iexten
In a future commit I will add default serial terminals to the syslinux
kernel lines.
Previously this module precluded use of storage backends other than
`filesystem`. It is now possible to configure another storage backend
manually by setting `services.dockerRegistry.storagePath` to `null` and
configuring the other backend via `extraConfig`.
I'm not 100% sure about the incompatibility lines,
but I believe it's better to discourage these anyway.
If you find better information, feel free to amend...
The 32-bit thing is completely GPU-agnostic, so I can't see why we had
it separately for proprietary drivers and missing for the rest.
Just set them normally.
Exporting them will propagate them to all executed programs
such as bash (as used by nix-shell or nix run),
and badness ensues when different formats are used.
Compatibility with other distributions/software and expectation
of users coming from other systems should have higher priority over consistency.
In particular this fixes#51375, where the NetworkManager-wait-online.service
broke as a result of this.
acpilight package and module have been added to nixpkgs, but the
module hasn't been added to module-list.nix, so using it results in
the following error.
```
The option `hardware.acpilight' defined in `/etc/nixos/configuration.nix' does not exist.
```
Add the module to module-list.nix.
In Linux 4.19 there has been a major rework of the overlayfs
implementation and it now opens files in lowerdir with O_NOATIME, which
in turn caused issues in our VM tests because the process owner of QEMU
doesn't match the file owner of the lowerdir.
The crux here is that 9p propagates the O_NOATIME flag to the host and
the guest kernel has no way of verifying whether that flag will lead to
any problems beforehand.
There is ongoing work to possibly fix this in the kernel, but it will
take a while until there is a working patch and consensus.
So in order to bring our default kernel back to 4.19 and of course make
it possible to run newer kernels in VM tests, I'm merging a small QEMU
patch as an interim solution, which we can drop once we have a working
fix in the next round of stable kernels.
Now we already had Linux 4.19 set as the default kernel, but that was
subsequently reverted in 048c36ccaa
because the patch we have used was the revert of the commit I bisected a
while ago.
This patch broke overlayfs in other ways, so I'm also merging in a VM
test by @bachp, which only tests whether overlayfs is working, just to
be on the safe side that something like this won't happen in the future.
Even though this change could be considered a moderate mass-rebuild at
least for GNU/Linux, I'm merging this to master, mainly to give us some
time to get it into the current 19.03 release branch (and subsequent
testing window) once we got no new breaking builds from Hydra.
Cc: @samueldr, @lheckemann
Fixes: https://github.com/NixOS/nixpkgs/issues/54509
Fixes: https://github.com/NixOS/nixpkgs/issues/48828
Merges: https://github.com/NixOS/nixpkgs/pull/57641
Merges: https://github.com/NixOS/nixpkgs/pull/54508
seems that this got broken when the config option was made to use enums. "secure" got replaced with "enum", which isn't a valid option for the failure mode.
After working on the last wireguard bump (#57534), we figured that it's
probably a good idea to have a basic test which confirms that a simple
VPN with wireguard still works.
This test starts two peers with a `wg0` network interface and adds a v4
and a v6 route that goes through `wg0`.
Assert that autoResize is only used when fsType is explicitly set to a
supported filesystem: if it's set to "auto", the default, the required
resizing tools won't be copied into the initrd even if the actual
filesystem is supported.
This is a backwards-incompatible change and while it won't probably
affect a whole lot of users, it makes sense to give them a heads-up
anyway.
Signed-off-by: aszlig <aszlig@nix.build>
Since 34234dcb51, for resize2fs to be automatically included in
initrd, a filesystem needed for boot must be explicitly defined as an
ext* type filesystem.
The default, which is /tmp, has a few issues associated with it:
One being that it makes it easy for users on the system to spoof a
PostgreSQL server if it's not running, causing applications to connect
to their provided sockets instead of just failing to connect.
Another one is that it makes sandboxing of PostgreSQL and other services
unnecessarily difficult. This is already the case if only PrivateTmp is
used in a systemd service, so in order for such a service to be able to
connect to PostgreSQL, a bind mount needs to be done from /tmp to some
other path, so the service can access it. This pretty much defeats the
whole purpose of PrivateTmp.
We regularily run into issues with this in the past already (one example
would be https://github.com/NixOS/nixpkgs/pull/24317) and with the new
systemd-confinement mode upcoming in
https://github.com/NixOS/nixpkgs/pull/57519, it makes it even more
tedious to sandbox services.
I've tested this change against all the postgresql NixOS VM tests and
they still succeed and I also grepped through the source tree to replace
other occasions where we might have /tmp hardcoded. Luckily there were
very few occasions.
Signed-off-by: aszlig <aszlig@nix.build>
Cc: @ocharles, @thoughtpolice, @danbst
My implementation was relying on PrivateDevices, PrivateTmp,
PrivateUsers and others to be false by default if chroot-only mode is
used.
However there is an ongoing effort[1] to change these defaults, which
then will actually increase the attack surface in chroot-only mode,
because it is expected that there is no /dev, /sys or /proc.
If for example PrivateDevices is enabled by default, there suddenly will
be a mounted /dev in the chroot and we wouldn't detect it.
Fortunately, our tests cover that, but I'm preparing for this anyway so
that we have a smoother transition without the need to fix our
implementation again.
Thanks to @Infinisil for the heads-up.
[1]: https://github.com/NixOS/nixpkgs/issues/14645
Signed-off-by: aszlig <aszlig@nix.build>
From @edolstra at [1]:
BTW we probably should take the closure of the whole unit rather than
just the exec commands, to handle things like Environment variables.
With this commit, there is now a "fullUnit" option, which can be enabled
to include the full closure of the service unit into the chroot.
However, I did not enable this by default, because I do disagree here
and *especially* things like environment variables or environment files
shouldn't be in the closure of the chroot.
For example if you have something like:
{ pkgs, ... }:
{
systemd.services.foobar = {
serviceConfig.EnvironmentFile = ${pkgs.writeText "secrets" ''
user=admin
password=abcdefg
'';
};
}
We really do not want the *file* to end up in the chroot, but rather
just the environment variables to be exported.
Another thing is that this makes it less predictable what actually will
end up in the chroot, because we have a "globalEnvironment" option that
will get merged in as well, so users adding stuff to that option will
also make it available in confined units.
I also added a big fat warning about that in the description of the
fullUnit option.
[1]: https://github.com/NixOS/nixpkgs/pull/57519#issuecomment-472855704
Signed-off-by: aszlig <aszlig@nix.build>
Another thing requested by @edolstra in [1]:
We should not provide a different /bin/sh in the chroot, that's just
asking for confusion and random shell script breakage. It should be
the same shell (i.e. bash) as in a regular environment.
While I personally would even go as far to even have a very restricted
shell that is not even a shell and basically *only* allows "/bin/sh -c"
with only *very* minimal parsing of shell syntax, I do agree that people
expect /bin/sh to be bash (or the one configured by environment.binsh)
on NixOS.
So this should make both others and me happy in that I could just use
confinement.binSh = "${pkgs.dash}/bin/dash" for the services I confine.
[1]: https://github.com/NixOS/nixpkgs/pull/57519#issuecomment-472855704
Signed-off-by: aszlig <aszlig@nix.build>
Quoting @edolstra from [1]:
I don't really like the name "chroot", something like "confine[ment]"
or "restrict" seems better. Conceptually we're not providing a
completely different filesystem tree but a restricted view of the same
tree.
I already used "confinement" as a sub-option and I do agree that
"chroot" sounds a bit too specific (especially because not *only* chroot
is involved).
So this changes the module name and its option to use "confinement"
instead of "chroot" and also renames the "chroot.confinement" to
"confinement.mode".
[1]: https://github.com/NixOS/nixpkgs/pull/57519#issuecomment-472855704
Signed-off-by: aszlig <aszlig@nix.build>
Currently, if you want to properly chroot a systemd service, you could
do it using BindReadOnlyPaths=/nix/store (which is not what I'd call
"properly", because the whole store is still accessible) or use a
separate derivation that gathers the runtime closure of the service you
want to chroot. The former is the easier method and there is also a
method directly offered by systemd, called ProtectSystem, which still
leaves the whole store accessible. The latter however is a bit more
involved, because you need to bind-mount each store path of the runtime
closure of the service you want to chroot.
This can be achieved using pkgs.closureInfo and a small derivation that
packs everything into a systemd unit, which later can be added to
systemd.packages. That's also what I did several times[1][2] in the
past.
However, this process got a bit tedious, so I decided that it would be
generally useful for NixOS, so this very implementation was born.
Now if you want to chroot a systemd service, all you need to do is:
{
systemd.services.yourservice = {
description = "My Shiny Service";
wantedBy = [ "multi-user.target" ];
chroot.enable = true;
serviceConfig.ExecStart = "${pkgs.myservice}/bin/myservice";
};
}
If more than the dependencies for the ExecStart* and ExecStop* (which
btw. also includes "script" and {pre,post}Start) need to be in the
chroot, it can be specified using the chroot.packages option. By
default (which uses the "full-apivfs"[3] confinement mode), a user
namespace is set up as well and /proc, /sys and /dev are mounted
appropriately.
In addition - and by default - a /bin/sh executable is provided as well,
which is useful for most programs that use the system() C library call
to execute commands via shell. The shell providing /bin/sh is dash
instead of the default in NixOS (which is bash), because it's way more
lightweight and after all we're chrooting because we want to lower the
attack surface and it should be only used for "/bin/sh -c something".
Prior to submitting this here, I did a first implementation of this
outside[4] of nixpkgs, which duplicated the "pathSafeName" functionality
from systemd-lib.nix, just because it's only a single line.
However, I decided to just re-use the one from systemd here and
subsequently made it available when importing systemd-lib.nix, so that
the systemd-chroot implementation also benefits from fixes to that
functionality (which is now a proper function).
Unfortunately, we do have a few limitations as well. The first being
that DynamicUser doesn't work in conjunction with tmpfs, because it
already sets up a tmpfs in a different path and simply ignores the one
we define. We could probably solve this by detecting it and try to
bind-mount our paths to that different path whenever DynamicUser is
enabled.
The second limitation/issue is that RootDirectoryStartOnly doesn't work
right now, because it only affects the RootDirectory option and not the
individual bind mounts or our tmpfs. It would be helpful if systemd
would have a way to disable specific bind mounts as well or at least
have some way to ignore failures for the bind mounts/tmpfs setup.
Another quirk we do have right now is that systemd tries to create a
/usr directory within the chroot, which subsequently fails. Fortunately,
this is just an ugly error and not a hard failure.
[1]: https://github.com/headcounter/shabitica/blob/3bb01728a0237ad5e7/default.nix#L43-L62
[2]: https://github.com/aszlig/avonc/blob/dedf29e092481a33dc/nextcloud.nix#L103-L124
[3]: The reason this is called "full-apivfs" instead of just "full" is
to make room for a *real* "full" confinement mode, which is more
restrictive even.
[4]: https://github.com/aszlig/avonc/blob/92a20bece4df54625e/systemd-chroot.nix
Signed-off-by: aszlig <aszlig@nix.build>