Allows one or more directories to be mounted as a read-only file system.
This makes it convenient to run volatile containers that do not retain
application state.
This adds the containers.<name>.enableTun option allowing containers to
access /dev/net/tun. This is required by openvpn, tinc, etc. in order to
work properly inside containers.
The new option builds on top of two generic options
containers.<name>.additionalCapabilities and
containers.<name>.allowedDevices which also can be used for example when
adding support for FUSE later down the road.
Get rid of the "or null" stuff. Also change 'cfg . "foo"' to 'cfg.foo'.
Also fixed what appears to be an actual bug: in postStartScript,
cfg.attribute (where attribute is a function argument) should be
cfg.${attribute}.
With these changes, a container can have more then one veth-pair. This allows for example to have LAN and DMZ as bridges on the host and add dedicated containers for proxies, ipv4-firewall and ipv6-firewall. Or to have a bridge for normal WAN, one bridge for administration and one bridge for customer-internal communication. So that web-server containers can be reached from outside per http, from the management via ssh and can talk to their database via the customer network.
The scripts to set up the containers are now rendered several times instead of just one template. The scripts now contain per-container code to configure the extra veth interfaces. The default template without support for extra-veths is still rendered for the imperative containers.
Also a test is there to see if extra veths can be placed into host-bridges or can be reached via routing.
This makes the container a bit more secure, by preventing root
creating device nodes to access the host file system, for
instance. (Reference: systemd-nspawn@.service in systemd.)
This moves nixos-containers into its own package so that it can be
relied upon by other packages/systems. This should make development
using dynamic containers much easier.
Since systemd version 230, it is required to have a machine-id file
prior to the startup of the container. If the file is empty, a transient
machine ID is generated by systemd-nspawn.
See systemd/systemd#3014 for more details on the matter.
This unbreaks all of the containers-* NixOS tests.
Signed-off-by: aszlig <aszlig@redmoonstudios.org>
Cc: @edolstra
Closes: #15808
The existence of $root/var/lib/private/host-notify as a socket
prevented a bind mount:
container foo[8083]: Failed to create mount point /var/lib/containers/foo/var/lib/private/host-notify: No such device or address
Without the templating (which is still present for imperative containers), it
will be possible to set individual dependencies. Like depending on the network
only if the hostbridge or hardware interfaces are used.
Ported from #3021
This allows the containers to have their interface in a bridge on the host.
Also this adds IPv6 addresses to the containers both with bridged and unbridged
network.
If the host is shutting down, machinectl may fail because it's
bus-activated and D-Bus will be shutting down. So just send a signal
to the leader process directly.
Fixes#6212.
Systemd in a container will call sd_notify when it has finished
booting, so we can use that to signal that the container is
ready. This does require some fiddling with $NOTIFY_SOCKET.
Previously "machinectl reboot/poweroff" brutally killed the container,
as did "systemctl stop/restart". And reboot didn't actually work. Now
everything is fine.
Previously "machinectl reboot/poweroff" brutally killed the container,
as did "systemctl stop/restart". And reboot didn't actually work. Now
everything is fine.
By setting a line like
MACVLANS="eno1"
in /etc/containers/<name>.conf, the container will get an Ethernet
interface named mv-eno1, which represents an additional MAC address on
the physical eno1 interface. Thus the container has direct access to
the physical network. You can specify multiple interfaces in MACVLANS.
Unfortunately, you can't do this with wireless interfaces.
Note that dhcpcd is disabled in containers by default, so you'll
probably want to set
networking.useDHCP = true;
in the container, or configure a static IP address.
To do: add a containers.* option for this, and a flag for
"nixos-container create".
Using pkgs.lib on the spine of module evaluation is problematic
because the pkgs argument depends on the result of module
evaluation. To prevent an infinite recursion, pkgs and some of the
modules are evaluated twice, which is inefficient. Using ‘with lib’
prevents this problem.
The command nixos-container can now create containers. For instance,
the following creates and starts a container named ‘database’:
$ nixos-container create database
The configuration of the container is stored in
/var/lib/containers/<name>/etc/nixos/configuration.nix. After editing
the configuration, you can make the changes take effect by doing
$ nixos-container update database
The container can also be destroyed:
$ nixos-container destroy database
Containers are now executed using a template unit,
‘container@.service’, so the unit in this example would be
‘container@database.service’.
For example, the following sets up a container named ‘foo’. The
container will have a single network interface eth0, with IP address
10.231.136.2. The host will have an interface c-foo with IP address
10.231.136.1.
systemd.containers.foo =
{ privateNetwork = true;
hostAddress = "10.231.136.1";
localAddress = "10.231.136.2";
config =
{ services.openssh.enable = true; };
};
With ‘privateNetwork = true’, the container has the CAP_NET_ADMIN
capability, allowing it to do arbitrary network configuration, such as
setting up firewall rules. This is secure because it cannot touch the
interfaces of the host.
The helper program ‘run-in-netns’ is needed at the moment because ‘ip
netns exec’ doesn't quite do the right thing (it remounts /sys without
bind-mounting the original /sys/fs/cgroups).
These are stored on the host in
/nix/var/nix/{profiles,gcroots}/per-container/<container-name> to
ensure that container profiles/roots are not garbage-collected.
On the host, you can run
$ socat unix:<path-to-container>/var/lib/login.socket -,echo=0,raw
to get a login prompt. So this allows logging in even if the
container has no SSH access enabled.
You can also do
$ socat unix:<path-to-container>/var/lib/root-shell.socket -
to get a plain root shell. (This socket is only accessible by root,
obviously.) This makes it easy to execute commands in the container,
e.g.
$ echo reboot | socat unix:<path-to-container>/var/lib/root-shell.socket -
It is parameterized by a function that takes a name and evaluates to the
option type for the attribute of that name. Together with
submoduleWithExtraArgs, this subsumes nixosSubmodule.
You can now say:
systemd.containers.foo.config =
{ services.openssh.enable = true;
services.openssh.ports = [ 2022 ];
users.extraUsers.root.openssh.authorizedKeys.keys = [ "ssh-dss ..." ];
};
which defines a NixOS instance with the given configuration running
inside a lightweight container.
You can also manage the configuration of the container independently
from the host:
systemd.containers.foo.path = "/nix/var/nix/profiles/containers/foo";
where "path" is a NixOS system profile. It can be created/updated by
doing:
$ nix-env --set -p /nix/var/nix/profiles/containers/foo \
-f '<nixos>' -A system -I nixos-config=foo.nix
The container configuration (foo.nix) should define
boot.isContainer = true;
to optimise away the building of a kernel and initrd. This is done
automatically when using the "config" route.
On the host, a lightweight container appears as the service
"container-<name>.service". The container is like a regular NixOS
(virtual) machine, except that it doesn't have its own kernel. It has
its own root file system (by default /var/lib/containers/<name>), but
shares the Nix store of the host (as a read-only bind mount). It also
has access to the network devices of the host.
Currently, if the configuration of the container changes, running
"nixos-rebuild switch" on the host will cause the container to be
rebooted. In the future we may want to send some message to the
container so that it can activate the new container configuration
without rebooting.
Containers are not perfectly isolated yet. In particular, the host's
/sys/fs/cgroup is mounted (writable!) in the guest.