Calculating the tarsum after creating a layer is inefficient, since
we have to read the tarball we've just written from the disk.
This commit simultaneously calculates the tarsum while creating the
tarball.
Appending to an existing tar archive repeatedly seems to be a quadratic
operation, since tar seems to traverse the existing archive even using
the `-r, --append` flag. This commit avoids that by passing the list of
files to a single tar invocation.
Fixes#78744
My previous change broke when there are more packages than the maximum
number of layers. I had assumed that the `store-path-to-layer.sh` was
only ever passed a single store path, but that is not the case if
there are multiple packages going into the final layer. To fix this, we
loop through the paths going into the final layer, appending them to the
tar file and making sure they end up at the right path.
when tar'ing store paths into layered archives when building layered
images, don't use the absolute nix store path so that tar won't complain
if something new is added to the nix store
when building the final docker image, ignore any file changes tar
detects in the layers. they are all immutable and the only thing that
might change is the number of hard links due to store optimization
PR #58431 added /nix/store to each layer.tar. However, the timestamp was
not explicitly set while adding /nix and /nix/store to the archive. This
resulted in different SHA256 hashes of layer.tar between image builds.
This change sets time and owner when tar'ing /nix/store.
Create a many-layered Docker Image.
Implements much less than buildImage:
- Doesn't support specific uids/gids
- Doesn't support runninng commands after building
- Doesn't require qemu
- Doesn't create mutable copies of the files in the path
- Doesn't support parent images
If you want those feature, I recommend using buildLayeredImage as an
input to buildImage.
Notably, it does support:
- Caching low level, common paths based on a graph traversial
algorithm, see referencesByPopularity in
0a80233487993256e811f566b1c80a40394c03d6
- Configurable number of layers. If you're not using AUFS or not
extending the image, you can specify a larger number of layers at
build time:
pkgs.dockerTools.buildLayeredImage {
name = "hello";
maxLayers = 128;
config.Cmd = [ "${pkgs.gitFull}/bin/git" ];
};
- Parallelized creation of the layers, improving build speed.
- The contents of the image includes the closure of the configuration,
so you don't have to specify paths in contents and config.
With buildImage, paths referred to by the config were not included
automatically in the image. Thus, if you wanted to call Git, you
had to specify it twice:
pkgs.dockerTools.buildImage {
name = "hello";
contents = [ pkgs.gitFull ];
config.Cmd = [ "${pkgs.gitFull}/bin/git" ];
};
buildLayeredImage on the other hand includes the runtime closure of
the config when calculating the contents of the image:
pkgs.dockerTools.buildImage {
name = "hello";
config.Cmd = [ "${pkgs.gitFull}/bin/git" ];
};
Minor Problems
- If any of the store paths change, every layer will be rebuilt in
the nix-build. However, beacuse the layers are bit-for-bit
reproducable, when these images are loaded in to Docker they will
match existing layers and not be imported or uploaded twice.
Common Questions
- Aren't Docker layers ordered?
No. People who have used a Dockerfile before assume Docker's
Layers are inherently ordered. However, this is not true -- Docker
layers are content-addressable and are not explicitly layered until
they are composed in to an Image.
- What happens if I have more than maxLayers of store paths?
The first (maxLayers-2) most "popular" paths will have their own
individual layers, then layer #(maxLayers-1) will contain all the
remaining "unpopular" paths, and finally layer #(maxLayers) will
contain the Image configuration.