Create a many-layered Docker Image.
Implements much less than buildImage:
- Doesn't support specific uids/gids
- Doesn't support runninng commands after building
- Doesn't require qemu
- Doesn't create mutable copies of the files in the path
- Doesn't support parent images
If you want those feature, I recommend using buildLayeredImage as an
input to buildImage.
Notably, it does support:
- Caching low level, common paths based on a graph traversial
algorithm, see referencesByPopularity in
0a80233487993256e811f566b1c80a40394c03d6
- Configurable number of layers. If you're not using AUFS or not
extending the image, you can specify a larger number of layers at
build time:
pkgs.dockerTools.buildLayeredImage {
name = "hello";
maxLayers = 128;
config.Cmd = [ "${pkgs.gitFull}/bin/git" ];
};
- Parallelized creation of the layers, improving build speed.
- The contents of the image includes the closure of the configuration,
so you don't have to specify paths in contents and config.
With buildImage, paths referred to by the config were not included
automatically in the image. Thus, if you wanted to call Git, you
had to specify it twice:
pkgs.dockerTools.buildImage {
name = "hello";
contents = [ pkgs.gitFull ];
config.Cmd = [ "${pkgs.gitFull}/bin/git" ];
};
buildLayeredImage on the other hand includes the runtime closure of
the config when calculating the contents of the image:
pkgs.dockerTools.buildImage {
name = "hello";
config.Cmd = [ "${pkgs.gitFull}/bin/git" ];
};
Minor Problems
- If any of the store paths change, every layer will be rebuilt in
the nix-build. However, beacuse the layers are bit-for-bit
reproducable, when these images are loaded in to Docker they will
match existing layers and not be imported or uploaded twice.
Common Questions
- Aren't Docker layers ordered?
No. People who have used a Dockerfile before assume Docker's
Layers are inherently ordered. However, this is not true -- Docker
layers are content-addressable and are not explicitly layered until
they are composed in to an Image.
- What happens if I have more than maxLayers of store paths?
The first (maxLayers-2) most "popular" paths will have their own
individual layers, then layer #(maxLayers-1) will contain all the
remaining "unpopular" paths, and finally layer #(maxLayers) will
contain the Image configuration.
Using a simple algorithm, convert the references to a path in to a
sorted list of dependent paths based on how often they're referenced
and how deep in the tree they live. Equally-"popular" paths are then
sorted by name.
The existing writeReferencesToFile prints the paths in a simple
ascii-based sorting of the paths.
Sorting the paths by graph improves the chances that the difference
between two builds appear near the end of the list, instead of near
the beginning. This makes a difference for Nix builds which export a
closure for another program to consume, if that program implements its
own level of binary diffing.
For an example, Docker Images. If each store path is a separate layer
then Docker Images can be very efficiently transfered between systems,
and we get very good cache reuse between images built with the same
version of Nixpkgs. However, since Docker only reliably supports a
small number of layers (42) it is important to pick the individual
layers carefully. By storing very popular store paths in the first 40
layers, we improve the chances that the next Docker image will share
many of those layers.*
Given the dependency tree:
A - B - C - D -\
\ \ \ \
\ \ \ \
\ \ - E ---- F
\- G
Nodes which have multiple references are duplicated:
A - B - C - D - F
\ \ \
\ \ \- E - F
\ \
\ \- E - F
\
\- G
Each leaf node is now replaced by a counter defaulted to 1:
A - B - C - D - (F:1)
\ \ \
\ \ \- E - (F:1)
\ \
\ \- E - (F:1)
\
\- (G:1)
Then each leaf counter is merged with its parent node, replacing the
parent node with a counter of 1, and each existing counter being
incremented by 1. That is to say `- D - (F:1)` becomes `- (D:1, F:2)`:
A - B - C - (D:1, F:2)
\ \ \
\ \ \- (E:1, F:2)
\ \
\ \- (E:1, F:2)
\
\- (G:1)
Then each leaf counter is merged with its parent node again, merging
any counters, then incrementing each:
A - B - (C:1, D:2, E:2, F:5)
\ \
\ \- (E:1, F:2)
\
\- (G:1)
And again:
A - (B:1, C:2, D:3, E:4, F:8)
\
\- (G:1)
And again:
(A:1, B:2, C:3, D:4, E:5, F:9, G:2)
and then paths have the following "popularity":
A 1
B 2
C 3
D 4
E 5
F 9
G 2
and the popularity contest would result in the paths being printed as:
F
E
D
C
B
G
A
* Note: People who have used a Dockerfile before assume Docker's
Layers are inherently ordered. However, this is not true -- Docker
layers are content-addressable and are not explicitly layered until
they are composed in to an Image.
This reverts commit f777d2b719.
cc #34409
This breaks evaluation of the tested job:
attribute 'diskInterface' missing, at /nix/store/5k9kk52bv6zsvsyyvpxhm8xmwyn2yjvx-source/pkgs/build-support/vm/default.nix:316:24
This includes the initialy commit was done by @Mic92 plus a few fixes
from my side. So essentially this avoids patching statically linked
executables and also speeds up searching for ELF files altogether.
I've tested this by comparing the outputs of all the derivations which
make use of this hook using the following Nix expression:
let
getPackagesForRev = rev: with import (builtins.fetchGit {
url = ./.;
inherit rev;
}) { config.allowUnfree = true; }; [
cups-kyodialog3 elasticsearch franz gurobi javacard-devkit
masterpdfeditor maxx oracle-instantclient powershell reaper
teamviewer unixODBCDrivers.msodbcsql17 virtlyst wavebox zoom-us
];
pkgs = import <nixpkgs> {};
baseRev = "ef764eb0d8314b81a012dae04642b4766199956d";
in pkgs.runCommand "diff-contents" {
chset = pkgs.lib.zipListsWith (old: new: pkgs.runCommand "diff" {
inherit old new;
nativeBuildInputs = [ pkgs.nukeReferences ];
} ''
mkdir -p "''${NIX_STORE#/}"
cp --no-preserve=all -r "$old" "''${NIX_STORE#/}"
cp --no-preserve=all -r "$new" "''${NIX_STORE#/}"
find "''${old#/}" "''${new#/}" \
\( -type f -exec nuke-refs {} + \) -o \( -type l -delete \)
mkdir "$out"
echo "$old" > "$out/old-path"
echo "$new" > "$out/new-path"
diff -Nur "''${old#/}" "''${new#/}" > "$out/diff" || :
'') (getPackagesForRev baseRev) (getPackagesForRev "");
} ''
err=0
for c in $chset; do
if [ -s "$c/diff" ]; then
echo "$(< "$c/old-path") -> $(< "$c/new-path")" \
"differs, report: $c/diff" >&2
err=1
fi
done
[ $err -eq 0 ] && touch "$out"
''
With these changes there is only one derivation which has altered
contents, which is "franz". However the reason why it has differing
contents is not directly because of the autoPatchelfHook changes, but
because the "env-vars" file from the builder is in
"$out/opt/franz/env-vars" (Cc: @gnidorah) and we now have different
contents for NIX_CFLAGS_COMPILE and other environment variables.
I also tested this against a random static binary and the hook no longer
tries to patch it.
Merges: #47222
The "maxx" package recursively runs isExecutable on a bunch of files and
since the change to use "readelf" instead of "file" a lot of errors like
this one are printed during build:
readelf: Error: Not an ELF file - it has the wrong magic bytes at the
start
While the isExecutable was never meant to be used outside of the
autoPatchelfHook, it's still a good idea to silence the errors because
whenever readelf fails, it clearly indicates that the file in question
is not a valid ELF file.
Signed-off-by: aszlig <aszlig@nix.build>
If the ELF file is not an executable, we do not get a PT_INTERP section,
because after all, it's a *shared* library.
So instead of checking for PT_INTERP (to avoid statically linked
executables) for all ELF files, we add another check to see if it's an
executable and *only* skip it when it is and there's no PT_INTERP.
Signed-off-by: aszlig <aszlig@nix.build>
The `overrideScope` bound by `makeScope` (via special `callPackage`)
took an override in the form `super: self { … }`. But this is
dangerously close to the `self: super { … }` form used by *everything*
else, even other definitions of `overrideScope`! Since that
implementation did not even share any code either until I changed it
recently in 3cf43547f4, this inconsistency
is almost certainly an oversight and not intentional.
Unfortunately, just as the inconstency is hard to debug if one just
assumes the conventional order, any sudden fix would break existing
overrides in the same hard-to-debug way. So instead of changing the
definition a new `overrideScope'` with the conventional order is added,
and old `overrideScope` deprecated with a warning saying to use
`overrideScope'` instead. That will hopefully get people to stop using
`overrideScope`, freeing our hand to change or remove it in the future.
Because dates are an impurity, by default buildImage will use a static
date of one second past the UNIX Epoch. This can be a bit frustrating
when listing docker images in the CLI:
$ docker image list
REPOSITORY TAG IMAGE ID CREATED SIZE
hello latest 08c791c7846e 48 years ago 25.2MB
If you want to trade the purity for a better user experience, you can
set created to now.
pkgs.dockerTools.buildImage {
name = "hello";
tag = "latest";
created = "now";
contents = pkgs.hello;
config.Cmd = [ "/bin/hello" ];
}
and now the Docker CLI will display a reasonable date and sort the
images as expected:
$ docker image list
REPOSITORY TAG IMAGE ID CREATED SIZE
hello latest de2bf4786de6 About a minute ago 25.2MB
This commit adds test based on real-world crates (brotli).
There were a few more edge cases that were missing beforehand. Also it
turned out that we can get rid of the `finalBins` list since that will
now be handled during runtime.
The build expression got quiet large over time and to make it a bit
easier to grasp the different scripts involved in the build are now
separated from the nix file.
Cargo has a few odd (old) ways of picking source files if the `bin.path`
attribute isn't given in the Cargo.toml. This commit adds support for
some of those. The previous behaviour always defaulted to `src/main.rs`
which was not always the right choice.
Since there is look-ahead into the unpacked sources before running the
actual builder the path selection logic has to be embedded within the
build script.
`buildRustCrate` currently supports two ways of running building
binaries when processing a crate:
- Explicit definition of all the binaries (& optionally the paths to
their respective `main.rs`) and,
- if not binary was explictly configured all files matching the patterns
`src/main.rs`, `src/bin/*.rs`.
When the explicit list is given without path information paths are now
being picked from a list of candidates. The first match wins. The order
is the same as within the cargo compatibility code.
If the crate does not provide any libraries the path `src/{bin_name}.rs`
is also considered.
All underscores within the binary names are translated into dashes (`-`)
before the lookups are made. This seems to be a common convention.
Previously the Release.xz URL would show up with a new hash whenever
debian releases an update. By using archive.org we should have a stable
source for those. I wasn't able to find the equivalent in the debian
world. Maybe they don't keep all the different Release files around..
Introduce a `skawarePackages.buildPackage` function that contains the
common setup, removing a lot of duplication.
In particular, we require that the build directory has to be empty
after the `fixupPhase`, to make sure every relevant file is moved to
the outputs.
A next step would be to deduplicate the `configureFlags` attributes
and only require a `skawareInputs` field.
There's no reason `linkFarm` can't be used for symlinks in
subdirectories, except that currently it doesn't ensure the directory
of the link exists. This backwards-compatible change expands the utility
of the function.