Cross-compilation
Introduction
"Cross-compilation" means compiling a program on one machine for another
type of machine. For example, a typical use of cross-compilation is to
compile programs for embedded devices. These devices often don't have the
computing power and memory to compile their own programs. One might think
that cross-compilation is a fairly niche concern. However, there are
significant advantages to rigorously distinguishing between build-time and
run-time environments! Significant, because the benefits apply even when one
is developing and deploying on the same machine. Nixpkgs is increasingly
adopting the opinion that packages should be written with cross-compilation
in mind, and Nixpkgs should evaluate in a similar way (by minimizing
cross-compilation-specific special cases) whether or not one is
cross-compiling.
This chapter will be organized in three parts. First, it will describe the
basics of how to package software in a way that supports cross-compilation.
Second, it will describe how to use Nixpkgs when cross-compiling. Third, it
will describe the internal infrastructure supporting cross-compilation.
Packaging in a cross-friendly manner
Platform parameters
Nixpkgs follows the conventions of GNU autoconf. We distinguish between 3
types of platforms when building a derivation: build, host, and target. In
summary, build is the platform on which a package is being built, and host
is the platform on which it will run. The third attribute, target, is
relevant only for certain specific compilers and build tools.
In Nixpkgs, these three platforms are defined as attribute sets under the
names buildPlatform, hostPlatform,
and targetPlatform. They are always defined as
attributes in the standard environment. That means one can access them
like:
{ stdenv, fooDep, barDep, ... }: ...stdenv.buildPlatform...
buildPlatform
The "build platform" is the platform on which a package is built. Once
someone has a built package, or pre-built binary package, the build
platform should not matter and can be ignored.
hostPlatform
The "host platform" is the platform on which a package will be run. This
is the simplest platform to understand, but also the one with the worst
name.
targetPlatform
The "target platform" attribute is, unlike the other two attributes, not
actually fundamental to the process of building software. Instead, it is
only relevant for compatibility with building certain specific compilers
and build tools. It can be safely ignored for all other packages.
The build process of certain compilers is written in such a way that the
compiler resulting from a single build can itself only produce binaries
for a single platform. The task of specifying this single "target
platform" is thus pushed to build time of the compiler. The root cause
of this is that the compiler (which will be run on the host) and the
standard library/runtime (which will be run on the target) are built by
a single build process.
There is no fundamental need to think about a single target ahead of
time like this. If the tool supports modular or pluggable backends, both
the need to specify the target at build time and the constraint of
having only a single target disappear. An example of such a tool is
LLVM.
Although the existence of a "target platform" is arguably a historical
mistake, it is a common one: examples of tools that suffer from it are
GCC, Binutils, GHC and Autoconf. Nixpkgs tries to avoid sharing in the
mistake where possible. Still, because the concept of a target platform
is so ingrained, it is best to support it as is.
The exact schema these fields follow is a bit ill-defined due to a long and
convoluted evolution, but this is slowly being cleaned up. You can see
examples of ones used in practice in
lib.systems.examples; note how they are not all very
consistent. For now, here are a few fields you can count on them containing:
system
This is a two-component shorthand for the platform. Examples of this
would be "x86_64-darwin" and "i686-linux"; see
lib.systems.doubles for more. The first component
corresponds to the CPU architecture of the platform and the second to
the operating system of the platform ([cpu]-[os]).
This format has built-in support in Nix, such as the
builtins.currentSystem impure string.
config
This is a 3- or 4- component shorthand for the platform. Examples of
this would be x86_64-unknown-linux-gnu and
aarch64-apple-darwin14. This is a standard format
called the "LLVM target triple", as it was pioneered by LLVM. In the
4-part form, this corresponds to
[cpu]-[vendor]-[os]-[abi]. This format is strictly
more informative than the "Nix host double", as the previous format
could analogously be termed. This needs a better name than
config!
parsed
This is a Nix representation of a parsed LLVM target triple with
white-listed components. This can be specified directly, or actually
parsed from the config. See
lib.systems.parse for the exact representation.
libc
This is a string identifying the standard C library used. Valid
identifiers include "glibc" for GNU libc, "libSystem" for Darwin's
Libsystem, and "uclibc" for µClibc. It should probably be refactored to
use the module system, like parse.
is*
These predicates are defined in lib.systems.inspect,
and slapped onto every platform. They are superior to the ones in
stdenv as they force the user to be explicit about
which platform they are inspecting. Please use these instead of those.
platform
This is, quite frankly, a dumping ground of ad-hoc settings (it's an
attribute set). See lib.systems.platforms for
examples—there's hopefully one in there that will work verbatim for
each platform that is working. Please help us triage these flags and
give them better homes!
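As a rough illustration of the fields listed above, the following sketch elaborates a platform from a config string and reads a few of its attributes. It assumes that lib.systems.elaborate is available in your Nixpkgs checkout and fills in the remaining fields; the chosen config string is only an example.

```nix
# Evaluate with: nix-instantiate --eval --strict <file>
let
  lib = import <nixpkgs/lib>;

  # Assumption: lib.systems.elaborate infers the other fields from `config`.
  platform = lib.systems.elaborate { config = "aarch64-unknown-linux-gnu"; };
in {
  # The two-component shorthand, the full triple, and the C library identifier.
  inherit (platform) system config libc;

  # Components of the parsed representation.
  cpu = platform.parsed.cpu.name;
  kernel = platform.parsed.kernel.name;

  # One of the is* predicates attached to every platform.
  isLinux = platform.isLinux;
}
```

Reading the predicate from an explicit platform, as here, makes it clear which of the three platforms is being inspected.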
Theory of dependency categorization
This is a rather philosophical description that isn't very
Nixpkgs-specific. For an overview of all the relevant attributes given to
mkDerivation, see the standard environment documentation. For a description
of how everything is implemented, see "Implementation of dependencies"
below.
In this section we explore the relationship between both runtime and
build-time dependencies and the 3 Autoconf platforms.
A run-time dependency between two packages requires that their host
platforms match. This is directly implied by the meaning of "host platform"
and "run-time dependency": the package dependency exists while both packages
are running on a single host platform.
A build-time dependency, however, has a shift in platforms between the
depending package and the depended-on package. "Build-time dependency"
means that to build the depending package we need to be able to run the
depended-on package. The depending package's build platform is therefore
equal to the depended-on package's host platform.
If both the dependency and depending packages aren't compilers or other
machine-code-producing tools, we're done. And indeed
buildInputs and nativeBuildInputs
have covered these simpler run-time and build-time (respectively) cases
for many years. But if the dependency does produce machine code, we might
need to worry about its target platform too. In principle, that target
platform might be any of the depending package's build, host, or target
platforms, but we prohibit dependencies from a "later" platform to an
earlier platform to limit confusion because we've never seen a legitimate
use for them.
Finally, if the depending package is a compiler or other
machine-code-producing tool, it might need dependencies that run at "emit
time". This is for compilers that (regrettably) insist on being built
together with their source languages' standard libraries. Assuming build !=
host != target, a run-time dependency of the standard library cannot be run
at the compiler's build time or run time, but only at the run time of code
emitted by the compiler.
Putting this all together, that means we have dependencies in the form
"host → target", in at most the following six combinations:
Possible dependency types

Dependency's host platform    Dependency's target platform
build                         build
build                         host
build                         target
host                          host
host                          target
target                        target
Some examples will make this table clearer. Suppose there's some package
that is being built with a (build, host, target)
platform triple of (foo, bar, baz). If it has a
build-time library dependency, that would be a "build → *" dependency
with a triple of (foo, foo, *) (the target platform is
irrelevant). If it needs a compiler to be built, that would be a "build →
host" dependency with a triple of (foo, foo, bar) (the
compiler runs on foo and emits code for bar). That compiler would be built
with another compiler, also a "build → host" dependency, with a triple of
(foo, foo, foo).
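The table of possible dependency types can also be restated as a small Nix expression. This is only a sketch for working through examples: the attribute names on the left are labels invented here, and the platform names are the placeholders foo, bar, and baz from the text.

```nix
# Evaluate with: nix-instantiate --eval --strict <file>
let
  # Platform triple of the depending package (placeholder names).
  depending = { build = "foo"; host = "bar"; target = "baz"; };

  # Look up the dependency's host and target platforms by role name.
  dep = hostRole: targetRole: {
    host = depending.${hostRole};
    target = depending.${targetRole};
  };
in {
  # One entry per row of the table.
  buildBuild   = dep "build"  "build";
  buildHost    = dep "build"  "host";
  buildTarget  = dep "build"  "target";
  hostHost     = dep "host"   "host";
  hostTarget   = dep "host"   "target";
  targetTarget = dep "target" "target";
}
```

For a dependency that is not a compiler or similar tool, the target entry can simply be ignored.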
Cross packaging cookbook
Some frequently encountered problems when packaging for cross-compilation
should be answered here. Ideally, the information above is exhaustive, so
this section cannot provide any new information, but it is ludicrous and
cruel to expect everyone to spend effort working through the interaction of
many features just to figure out the same answer to the same common
problem. Feel free to add to this list!
What if my package's build system needs to build a C program to be run
under the build environment?
depsBuildBuild = [ buildPackages.stdenv.cc ];
Add it to your mkDerivation invocation.
My package fails to find ar.
Many packages assume that an unprefixed ar is
available, but Nix doesn't provide one. It only provides a prefixed one,
just as it does for all the other binutils programs. It may be
necessary to patch the package's build system to use a prefixed
ar.
My package's testsuite needs to run host platform code.
doCheck = stdenv.hostPlatform == stdenv.buildPlatform;
Add it to your mkDerivation invocation.
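To show where such attributes go, here is a minimal sketch of a package expression that combines two of the answers above. The package name and source are hypothetical; only depsBuildBuild, doCheck, and the platform attributes come from this chapter.

```nix
# Hypothetical package expression, meant to be used with callPackage.
{ stdenv, buildPackages }:

stdenv.mkDerivation {
  name = "example-1.0";   # illustrative name
  src = ./.;              # illustrative source

  # A C compiler that runs on the build platform and emits code for it,
  # for helper programs executed during the build.
  depsBuildBuild = [ buildPackages.stdenv.cc ];

  # Only run the test suite when host-platform binaries can be executed
  # on the machine doing the build.
  doCheck = stdenv.hostPlatform == stdenv.buildPlatform;
}
```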
Cross-building packages
Nixpkgs can be instantiated with localSystem alone, in
which case there is no cross-compiling and everything is built by and for
that system, or also with crossSystem, in which case
packages run on the latter, but all building happens on the former. Both
parameters take the same schema as the 3 (build, host, and target) platforms
defined in the previous section. As mentioned above,
lib.systems.examples has some platforms which are used as
arguments for these parameters in practice. You can use them
programmatically, or on the command line:
nix-build <nixpkgs> --arg crossSystem '(import <nixpkgs/lib>).systems.examples.fooBarBaz' -A whatever
Eventually we would like to make these platform examples an unnecessary
convenience so that
nix-build <nixpkgs> --arg crossSystem '{ config = "<arch>-<vendor>-<os>-<abi>"; }' -A whatever
works in the vast majority of cases. The problem today is dependencies on
other sorts of configuration which aren't given proper defaults. We rely on
the examples to crudely set those configuration parameters in some
vaguely sane manner on the user's behalf. Issue
#34274
tracks this inconvenience along with its root cause in crufty configuration
options.
While one is free to pass both parameters in full, there's a lot of logic to
fill in missing fields. As discussed in the previous section, only one of
system, config, and
parsed is needed to infer the other two. Additionally,
libc will be inferred from parsed.
Finally, localSystem.system is also
impurely inferred based on the platform on which evaluation
occurs. This means it is often not necessary to pass
localSystem at all, as in the command-line example in the
previous paragraph.
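The same instantiation can be written programmatically. The sketch below assumes that lib.systems.examples in your checkout contains an entry named raspberryPi; substitute any other example platform if it does not.

```nix
let
  lib = import <nixpkgs/lib>;
in
import <nixpkgs> {
  # localSystem is omitted, so it is inferred from the evaluating machine.
  # Assumption: `raspberryPi` is one of the entries in lib.systems.examples.
  crossSystem = lib.systems.examples.raspberryPi;
}
```

Saved to a file, this evaluates to a cross package set, from which an attribute can be built with nix-build and -A in the same way as in the command-line examples.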
Many sources (manual, wiki, etc) probably mention passing
system, platform, along with the
optional crossSystem to nixpkgs: import
<nixpkgs> { system = ..; platform = ..; crossSystem = ..;
}. Passing those two instead of localSystem is
still supported for compatibility, but is discouraged. Indeed, much of the
inference we do for these parameters is motivated by compatibility as much
as convenience.
One would think that localSystem and
crossSystem overlap horribly with the three
*Platforms (buildPlatform,
hostPlatform, and targetPlatform; see
stage.nix or the manual). Actually, those identifiers are
purposefully not used here to draw a subtle but important distinction: While
the granularity of having 3 platforms is necessary to properly *build*
packages, it is overkill for specifying the user's *intent* when making a
build plan or package set. A simple "build vs deploy" dichotomy is adequate:
the sliding window principle described in the previous section shows how to
interpolate between these two "end points" to get the 3 platform triple
for each bootstrapping stage. That means for any package in a given package
set, even those not bound on the top level but only reachable via
dependencies or buildPackages, the three platforms will
be defined as one of localSystem or
crossSystem, with the former replacing the latter as one
traverses build-time dependencies. A last simple difference is that
crossSystem should be null when one doesn't want to
cross-compile, while the *Platforms are always non-null.
localSystem is always non-null.
Cross-compilation infrastructure
Implementation of dependencies
The categories of dependencies developed in
"Theory of dependency categorization" are specified as
lists of derivations given to mkDerivation, as
documented in the standard environment documentation. In short,
each list of dependencies for "host → target" of "foo → bar" is called
depsFooBar, with exceptions for backwards
compatibility that depsBuildHost is instead called
nativeBuildInputs and depsHostTarget
is instead called buildInputs. Nixpkgs is now structured
so that each depsFooBar is automatically taken from
pkgsFooBar. (These pkgsFooBars are
quite new, so there is no special case for
nativeBuildInputs and buildInputs.)
For example, pkgsBuildHost.gcc should be used at
build-time, while pkgsHostTarget.gcc should be used at
run-time.
Now, for most of Nixpkgs's history, there were no
pkgsFooBar attributes, and most packages have not been
refactored to use them explicitly. Prior to those, there were just
buildPackages, pkgs, and
targetPackages. Those are now redefined as aliases to
pkgsBuildHost, pkgsHostTarget, and
pkgsTargetTarget. It is acceptable, even
recommended, to use them for libraries to show that the host platform is
irrelevant.
But before that, there was just pkgs, even though both
buildInputs and nativeBuildInputs
existed. [Cross barely worked, and those were implemented with some hacks
on mkDerivation to override dependencies.] What this
means is the vast majority of packages do not use any explicit package set
to populate their dependencies, just using whatever
callPackage gives them even if they do correctly sort
their dependencies into the multiple lists described above. And indeed,
asking that users both sort their dependencies, and
take them from the right attribute set, is both too onerous and redundant,
so the recommended approach (for now) is to continue just categorizing by
list and not using an explicit package set.
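A minimal sketch of that recommended style follows. The package and its two dependencies are illustrative only; the point is that the arguments come straight from callPackage and are merely sorted into the right lists.

```nix
# Hypothetical package expression, meant to be used with callPackage.
# `pkg-config` and `zlib` stand in for any build tool and any library.
{ stdenv, pkg-config, zlib }:

stdenv.mkDerivation {
  name = "example-1.0";   # illustrative name
  src = ./.;              # illustrative source

  # "build → host": tools that run during the build.
  nativeBuildInputs = [ pkg-config ];

  # "host → target": libraries used by the built package at run time.
  buildInputs = [ zlib ];
}
```

No explicit package set such as buildPackages appears here; choosing the right variant of each dependency is left to the infrastructure.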
To make this work, we "splice" together the six
pkgsFooBar package sets and have
callPackage actually take its arguments from that. This
is currently implemented in pkgs/top-level/splice.nix.
mkDerivation then, for each dependency attribute, pulls
the right derivation out from the splice. This splicing can be skipped when
not cross-compiling as the package sets are the same, but still is a bit
slow for cross-compiling. We'd like to do something better, but haven't
come up with anything yet.
Bootstrapping
Each of the package sets described above comes from a single bootstrapping
stage. While pkgs/top-level/default.nix coordinates
the composition of stages at a high level,
pkgs/top-level/stage.nix "ties the knot" (creates the
fixed point) of each stage. The package sets are defined per-stage however,
so they can be thought of as edges between stages (the nodes) in a graph.
Compositions like pkgsBuildTarget.targetPackages can be
thought of as paths through this graph.
While there are many package sets, and thus many edges, the stages can also
be arranged in a linear chain. In other words, many of the edges are
redundant as far as connectivity is concerned. This hinges on the type of
bootstrapping we do. Currently for cross it is:
(native, native, native)
(native, native, foreign)
(native, foreign, foreign)
In each stage, pkgsBuildHost refers to the previous
stage, pkgsBuildBuild refers to the one before that,
pkgsHostTarget refers to the current one, and
pkgsTargetTarget refers to the next one. When there is
no previous or next stage, they instead refer to the current stage. Note
how all the invariants regarding the mapping between dependency and depending
packages' build, host, and target platforms are preserved.
pkgsBuildTarget and pkgsHostHost are
more complex in that the stage fitting the requirements isn't always a
fixed chain of "prevs" and "nexts" away (modulo the "saturating"
self-references at the ends). We just special-case each instead. All the primary
edges are implemented in pkgs/stdenv/booter.nix,
and the secondary aliases in pkgs/top-level/stage.nix.
Note the native stages are bootstrapped in legacy ways that predate the
current cross implementation. This is why the bootstrapping stages
leading up to the final stages are ignored in the previous paragraph.
If one looks at the 3 platform triples, one can see that they overlap such
that one could put them together into a chain like:
(native, native, native, foreign, foreign)
If one imagines the saturating self references at the end being replaced
with infinite stages, and then overlays those platform triples, one ends up
with the infinite tuple:
(native..., native, native, native, foreign, foreign, foreign...)
One can then imagine any sequence of platforms such that there are bootstrap
stages with their 3 platforms determined by "sliding a window" that is the
3 tuple through the sequence. This was the original model for
bootstrapping. Without a target platform (assume a better world where all
compilers are multi-target and all standard libraries are built in their
own derivation), this is sufficient. Conversely, if one wishes to cross
compile "faster", with a "Canadian Cross" bootstrapping stage where
build != host != target, more bootstrapping stages are
needed since no sliding window provides the pesky
pkgsBuildTarget package set since it skips the Canadian
cross stage's "host".
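The sliding window can be sketched in a few lines of plain Nix. This is a toy model for illustration, not how the bootstrapping is implemented in Nixpkgs: it slides a window of three over the five-element chain shown earlier and yields one platform triple per stage.

```nix
# Evaluate with: nix-instantiate --eval --strict <file>
let
  # The finite chain of platforms from the text.
  platforms = [ "native" "native" "native" "foreign" "foreign" ];

  # The stage starting at index i takes three consecutive platforms.
  window = i: {
    build  = builtins.elemAt platforms i;
    host   = builtins.elemAt platforms (i + 1);
    target = builtins.elemAt platforms (i + 2);
  };
in
  # One window per starting position that still has two successors.
  builtins.genList window ((builtins.length platforms) - 2)
```

With this chain the list has three elements, corresponding to the three stages of the current cross bootstrap listed above.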
It is much better to refer to buildPackages than
targetPackages, or more broadly package sets that do
not mention "target". There are three reasons for this.
First, it is because bootstrapping stages do not have a unique
targetPackages. For example a (x86-linux,
x86-linux, arm-linux) and (x86-linux, x86-linux,
x86-windows) package set both have a (x86-linux,
x86-linux, x86-linux) package set. Because there is no canonical
targetPackages for such a native (build ==
host == target) package set, we set their
targetPackages
Second, it is because this is a frequent source of hard-to-follow
"infinite recursions" / cycles. When only package sets that don't mention
target are used, the package set forms a directed acyclic graph. This
means that all cycles that exist are confined to one stage. This means
they are a lot smaller, and easier to follow in the code or a backtrace. It
also means they are present in native and cross builds alike, and so more
likely to be caught by CI and other users.
Third, it is because everything target-mentioning only exists to
accommodate compilers with lousy build systems that insist on the compiler
itself and standard library being built together. Of course that is bad
because bigger derivations mean longer rebuilds. It is also problematic because
it tends to make the standard libraries less like other libraries than
they could be, complicating code and build systems alike. Because of the
other problems, and because of these innate disadvantages, compilers ought
to be packaged another way where possible.
If one explores Nixpkgs, they will see derivations with names like
gccCross. Such *Cross derivations are
a holdover from before we properly distinguished between the host and
target platforms; the derivation with "Cross" in the name covered the
build = host != target case, while the other covered
the host = target case, with build platform the same or not
based on whether one was using its .nativeDrv or
.crossDrv. This ugliness will disappear soon.