dramforever

a row of my life

Bootstrapping Nix

2019-01-24

Bootstrapping Nix, just for fun and no profit.

Code implementing this blog post can be found at https://github.com/dramforever/bootstrap-nix, and the Nixpkgs fork used at https://github.com/dramforever/nixpkgs/tree/dram-boot.

Nix is a unique package manager that I probably shouldn't try to explain here. Let's just say it's a Gentoo Prefix but with every single package in its own prefix. You might want to check out the slides of a talk I did a while back.

But unlike a Gentoo Prefix, where the user builds the world from source and which can thus generally be put anywhere on the filesystem, Nix users usually download pre-built binaries from the official 'binary cache'. As the official build farm builds packages in /nix/store, that's also where almost every Nix installation lives and works.

Even though /nix/store is the default configuration, it doesn't have to be the only possible one. By changing a handful of ./configure options, Nix can be configured to store packages and other data in directories other than /nix. This can be useful for situations where it's impossible to use the directory /nix because root access isn't available.

Gentoo Prefix sure is commonly used in such situations. Does a 'Nix Prefix' actually work?

The plan

We will build a Nix installation in /tmp/nix, since it's likely that /tmp is writable by anyone on a system. Specifically, our new Nix will keep its store in /tmp/nix/store, its state in /tmp/nix/var and its configuration in /tmp/nix/etc.

(Edit: I have since learned that the NIX_STORE variable can override the pre-configured settings within Nix. In other words, Nix does not require rebuilding for a 'cross-compiling' scenario like this. We can save one stage of Nix.)

The nix derivation in Nixpkgs already provides these three options as configuration arguments, so let's just override them to our needs:

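# Assuming ./nixpkgs is a Nixpkgs checkout: importing it yields a function,
# which has to be called (here with an empty argument set) to get the package set.
with import ./nixpkgs { };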

rec {
  nixBoot = nix.override {
    storeDir = "/tmp/nix/store";
    stateDir = "/tmp/nix/var";
    confDir = "/tmp/nix/etc";
  };

  # ...
}

This means that we are going to build Nix with Nix, so an already-working Nix is required. Moreover, since the 'normal' Nix builds packages to /nix/store, we're going to have to have three different Nix flavors:

This isn't quite as 'good' as Gentoo Prefix, since it requires building Nix on another system beforehand, whereas Gentoo Prefix can do all the work on the target system. However, since Nix stage 2 lives in and works on /tmp/nix/store, a location a non-root user is likely to be able to install Nix to, it's conceivable that an installer script could download a pre-built Nix stage 2 and build stage 3 and 4 in a user-selected directory.

Installer

Nix stage 2 is 'pure', in the sense that just copying it and all of its dependencies, transitive ones included, to /tmp/nix/store on another machine is sufficient to get a running Nix. Indeed, if we were making a Docker container image in which package management is not required, this would be the method of deployment.

However, as hinted above, Nix also stores some 'other data' in a 'state directory'. An important piece is the Nix database, which, among other things, stores metadata and reference information for so-called valid store paths. This is akin to the database of installed packages that other package managers keep to track the files installed on a system.

The closureInfo function from Nixpkgs helps us with both tasks: copying the store paths and populating the database. From pkgs/build-support/closure-info.nix:

This derivation builds two files containing information about the closure of 'rootPaths': $out/store-paths contains the paths in the closure, and $out/registration contains a file suitable for use with nix-store --load-db and nix-store --register-validity --hash-given.
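As a rough illustration, closureInfo might be invoked on the nixBoot derivation as follows. Only the rootPaths argument and the two output files come from the Nixpkgs comment quoted above. The `{ }` passed to the import assumes ./nixpkgs is a Nixpkgs checkout.

```nix
# Sketch: closure information for the /tmp/nix flavored Nix.
# Assumption: ./nixpkgs is a Nixpkgs checkout, so importing it gives a function.
with import ./nixpkgs { };

let
  # Same override as earlier in the post.
  nixBoot = nix.override {
    storeDir = "/tmp/nix/store";
    stateDir = "/tmp/nix/var";
    confDir = "/tmp/nix/etc";
  };
in
# The result contains $out/store-paths (every path in the closure) and
# $out/registration (input for nix-store --load-db).
closureInfo { rootPaths = [ nixBoot ]; }
```

Because the result lists transitive dependencies too, it covers both tasks: the list of paths says what to copy, and the registration file says what to record in the database.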

Moreover, release.nix from the source code of Nix contains an example of building a binary tarball. With those it's not hard to come up with a tarball containing the store paths, registration information for nix-store --load-db, and a (crude) installer script that copies the store paths to the real destination, /tmp/nix, and initializes the Nix database.
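A minimal sketch of such a tarball derivation could look like the following. It is not the installer from the linked repository: the names installer, reginfo and nix-installer are hypothetical, and the sketch assumes it is evaluated by a Nix that already operates on /tmp/nix/store, so that every store path already has its final name.

```nix
# Sketch only; assumes the evaluating Nix uses /tmp/nix/store as its store.
with import ./nixpkgs { };

let
  nixBoot = nix.override {
    storeDir = "/tmp/nix/store";
    stateDir = "/tmp/nix/var";
    confDir = "/tmp/nix/etc";
  };

  closure = closureInfo { rootPaths = [ nixBoot ]; };

  # Hypothetical crude installer: copy the store paths into place, then
  # register them as valid in the Nix database.
  installer = writeScript "install" ''
    #!/bin/sh
    set -e
    self=$(dirname "$0")
    mkdir -p /tmp/nix/store
    cp -a "$self"/store/* /tmp/nix/store/
    ${nixBoot}/bin/nix-store --load-db < "$self"/reginfo
  '';
in
runCommand "nix-installer" { } ''
  dir=nix-installer
  mkdir -p $dir/store $out
  # Every path in the closure, copied under its original base name.
  cp -a $(cat ${closure}/store-paths) $dir/store/
  cp ${closure}/registration $dir/reginfo
  cp ${installer} $dir/install
  tar cJf $out/nix-installer.tar.xz $dir
''
```

On the target machine, one would unpack the archive and run the install script as a regular user. Nothing in it needs root as long as /tmp is writable.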

The tarball can be found at https://github.com/dramforever/bootstrap-nix/releases/download/snapshot-20190124.4/nix-installer.tar.xz.

Extra: Packaging with stdenv

Before the excessively time-consuming build process of Nix stage 2, a huge list of derivations to build is presented. Some of these, like GCC or glibc, take quite a while to build. These toolchain dependencies are also needed for building most packages in our new bootstrapped Nix installation, which cannot take advantage of a pre-existing binary cache.

Since stdenv contains the 'default' set of build tools, one might expect that adding stdenv to the closure above would make it possible to avoid the large toolchain builds. But quite surprisingly, if you install just Nix and stdenv and try to build or install basically anything else, some toolchain derivations still need to be built, indicating that at least some ubiquitous toolchain dependency is not actually in the closure of stdenv.

Several derivations with names like bootstrap-stage4-stdenv-linux stand out. As their names hint, they constitute the progress of bootstrapping the build environment for other 'normal' derivations in Nixpkgs, and are themselves some of the most basic derivations. This bootstrapping process unsurprisingly also comes in multiple stages. The files pkgs/stdenv/linux/default.nix and pkgs/stdenv/booter.nix in Nixpkgs document the stdenv bootstrapping process quite thoroughly, so I will not attempt to reproduce it here. There is, however, one implication that I should mention: some of the derivations you see in the final Nixpkgs might be from an earlier bootstrapping stage and use a different stdenv. The stdenv attribute of a derivation shows which stdenv it was built against. Some examples:

nix-repl> whois.stdenv
«derivation /nix/store/...-stdenv-linux.drv»

nix-repl> bash.stdenv
«derivation /nix/store/...-bootstrap-stage4-stdenv-linux.drv»

nix-repl> gcc.stdenv
«derivation /nix/store/...-bootstrap-stage4-stdenv-linux.drv»

nix-repl> gcc-unwrapped.stdenv
«derivation /nix/store/...-bootstrap-stage3-stdenv-linux.drv»

nix-repl> glibc.stdenv
«derivation /nix/store/...-bootstrap-stage2-stdenv-linux.drv»

If we only have the final stdenv (the first one listed) installed, then when we want to build, say, bash, the stage 4 stdenv would be unavailable, and many of the bootstrap builds would have to be repeated. A way to overcome this would be to include every stdenv stage and not just the last.

Finding all the stages of stdenv requires some light Nixpkgs internals hacking. pkgs/stdenv/booter.nix adds attributes to the bootstrapping stages to aid debugging. Using the __bootPackages attribute added to every stdenv stage, we can access the package set used to build it, and thus __bootPackages.stdenv is the previous stage of stdenv. For example:

nix-repl> stdenv
«derivation /nix/store/...-stdenv-linux.drv»

nix-repl> stdenv.__bootPackages.stdenv
«derivation /nix/store/...-bootstrap-stage4-stdenv-linux.drv»

nix-repl> stdenv.__bootPackages.stdenv.__bootPackages.stdenv
«derivation /nix/store/...-bootstrap-stage3-stdenv-linux.drv»

If a certain package set has the __raw attribute set to true, such debugging attributes are not added to its stdenv. This also signifies the end of our recursive adventure, since no more __bootPackages can be found.

nix-repl> stage3 = stdenv.__bootPackages.stdenv.__bootPackages.stdenv

nix-repl> stage1 = stage3.__bootPackages.stdenv.__bootPackages.stdenv

nix-repl> stage1.__bootPackages.stdenv
«derivation /nix/store/h40r3ja68g0phsx7xzqphkryqgkmy9jv-bootstrap-stage0-stdenv-linux.drv»

nix-repl> stage1.__bootPackages.stdenv.__bootPackages.__raw
true

This leads to a roughly working way of finding the stages of stdenv:

let
  stdenvStages = curStage:
    [ curStage ]
      ++
        (if ! curStage.__bootPackages.__raw or false
          then stdenvStages curStage.__bootPackages.stdenv
          else []);
in stdenvStages stdenv

In testing, this works pretty well at avoiding toolchain rebuilds.
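To tie this back to the installer, the list of stages could be added to the root paths handed to closureInfo, roughly as in the sketch below. This is an assumption about how the pieces fit together rather than the exact expression used for the tarball. The `{ }` passed to the import again assumes ./nixpkgs is a Nixpkgs checkout.

```nix
# Sketch: closure of the /tmp/nix flavored Nix plus every stdenv stage.
with import ./nixpkgs { };

let
  nixBoot = nix.override {
    storeDir = "/tmp/nix/store";
    stateDir = "/tmp/nix/var";
    confDir = "/tmp/nix/etc";
  };

  # Walk back through __bootPackages until a raw package set is reached.
  stdenvStages = curStage:
    [ curStage ]
    ++ (if !(curStage.__bootPackages.__raw or false)
        then stdenvStages curStage.__bootPackages.stdenv
        else [ ]);
in
closureInfo { rootPaths = [ nixBoot ] ++ stdenvStages stdenv; }
```

With every stage among the roots, the toolchain pieces built against earlier stages end up in the closure as well, which is what lets later builds skip them.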

The tarball built with stages of stdenv can be found at https://github.com/dramforever/bootstrap-nix/releases/download/snapshot-20190124.4/nix-stdenv-installer.tar.xz.

Hiccups

These are just random things found when troubleshooting. Documented here in case anyone is interested.

Disabling sandboxing

If the 'normal' Nix is single-user, Nix stage 1 might fail to find Nix build users and refuse to build anything. Write this file to /tmp/nix/etc/nix/nix.conf:

build-users-group =
sandbox = false

In fact, it's probably a sane default, as we do intend to make /tmp/nix portable.

Other random problems

Conclusion

The resulting Nix tarball is certainly usable. With a clone of Nixpkgs, it is passable as a source-based package manager and distribution. However, as I said at the very start, this bootstrapping is only for fun and no profit, as I have not found a use case with such a non-root package management situation. More testing should be done if anyone is interested in using it.

When using Nix in a source-based manner, a common cause of build failure is link rot. A link to a source tarball or a patch may be dead for whatever reason on the hosting side. Two examples can be found in the 'Hiccups' section above. Dead links might not pose a huge problem to the casual binary user, but they affect source builds a bit more than expected.

(Edit: To combat this problem, Gentoo mirrors these downloaded 'distfiles' for users to use. On NixOS Discourse edolstra mentioned that there is a tarball mirror for Nixpkgs as well, at https://tarballs.nixos.org, which is accessed using the SHA-256 hash specified to fetchurl. I have not checked this in detail, but it seems that Gentoo's mirroring infrastructure checks for dead links, while the infrastructure for tarballs.nixos.org doesn't.)

The hackability of Nixpkgs served us pretty well in the whole process, and the use of the Nix language played a huge role. Without the light functional programming capabilities of the Nix language, the utilities from Nixpkgs and the very useful nix repl, it would have been much harder to play around. I feel like the design of the Nix language is at a sweet spot for describing and composing derivations, being pretty minimal in syntax and not really getting in the way of the kind of programming needed for a package distribution. Dynamic typing is also a pragmatic choice, drastically simplifying how the language works.

The whole process was an entertaining ride, and we sure had the fun promised!