This way, all builds appear to have a uid/gid of 0 inside the
chroot. In the future, this may allow using programs like
systemd-nspawn inside builds, but that will require assigning a larger
UID/GID map to the build.
Issue #625.
This allows an unprivileged user to perform builds on a diverted store
(i.e. where the physical store location differs from the logical
location).
Example:
$ NIX_LOG_DIR=/tmp/log NIX_REMOTE="local?real=/tmp/store&state=/tmp/var" nix-build -E \
'with import <nixpkgs> {}; runCommand "foo" { buildInputs = [procps nettools]; } "id; ps; ifconfig; echo $out > $out"'
will do a build in the Nix store physically in /tmp/store but
logically in /nix/store (and thus using substituters for the latter).
This is a convenience command to allow users who are not privileged to
create /nix/store to use Nix with regular binary caches. For example,
$ NIX_REMOTE="local?state=$HOME/nix/var&real=/$HOME/nix/store" nix run firefox bashInteractive
will download Firefox and bash from cache.nixos.org, then start a
shell in which $HOME/nix/store is mounted on /nix/store.
This is primarily to subsume the functionality of the
copy-from-other-stores substituter. For example, in the NixOS
installer, we can now do (assuming we're in the target chroot, and the
Nix store of the installation CD is bind-mounted on /tmp/nix):
$ nix-build ... --option substituters 'local?state=/tmp/nix/var&real=/tmp/nix/store'
However, unlike copy-from-other-stores, this also allows write access
to such a store. One application might be fetching substitutes for
/nix/store in a situation where the user doesn't have sufficient
privileges to create /nix, e.g.:
$ NIX_REMOTE="local?state=/home/alice/nix/var&real=/home/alice/nix/store" nix-build ...
The function builtins.fetchgit fetches Git repositories at evaluation
time, similar to builtins.fetchTarball. (Perhaps the name should be
changed, being confusing with respect to Nixpkgs's fetchgit function,
with works at build time.)
Example:
(import (builtins.fetchgit git://github.com/NixOS/nixpkgs) {}).hello
or
(import (builtins.fetchgit {
url = git://github.com/NixOS/nixpkgs-channels;
rev = "nixos-16.03";
}) {}).hello
Note that the result does not contain a .git directory.
If --no-build-output is given (which will become the default for the
"nix" command at least), show the last 10 lines of the build output if
the build fails.
This allows commands like "nix verify --all" or "nix path-info --all"
to work on S3 caches.
Unfortunately, this requires some ugly hackery: when querying the
contents of the bucket, we don't want to have to read every .narinfo
file. But the S3 bucket keys only include the hash part of each store
path, not the name part. So as a special exception
queryAllValidPaths() can now return store paths *without* the name
part, and queryPathInfo() accepts such store paths (returning a
ValidPathInfo object containing the full name).
Caching path info is generally useful. For instance, it speeds up "nix
path-info -rS /run/current-system" (i.e. showing the closure sizes of
all paths in the closure of the current system) from 5.6s to 0.15s.
This also eliminates some APIs like Store::queryDeriver() and
Store::queryReferences().
For convenience, you can now say
$ nix-env -f channel:nixos-16.03 -iA hello
instead of
$ nix-env -f https://nixos.org/channels/nixos-16.03/nixexprs.tar.xz -iA hello
Similarly,
$ nix-shell -I channel:nixpkgs-unstable -p hello
$ nix-build channel:nixos-15.09 -A hello
Abstracting over the NixOS/Nixpkgs channels location also allows us to
use a more efficient transport (e.g. Git) in the future.
This specifies the number of distinct signatures required to consider
each path "trusted".
Also renamed ‘--no-sigs’ to ‘--no-trust’ for the flag that disables
verifying whether a path is trusted (since a path can also be trusted
if it has no signatures, but was built locally).
These are content-addressed paths or outputs of locally performed
builds. They are trusted even if they don't have signatures, so "nix
verify-paths" won't complain about them.
Unlike "nix-store --verify-path", this command verifies signatures in
addition to store path contents, is multi-threaded (especially useful
when verifying binary caches), and has a progress indicator.
Example use:
$ nix verify-paths --store https://cache.nixos.org -r $(type -p thunderbird)
...
[17/132 checked] checking ‘/nix/store/rawakphadqrqxr6zri2rmnxh03gqkrl3-autogen-5.18.6’
Doing a chdir() is a bad idea in multi-threaded programs, leading to
failures such as
error: cannot connect to daemon at ‘/nix/var/nix/daemon-socket/socket’: No such file or directory
Since Linux doesn't have a connectat() syscall like FreeBSD, there is
no way we can support this in a race-free way.
This enables an optimisation in hydra-queue-runner, preventing a
download of a NAR it just uploaded to the cache when reading files
like hydra-build-products.
This enables an optimisation in hydra-queue-runner, preventing a
download of a NAR it just uploaded to the cache when reading files
like hydra-build-products.
This allows a RemoteStore object to be used safely from multiple
threads concurrently. It will make multiple daemon connections if
necessary.
Note: pool.hh and sync.hh have been copied from the Hydra source tree.
This is currently only used by the Hydra queue runner rework, but like
eff5021eaa it presumably will be useful
for the C++ rewrite of nix-push and
download-from-binary-cache. (@shlevy)
Also, move a few free-standing functions into StoreAPI and Derivation.
Also, introduce a non-nullable smart pointer, ref<T>, which is just a
wrapper around std::shared_ptr ensuring that the pointer is never
null. (For reference-counted values, this is better than passing a
"T&", because the latter doesn't maintain the refcount. Usually, the
caller will have a shared_ptr keeping the value alive, but that's not
always the case, e.g., when passing a reference to a std::thread via
std::bind.)
For example,
$ nix-build --hash -A nix-repl.src
will build the fixed-output derivation nix-repl.src (a fetchFromGitHub
call), but instead of *verifying* the hash given in the Nix
expression, it prints out the resulting hash, and then moves the
result to its content-addressed location in the Nix store. E.g
build produced path ‘/nix/store/504a4k6zi69dq0yjc0bm12pa65bccxam-nix-repl-8a2f5f0607540ffe56b56d52db544373e1efb980-src’ with sha256 hash ‘0cjablz01i0g9smnavhf86imwx1f9mnh5flax75i615ml71gsr88’
The goal of this is to make all nix-prefetch-* scripts unnecessary: we
can just let Nix run the real thing (i.e., the corresponding fetch*
derivation).
Another example:
$ nix-build --hash -E 'with import <nixpkgs> {}; fetchgit { url = "https://github.com/NixOS/nix.git"; sha256 = "ffffffffffffffffffffffffffffffffffffffffffffffffffff"; }'
...
git revision is 9e7c1a4bbd
...
build produced path ‘/nix/store/gmsnh9i7x4mb7pyd2ns7n3c9l90jfsi1-nix’ with sha256 hash ‘1188xb621diw89n25rifqg9lxnzpz7nj5bfh4i1y3dnis0dmc0zp’
(Having to specify a fake sha256 hash is a bit annoying...)
Previously files in the Nix store were owned by root or by nixbld,
depending on whether they were created by a substituter or by a
builder. This doesn't matter much, but causes spurious diffoscope
differences. So use root everywhere.
E.g.
$ nix-build pkgs/stdenv/linux/ -A stage1.pkgs.perl --check
nix-store: src/libstore/build.cc:1323: void nix::DerivationGoal::tryToBuild(): Assertion `buildMode != bmCheck || validPaths.size() == drv->outputs.size()' failed.
when perl.out exists but perl.man doesn't. The fix is to only check
the outputs that exist. Note that "nix-build -A stage1.pkgs.all
--check" will still give a (proper) error in this case.
This was observed in the deb_debian7x86_64 build:
http://hydra.nixos.org/build/29973215
Calling c_str() on a temporary should be fine because the temporary
shouldn't be destroyed until after the execl() call, but who knows...
If repair found a corrupted/missing path that depended on a
multiple-output derivation, and some of the outputs of the latter were
not present, it failed with a message like
error: path ‘/nix/store/cnfn9d5fjys1y93cz9shld2xwaibd7nn-bash-4.3-p42-doc’ is not valid
This makes Darwin consistent with Linux: Nix expressions can't break
out of the sandbox unless relaxed sandbox mode is enabled.
For the normal sandbox mode this will require fixing #759 however.
Otherwise, since the call to write a "d" character to the lock file
can fail with ENOSPC, we can get an unhandled exception resulting in a
call to terminate().
Caused by 8063fc497a. If tmpDir !=
tmpDirInSandbox (typically when there are multiple concurrent builds
with the same name), the *Path attribute would not point to an
existing file. This caused Nixpkgs' writeTextFile to write an empty
file. In particular this showed up as hanging VM builds (because it
would run an empty run-nixos-vm script and then wait for it to finish
booting).
This is arguably nitpicky, but I think this new formulation is even
clearer. My thinking is that it's easier to comprehend when the
calculated hash value is displayed close to the output path. (I think it
is somewhat similar to eliminating double negatives in logic
statements.)
The formulation is inspired / copied from the OpenEmbedded build tool,
bitbake.
Rather than using $<host-TMPDIR>/nix-build-<drvname>-<number>, the
temporary directory is now always /tmp/nix-build-<drvname>-0. This
improves bitwise-exact reproducibility for builds that store $TMPDIR
in their build output. (Of course, those should still be fixed...)
Temporarily allow derivations to describe their full sandbox profile.
This will be eventually scaled back to a more secure setup, see the
discussion at #695
Nix reports a hash mismatch saying:
output path ‘foo’ should have sha256 hash ‘abc’, instead has ‘xyz’
That message is slightly ambiguous and some people read that statement
to mean the exact opposite of what it is supposed to mean. After this
patch, the message will be:
Nix expects output path ‘foo’ to have sha256 hash ‘abc’, instead it has ‘xyz’
- rename options but leav old names as lower-priority aliases,
also "-dirs" -> "-paths" to get closer to the meaning
- update docs to reflect the new names (old aliases are not documented),
including a new file with release notes
- tests need an update after corresponding changes to nixpkgs
- __noChroot is left as it is (after discussion on the PR)
Passing "--option build-repeat <N>" will cause every build to be
repeated N times. If the build output differs between any round, the
build is rejected, and the output paths are not registered as
valid. This is primarily useful to verify build determinism. (We
already had a --check option to repeat a previously succeeded
build. However, with --check, non-deterministic builds are registered
in the DB. Preventing that is useful for Hydra to ensure that
non-deterministic builds don't end up getting published at all.)
This reverts commit 79ca503332. Ouch,
never noticed this. We definitely don't want to allow builds to have
arbitrary access to /bin and /usr/bin, because then they can (for
instance) bring in a bunch of setuid programs. Also, we shouldn't be
encouraging the use of impurities in the default configuration.
If automatic store optimisation is enabled, and a hard-linked file in
the store gets corrupted, then the corresponding .links entry will
also be corrupted. In that case, trying to repair with --repair or
--repair-path won't work, because the new "good" file will be replaced
by a hard link to the corrupted file. We can catch most of these cases
by doing a sanity-check on the file sizes.
This removes the need to have multiple downloads in the stdenv
bootstrap process (like a separate busybox binary for Linux, or
curl/mkdir/sh/bzip2 for Darwin). Now all those files can be combined
into a single NAR.
This makes it consistent with the Nixpkgs fetchurl and makes it work
in chroots. We don't need verification because the hash of the result
is checked anyway.
The stack allocated for the builder was way too small (32 KB). This is
sufficient for normal derivations, because they just do some setup and
then exec() the actual builder. But for the fetchurl builtin
derivation it's not enough. Also, allocating the stack on the caller's
stack was fishy business.
Previously, pkg-config was already queried for libsqlite3's and
libcurl's link flags. However they were not used, but hardcoded
instead. This commit replaces the hardcoded LDFLAGS by the ones
provided by pkg-config in a similar pattern as already used for
libsodium.
Fixes https://github.com/NixOS/nixpkgs/issues/9504.
Note that this means we may have a non-functional /bin/sh in the
chroot while rebuilding Bash or one of its dependencies. Ideally those
packages don't rely on /bin/sh though.
This ensures that 1) the derivation doesn't change when Nix changes;
2) the derivation closure doesn't contain Nix and its dependencies; 3)
we don't have to rely on ugly chroot hacks.
In particular, hydra-queue-runner can now distinguish between remote
build / substitution / already-valid. For instance, if a path already
existed on the remote side, we don't want to store a log file.
Previously, to build a derivation remotely, we had to copy the entire
closure of the .drv file to the remote machine, even though we only
need the top-level derivation. This is very wasteful: the closure can
contain thousands of store paths, and in some Hydra use cases, include
source paths that are very large (e.g. Git/Mercurial checkouts).
So now there is a new operation, StoreAPI::buildDerivation(), that
performs a build from an in-memory representation of a derivation
(BasicDerivation) rather than from a on-disk .drv file. The only files
that need to be in the Nix store are the sources of the derivation
(drv.inputSrcs), and the needed output paths of the dependencies (as
described by drv.inputDrvs). "nix-store --serve" exposes this
interface.
Note that this is a privileged operation, because you can construct a
derivation that builds any store path whatsoever. Fixing this will
require changing the hashing scheme (i.e., the output paths should be
computed from the other fields in BasicDerivation, allowing them to be
verified without access to other derivations). However, this would be
quite nice because it would allow .drv-free building (e.g. "nix-env
-i" wouldn't have to write any .drv files to disk).
Fixes#173.
The following patch is an attempt to address this bug (see
<http://bugs.gnu.org/18994>) by preserving the supplementary groups of
build users in the build environment.
In practice, I would expect that supplementary groups would contain only
one or two groups: the build users group, and possibly the “kvm” group.
[Changed &at(0) to data() and removed tabs - Eelco]
Not substituting builds with "preferLocalBuild = true" was a bad idea,
because it didn't take the cost of dependencies into account. For
instance, if we can't substitute a fetchgit call, then we have to
download/build git and all its dependencies.
Partially reverts 5558652709 and adds a
new derivation attribute "allowSubstitutes" to specify whether a
derivation may be substituted.
Nixpkgs' writeTextAsFile does this:
mv "$textPath" "$n"
Since $textPath was owned by root, if $textPath is on the same
filesystem as $n, $n will be owned as root. As a result, the build
result was rejected as having suspicious ownership.
http://hydra.nixos.org/build/22836807
Hello!
The patch below adds a ‘verifyStore’ RPC with the same signature as the
current LocalStore::verifyStore method.
Thanks,
Ludo’.
>From aef46c03ca77eb6344f4892672eb6d9d06432041 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ludovic=20Court=C3=A8s?= <ludo@gnu.org>
Date: Mon, 1 Jun 2015 23:17:10 +0200
Subject: [PATCH] Add a 'verifyStore' remote procedure call.
This hook can be used to set system-specific per-derivation build
settings that don't fit into the derivation model and are too complex or
volatile to be hard-coded into nix. Currently, the pre-build hook can
only add chroot dirs/files through the interface, but it also has full
access to the chroot root.
The specific use case for this is systems where the operating system ABI
is more complex than just the kernel-support system calls. For example,
on OS X there is a set of system-provided frameworks that can reliably
be accessed by any program linked to them, no matter the version the
program is running on. Unfortunately, those frameworks do not
necessarily live in the same locations on each version of OS X, nor do
their dependencies, and thus nix needs to know the specific version of
OS X currently running in order to make those frameworks available. The
pre-build hook is a perfect mechanism for doing just that.
This hook can be used to set system specific per-derivation build
settings that don't fit into the derivation model and are too complex or
volatile to be hard-coded into nix. Currently, the pre-build hook can
only add chroot dirs/files.
The specific use case for this is systems where the operating system ABI
is more complex than just the kernel-supported system calls. For
example, on OS X there is a set of system-provided frameworks that can
reliably be accessed by any program linked to them, no matter the
version the program is running on. Unfortunately, those frameworks do
not necessarily live in the same locations on each version of OS X, nor
do their dependencies, and thus nix needs to know the specific version
of OS X currently running in order to make those frameworks available.
The pre-build hook is a perfect mechanism for doing just that.
This is because we don't want to do HTTP requests on every evaluation,
even though we can prevent a full redownload via the cached ETag. The
default is one hour.
This was causing NixOS VM tests to fail mysteriously since
5ce50cd99e. Nscd could (sometimes) no
longer read /etc/hosts:
open("/etc/hosts", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)
Probably there was some wacky interaction between the guest kernel and
the 9pfs implementation in QEMU.
Thus, for example, to get /bin/sh in a chroot, you only need to
specify /bin/sh=${pkgs.bash}/bin/sh in build-chroot-dirs. The
dependencies of sh will be added automatically.
I'm seeing hangs in Glibc's setxid_mark_thread() again. This is
probably because the use of an intermediate process to make clone()
safe from a multi-threaded program (see
524f89f139) is defeated by the use of
vfork(), since the intermediate process will have a copy of Glibc's
threading data structures due to the vfork(). So use a regular fork()
again.
If ‘build-use-chroot’ is set to ‘true’, fixed-output derivations are
now also chrooted. However, unlike normal derivations, they don't get
a private network namespace, so they can still access the
network. Also, the use of the ‘__noChroot’ derivation attribute is
no longer allowed.
Setting ‘build-use-chroot’ to ‘relaxed’ gives the old behaviour.
chroot only changes the process root directory, not the mount namespace root
directory, and it is well-known that any process with chroot capability can
break out of a chroot "jail". By using pivot_root as well, and unmounting the
original mount namespace root directory, breaking out becomes impossible.
Non-root processes typically have no ability to use chroot() anyway, but they
can gain that capability through the use of clone() or unshare(). For security
reasons, these syscalls are limited in functionality when used inside a normal
chroot environment. Using pivot_root() this way does allow those syscalls to be
put to their full use.
I.e., not readable to the nixbld group. This improves purity a bit for
non-chroot builds, because it prevents a builder from enumerating
store paths (i.e. it can only access paths it knows about).
Especially in WAL mode on a highly loaded machine, this is not a good
idea because it results in a WAL file of approximately the same size
ad the database, which apparently cannot be deleted while anybody is
accessing it.
For the "stdenv accidentally referring to bootstrap-tools", it seems
easier to specify the path that we don't want to depend on, e.g.
disallowedRequisites = [ bootstrapTools ];
It turns out that using clone() to start a child process is unsafe in
a multithreaded program. It can cause the initialisation of a build
child process to hang in setgroups(), as seen several times in the
build farm:
The reason is that Glibc thinks that the other threads of the parent
exist in the child, so in setxid_mark_thread() it tries to get a futex
that has been acquired by another thread just before the clone(). With
fork(), Glibc runs pthread_atfork() handlers that take care of this
(in particular, __reclaim_stacks()). But clone() doesn't do that.
Fortunately, we can use fork()+unshare() instead of clone() to set up
private namespaces.
See also https://www.mail-archive.com/lxc-devel@lists.linuxcontainers.org/msg03434.html.
The Nixpkgs stdenv prints some custom escape sequences to denote
nesting and stuff like that. Most terminals (e.g. xterm, konsole)
ignore them, but some do not (e.g. xfce4-terminal). So for the benefit
of the latter, filter them out.
If a root is a regular file, then its name must denote a store
path. For instance, the existence of the file
/nix/var/nix/gcroots/per-user/eelco/hydra-roots/wzc3cy1wwwd6d0dgxpa77ijr1yp50s6v-libxml2-2.7.7
would cause
/nix/store/wzc3cy1wwwd6d0dgxpa77ijr1yp50s6v-libxml2-2.7.7
to be a root.
This is useful because it involves less I/O (no need for a readlink()
call) and takes up less disk space (the symlink target typically takes
up a full disk block, while directory entries are packed more
efficiently). This is particularly important for hydra.nixos.org,
which has hundreds of thousands of roots, and where reading the roots
can take 25 minutes.
‘trusted-users’ is a list of users and groups that have elevated
rights, such as the ability to specify binary caches. It defaults to
‘root’. A typical value would be ‘@wheel’ to specify all users in the
wheel group.
‘allowed-users’ is a list of users and groups that are allowed to
connect to the daemon. It defaults to ‘*’. A typical value would be
‘@users’ to specify the ‘users’ group.
When running NixOps under Mac OS X, we need to be able to import store
paths built on Linux into the local Nix store. However, HFS+ is
usually case-insensitive, so if there are directories with file names
that differ only in case, then importing will fail.
The solution is to add a suffix ("~nix~case~hack~<integer>") to
colliding files. For instance, if we have a directory containing
xt_CONNMARK.h and xt_connmark.h, then the latter will be renamed to
"xt_connmark.h~nix~case~hack~1". If a store path is dumped as a NAR,
the suffixes are removed. Thus, importing and exporting via a
case-insensitive Nix store is round-tripping. So when NixOps calls
nix-copy-closure to copy the path to a Linux machine, you get the
original file names back.
Closes#119.
This makes things more efficient (we don't need to use an SSH master
connection, and we only start a single remote process) and gets rid of
locking issues (the remote nix-store process will keep inputs and
outputs locked as long as they're needed).
It also makes it more or less secure to connect directly to the root
account on the build machine, using a forced command
(e.g. ‘command="nix-store --serve --write"’). This bypasses the Nix
daemon and is therefore more efficient.
Also, don't call nix-store to import the output paths.
When copying a large path causes the daemon to run out of memory, you
now get:
error: Nix daemon out of memory
instead of:
error: writing to file: Broken pipe
If a build log is not available locally, then ‘nix-store -l’ will now
try to download it from the servers listed in the ‘log-servers’ option
in nix.conf. For instance, if you have:
log-servers = http://hydra.nixos.org/log
then it will try to get logs from http://hydra.nixos.org/log/<base
name of the store path>. So you can do things like:
$ nix-store -l $(which xterm)
and get a log even if xterm wasn't built locally.
By preloading all inodes in the /nix/store/.links directory, we can
quickly determine of a hardlinked file was already linked to the hashed
links.
This is tolerant of removing the .links directory, it will simply
recalculate all hashes in the store.
If an inode in the Nix store has more than 1 link, it probably means that it was linked into .links/ by us. If so, skip.
There's a possibility that something else hardlinked the file, so it would be nice to be able to override this.
Also, by looking at the number of hardlinks for each of the files in .links/, you can get deduplication numbers and space savings.
While running Python 3’s test suite, we noticed that on some systems
/dev/pts/ptmx is created with permissions 0 (that’s the case with my
Nixpkgs-originating 3.0.43 kernel, but someone with a Debian-originating
3.10-3 reported not having this problem.)
There’s still the problem that people without
CONFIG_DEVPTS_MULTIPLE_INSTANCES=y are screwed (as noted in build.cc),
but I don’t see how we could work around it.
Since the addition of build-max-log-size, a call to
handleChildOutput() can result in cancellation of a goal. This
invalidated the "j" iterator in the waitForInput() loop, even though
it was still used afterwards. Likewise for the maxSilentTime
handling.
Probably fixes#231. At least it gets rid of the valgrind warnings.
The daemon now creates /dev deterministically (thanks!). However, it
expects /dev/kvm to be present.
The patch below restricts that requirement (1) to Linux-based systems,
and (2) to systems where /dev/kvm already exists.
I’m not sure about the way to handle (2). We could special-case
/dev/kvm and create it (instead of bind-mounting it) in the chroot, so
it’s always available; however, it wouldn’t help much since most likely,
if /dev/kvm missing, then KVM support is missing.
We were relying on SubstitutionGoal's destructor releasing the lock,
but if a goal is a top-level goal, the destructor won't run in a
timely manner since its reference count won't drop to zero. So
release it explicitly.
Fixes#178.
The flag ‘--check’ to ‘nix-store -r’ or ‘nix-build’ will cause Nix to
redo the build of a derivation whose output paths are already valid.
If the new output differs from the original output, an error is
printed. This makes it easier to test if a build is deterministic.
(Obviously this cannot catch all sources of non-determinism, but it
catches the most common one, namely the current time.)
For example:
$ nix-build '<nixpkgs>' -A patchelf
...
$ nix-build '<nixpkgs>' -A patchelf --check
error: derivation `/nix/store/1ipvxsdnbhl1rw6siz6x92s7sc8nwkkb-patchelf-0.6' may not be deterministic: hash mismatch in output `/nix/store/4pc1dmw5xkwmc6q3gdc9i5nbjl4dkjpp-patchelf-0.6.drv'
The --check build fails if not all outputs are valid. Thus the first
call to nix-build is necessary to ensure that all outputs are valid.
The current outputs are left untouched: the new outputs are either put
in a chroot or diverted to a different location in the store using
hash rewriting.
This substituter connects to a remote host, runs nix-store --serve
there, and then forwards substituter commands on to the remote host and
sends their results to the calling program. The ssh-substituter-hosts
option can be specified as a list of hosts to try.
This is an initial implementation and, while it works, it has some
limitations:
* Only the first host is used
* There is no caching of query results (all queries are sent to the
remote machine)
* There is no informative output (such as progress bars)
* Some failure modes may cause unhelpful error messages
* There is no concept of trusted-ssh-substituter-hosts
Signed-off-by: Shea Levy <shea@shealevy.com>
nix-store --export takes a tmproot, which can only release by exiting.
Substituters don't currently work in a way that could take advantage of
the looping, anyway.
Signed-off-by: Shea Levy <shea@shealevy.com>
This is essentially the substituter API operating on the local store,
which will be used by the ssh substituter. It runs in a loop rather than
just taking one command so that in the future nix will be able to keep
one connection open for multiple instances of the substituter.
Signed-off-by: Shea Levy <shea@shealevy.com>
Namely:
nix-store: derivations.cc:242: nix::Hash nix::hashDerivationModulo(nix::StoreAPI&, nix::Derivation): Assertion `store.isValidPath(i->first)' failed.
This happened because of the derivation output correctness check being
applied before the references of a derivation are valid.
*headdesk*
*headdesk*
*headdesk*
So since commit 22144afa8d, Nix hasn't
actually checked whether the content of a downloaded NAR matches the
hash specified in the manifest / NAR info file. Urghhh...
In particular "libutil" was always a problem because it collides with
Glibc's libutil. Even if we install into $(libdir)/nix, the linker
sometimes got confused (e.g. if a program links against libstore but
not libutil, then ld would report undefined symbols in libstore
because it was looking at Glibc's libutil).
Note that adding --show-trace prevents functions calls from being
tail-recursive, so an expression that evaluates without --show-trace
may fail with a stack overflow if --show-trace is given.
I.e. "nix-store -q --roots" will now show (for example)
/home/eelco/Dev/nixpkgs/result
rather than
/nix/var/nix/gcroots/auto/53222qsppi12s2hkap8dm2lg8xhhyk6v
There is no risk of getting an inconsistent result here: if the ID
returned by queryValidPathId() is deleted from the database
concurrently, subsequent queries involving that ID will simply fail
(since IDs are never reused).
In the Hydra build farm we fairly regularly get SQLITE_PROTOCOL errors
(e.g., "querying path in database: locking protocol"). The docs for
this error code say that it "is returned if some other process is
messing with file locks and has violated the file locking protocol
that SQLite uses on its rollback journal files." However, the SQLite
source code reveals that this error can also occur under high load:
if( cnt>5 ){
int nDelay = 1; /* Pause time in microseconds */
if( cnt>100 ){
VVA_ONLY( pWal->lockError = 1; )
return SQLITE_PROTOCOL;
}
if( cnt>=10 ) nDelay = (cnt-9)*238; /* Max delay 21ms. Total delay 996ms */
sqlite3OsSleep(pWal->pVfs, nDelay);
}
i.e. if certain locks cannot be not acquired, SQLite will retry a
number of times before giving up and returing SQLITE_PROTOCOL. The
comments say:
Circumstances that cause a RETRY should only last for the briefest
instances of time. No I/O or other system calls are done while the
locks are held, so the locks should not be held for very long. But
if we are unlucky, another process that is holding a lock might get
paged out or take a page-fault that is time-consuming to resolve,
during the few nanoseconds that it is holding the lock. In that case,
it might take longer than normal for the lock to free.
...
The total delay time before giving up is less than 1 second.
On a heavily loaded machine like lucifer (the main Hydra server),
which often has dozens of processes waiting for I/O, it seems to me
that a page fault could easily take more than a second to resolve.
So, let's treat SQLITE_PROTOCOL as SQLITE_BUSY and retry the
transaction.
Issue NixOS/hydra#14.
As discovered by Todd Veldhuizen, the shell started by nix-shell has
its affinity set to a single CPU. This is because nix-shell connects
to the Nix daemon, which causes the affinity hack to be applied. So
we turn this off for Perl programs.
On Linux, Nix can build i686 packages even on x86_64 systems. It's not
enough to recognize this situation by settings.thisSystem, we also have
to consult uname(). E.g. we can be running on a i686 Debian with an
amd64 kernel. In that situation settings.thisSystem is i686-linux, but
we still need to change personality to i686 to make builds consistent.
On a system with multiple CPUs, running Nix operations through the
daemon is significantly slower than "direct" mode:
$ NIX_REMOTE= nix-instantiate '<nixos>' -A system
real 0m0.974s
user 0m0.875s
sys 0m0.088s
$ NIX_REMOTE=daemon nix-instantiate '<nixos>' -A system
real 0m2.118s
user 0m1.463s
sys 0m0.218s
The main reason seems to be that the client and the worker get moved
to a different CPU after every call to the worker. This patch adds a
hack to lock them to the same CPU. With this, the overhead of going
through the daemon is very small:
$ NIX_REMOTE=daemon nix-instantiate '<nixos>' -A system
real 0m1.074s
user 0m0.809s
sys 0m0.098s
This reverts commit 69b8f9980f.
The timeout should be enforced remotely. Otherwise, if the garbage
collector is running either locally or remotely, if will block the
build or closure copying for some time. If the garbage collector
takes too long, the build may time out, which is not what we want.
Also, on heavily loaded systems, copying large paths to and from the
remote machine can take a long time, also potentially resulting in a
timeout.
mount(2) with MS_BIND allows mounting a regular file on top of a regular
file, so there's no reason to only bind directories. This allows finer
control over just which files are and aren't included in the chroot
without having to build symlink trees or the like.
Signed-off-by: Shea Levy <shea@shealevy.com>
With C++ std::map, doing a comparison like ‘map["foo"] == ...’ has the
side-effect of adding a mapping from "foo" to the empty string if
"foo" doesn't exist in the map. So we ended up setting some
environment variables by accident.
In particular this means that "trivial" derivations such as writeText
are not substituted, reducing the number of GET requests to the binary
cache by about 200 on a typical NixOS configuration.
This substituter basically cannot work reliably since we switched to
SQLite, since SQLite databases may need write access to open them even
just for reading (and in WAL mode they always do).
For instance, it's pointless to keep copy-from-other-stores running if
there are no other stores, or download-using-manifests if there are no
manifests. This also speeds things up because we don't send queries
to those substituters.
Before calling dumpPath(), we have to make sure the files are owned by
the build user. Otherwise, the build could contain a hard link to
(say) /etc/shadow, which would then be read by the daemon and
rewritten as a world-readable file.
This only affects systems that don't have hard link restrictions
enabled.
The assertion in canonicalisePathMetaData() failed because the
ownership of the path already changed due to the hash rewriting. The
solution is not to check the ownership of rewritten paths.
Issue #122.
Otherwise subsequent invocations of "--repair" will keep rebuilding
the path. This only happens if the path content differs between
builds (e.g. due to timestamps).
Don't pass --timeout / --max-silent-time to the remote builder.
Instead, let the local Nix process terminate the build if it exceeds a
timeout. The remote builder will be killed as a side-effect. This
gives better error reporting (since the timeout message from the
remote side wasn't properly propagated) and handles non-Nix problems
like SSH hangs.
I'm not sure if it has ever worked correctly. The line "lastWait =
after;" seems to mean that the timer was reset every time a build
produced log output.
Note that the timeout is now per build, as documented ("the maximum
number of seconds that a builder can run").
It is surprisingly impossible to check if a mountpoint is a bind mount
on Linux, and in my previous commit I forgot to check if /nix/store was
even a mountpoint at all. statvfs.f_flag is not populated with MS_BIND
(and even if it were, my check was wrong in the previous commit).
Luckily, the semantics of mount with MS_REMOUNT | MS_BIND make both
checks unnecessary: if /nix/store is not a mountpoint, then mount will
fail with EINVAL, and if /nix/store is not a bind-mount, then it will
not be made writable. Thus, if /nix/store is not a mountpoint, we fail
immediately (since we don't know how to make it writable), and if
/nix/store IS a mountpoint but not a bind-mount, we fail at first write
(see below for why we can't check and fail immediately).
Note that, due to what is IMO buggy behavior in Linux, calling mount
with MS_REMOUNT | MS_BIND on a non-bind readonly mount makes the
mountpoint appear writable in two places: In the sixth (but not the
10th!) column of mountinfo, and in the f_flags member of struct statfs.
All other syscalls behave as if the mount point were still readonly (at
least for Linux 3.9-rc1, but I don't think this has changed recently or
is expected to soon). My preferred semantics would be for MS_REMOUNT |
MS_BIND to fail on a non-bind mount, as it doesn't make sense to remount
a non bind-mount as a bind mount.