Make the garbage collector more concurrent by deleting valid paths
outside the region where we're holding the global GC lock. This
should greatly reduce the time during which new builds are blocked,
since the deletion accounts for the vast majority of the time spent in
the GC.
To ensure that this is safe, the valid paths are invalidated and
renamed to some arbitrary path while we're holding the lock. This
ensures that when we finally delete the path, it's not a (newly)
valid or locked path.
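
The invalidate-and-rename scheme above can be sketched roughly as follows.
This is an illustration in Python, not the actual C++ code: the names
`collect`, `valid_paths` and `gc_lock` are hypothetical, and an in-memory
set stands in for the database of valid paths.

```python
import os
import shutil
import tempfile
import threading

gc_lock = threading.Lock()  # stand-in for the global GC lock (assumption)


def collect(store_dir, valid_paths, dead_names):
    """Invalidate and rename dead paths under the lock; delete them afterwards."""
    trash = []
    with gc_lock:
        for name in dead_names:
            src = os.path.join(store_dir, name)
            valid_paths.discard(src)  # invalidate while holding the lock
            # Move the path to an arbitrary unique location, so that the
            # deferred deletion can never hit a newly registered `src`.
            dst = tempfile.mkdtemp(prefix="trash-", dir=store_dir)
            os.rename(src, os.path.join(dst, name))
            trash.append(dst)
    # The slow part runs without the lock, so new builds are not blocked.
    for dst in trash:
        shutil.rmtree(dst)
    return len(trash)
```

Only the cheap rename happens while the lock is held; the recursive
deletion is done after the lock has been released.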
Nix now requires SQLite and bzip2 to be pre-installed. SQLite is
detected using pkg-config. We required DBD::SQLite anyway, so
depending on SQLite is not a big problem.
The --with-bzip2, --with-openssl and --with-sqlite flags are gone.
By moving the destructor object to libstore.so, it's also run when
download-using-manifests and nix-prefetch-url exit. This prevents
them from cluttering /nix/var/nix/temproots with stale files.
Not all SQLite builds have the function sqlite3_table_column_metadata.
We were only using it in a schema upgrade check for compatibility with
databases that were probably never seen in the wild. So remove it.
The variable ‘useChroot’ was not initialised properly. This caused
random failures if using the build hook. Seen on Mac OS X 10.7 with Clang.
Thanks to KolibriFX for finding this :-)
Chroots are initialised by hard-linking inputs from the Nix store to
the chroot. This doesn't work if the input has its immutable bit set,
because it's forbidden to create hard links to immutable files. So
temporarily clear the immutable bit when creating and destroying the
chroot.
Note that making regular files in the Nix store immutable isn't very
reliable, since the bit can easily become cleared: for instance, if we
run the garbage collector after running ‘nix-store --optimise’. So
maybe we should only make directories immutable.
I was bitten one time too many by Python modifying the Nix store by
creating *.pyc files when run as root. On Linux, we can prevent this
by setting the immutable bit on files and directories (as in ‘chattr
+i’). This isn't supported by all filesystems, so it's not an error
if setting the bit fails. The immutable bit is cleared by the garbage
collector before deleting a path. The only tricky aspect is in
optimiseStore(), since it's forbidden to create hard links to an
immutable file. Thus optimiseStore() temporarily clears the immutable
bit before creating the link.
unreachable paths. This matters when using --max-freed etc.:
unreachable paths could become reachable again, so it's nicer to
keep them if there is "real" garbage to be deleted. Also, don't use
readDirectory() but read the Nix store and delete invalid paths in
parallel. This reduces GC latency on very large Nix stores.
* Buffer the HashSink. This speeds up hashing a bit because it
prevents lots of calls to the hash update functions (e.g. nix-hash
went from 9.3s to 8.7s of user time on the closure of my
/var/run/current-system).
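
A minimal sketch of the buffering idea, in Python rather than C++. The
class name `BufferedHashSink` and the 32 KiB buffer size are assumptions
made for illustration; the point is only that many small writes result
in few calls to the hash update function.

```python
import hashlib


class BufferedHashSink:
    """Collect small writes and feed the hash in larger chunks."""

    def __init__(self, algo="sha256", buf_size=32 * 1024):
        self._hash = hashlib.new(algo)
        self._buf = bytearray()
        self._buf_size = buf_size
        self.updates = 0  # number of calls to the underlying hash update

    def write(self, data):
        self._buf += data
        if len(self._buf) >= self._buf_size:
            self.flush()

    def flush(self):
        if self._buf:
            self._hash.update(bytes(self._buf))
            self.updates += 1
            self._buf.clear()

    def finish(self):
        self.flush()  # hash whatever is still buffered
        return self._hash.hexdigest()
```

The digest is the same as with unbuffered hashing; only the number of
update calls changes.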
daemon (which is an error), print a nicer error message than
"Connection reset by peer" or "broken pipe".
* In the daemon, log errors that occur during request parameter
processing.
‘nix-store --export’.
* Add a Perl module that provides the functionality of
‘nix-copy-closure --to’. This is used by build-remote.pl so it no
longer needs to start a separate nix-copy-closure process. Also, it
uses the Perl API to do the export, so it doesn't need to start a
separate nix-store process either. As a result, nix-copy-closure
and build-remote.pl should no longer fail on very large closures due
to an "Argument list too long" error. (Note that having very many
dependencies in a single derivation can still fail because the
environment can become too large. Can't be helped though.)
libstore so that the Perl bindings can use it as well. It's vital
that the Perl bindings use the configuration file, because otherwise
nix-copy-closure will fail with a ‘database locked’ message if the
value of ‘use-sqlite-wal’ is changed from the default.
This should also fix:
nix-instantiate: ./../boost/shared_ptr.hpp:254: T* boost::shared_ptr<T>::operator->() const [with T = nix::StoreAPI]: Assertion `px != 0' failed.
which was caused by hashDerivationModulo() calling the ‘store’
object (during store upgrades) before openStore() assigned it.
derivations added to the store by clients have "correct" output
paths (meaning that the output paths are computed by hashing the
derivation according to a certain algorithm). This means that a
malicious user could craft a special .drv file to build *any*
desired path in the store with any desired contents (so long as the
path doesn't already exist). Then the attacker just needs to wait
for a victim to come along and install the compromised path.
For instance, if Alice (the attacker) knows that the latest Firefox
derivation in Nixpkgs produces the path
/nix/store/1a5nyfd4ajxbyy97r1fslhgrv70gj8a7-firefox-5.0.1
then (provided this path doesn't already exist) she can craft a .drv
file that creates that path (i.e., has it as one of its outputs),
add it to the store using "nix-store --add", and build it with
"nix-store -r". So the fake .drv could write a Trojan to the
Firefox path. Then, if user Bob (the victim) comes along and does
  $ nix-env -i firefox
  $ firefox
he executes the Trojan injected by Alice.
The fix is to have the Nix daemon verify that derivation outputs are
correct (in addValidPath()). This required some refactoring to move
the hash computation code to libstore.
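
The kind of check described above can be sketched as follows. Note that
`expected_output_path` is a toy stand-in and NOT the algorithm Nix
actually uses; the dictionary layout of a derivation and the function
names are hypothetical as well. The sketch only shows the principle:
recompute the output path from the derivation and refuse a mismatch.

```python
import hashlib

STORE_DIR = "/nix/store"


def expected_output_path(drv):
    """Toy algorithm (assumption): hash everything except the declared output."""
    material = repr((drv["name"], drv["builder"], sorted(drv["inputs"])))
    digest = hashlib.sha256(material.encode()).hexdigest()[:32]
    return f"{STORE_DIR}/{digest}-{drv['name']}"


def check_derivation(drv):
    """Reject a derivation whose declared output is not the computed one."""
    want = expected_output_path(drv)
    if drv["output"] != want:
        raise ValueError(
            f"derivation has incorrect output {drv['output']!r}, should be {want!r}"
        )
```

With such a check in place, a crafted .drv file that names somebody
else's output path is rejected when it is added, before it can be built.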
while checking the contents, since this operation can take a very
long time to finish. Also, fill in missing narSize fields in the DB
while doing this.
even with a very long busy timeout, because SQLITE_BUSY is also
returned to resolve deadlocks. This should get rid of random
"database is locked" errors. This is kind of hard to test though.
* Fix a horrible bug in deleteFromStore(): deletePathWrapped() should
be called after committing the transaction, not before, because the
commit might not succeed.
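
A rough sketch of retrying on SQLITE_BUSY, using Python's sqlite3 module
instead of the C API. The helper `retry_on_busy` and its parameters are
hypothetical. It relies on the sqlite3 module reporting a busy database
as an OperationalError whose message mentions "locked" or "busy".

```python
import sqlite3
import time


def retry_on_busy(txn, attempts=10, delay=0.01):
    """Run txn(); retry if SQLite reports that the database is busy/locked."""
    for attempt in range(attempts):
        try:
            return txn()
        except sqlite3.OperationalError as e:
            msg = str(e)
            if "locked" not in msg and "busy" not in msg:
                raise  # some other error: don't retry
            if attempt == attempts - 1:
                raise  # give up after the last attempt
            time.sleep(delay)
```

The function passed as `txn` should run one complete transaction, so
that a retry starts again from the beginning.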
race with other processes that add new referrers to a path,
resulting in the garbage collector crashing with "foreign key
constraint failed". (Nix/4)
* Make --gc --print-dead etc. interruptible.
because it defines _FILE_OFFSET_BITS. Without this, on
OpenSolaris the system headers define it to be 32, and then
the 32-bit stat() ends up being called with a 64-bit "struct
stat", or vice versa.
This also ensures that we get 64-bit file sizes everywhere.
* Remove the redundant call to stat() in parseExprFromFile().
The file cannot be a symlink because that's the exit condition
of the loop before.
* If a path has disappeared, check its referrers first, and don't try
to invalidate paths that have valid referrers. Otherwise we get a
foreign key constraint violation.
* Read the whole Nix store directory instead of statting each valid
path, which is slower.
* Acquire the global GC lock.
hook script proper, and the stdout/stderr of the builder. Only the
latter should be saved in /nix/var/log/nix/drvs.
* Allow the verbosity to be set through an option.
* Added a flag --quiet to lower the verbosity level.
it requires a certain feature on the build machine, e.g.
  requiredSystemFeatures = [ "kvm" ];
We need this in Hydra to make sure that builds that require KVM
support are forwarded to machines that have KVM support. Probably
this should also be enforced for local builds.
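
Matching required features against machines could look roughly like the
sketch below. It is written in Python for illustration; the machine
records and the function `eligible_machines` are hypothetical and do not
reflect the format that the build hook actually reads.

```python
def eligible_machines(machines, system, required_features):
    """Return the names of machines that support the system and all features."""
    required = set(required_features)
    return [
        m["name"]
        for m in machines
        if system in m["systems"] and required <= set(m["features"])
    ]


# Hypothetical build farm: only one machine offers KVM.
MACHINES = [
    {"name": "builder1", "systems": ["i686-linux"], "features": []},
    {"name": "builder2", "systems": ["i686-linux", "x86_64-linux"], "features": ["kvm"]},
]
```

A derivation without required features can go to either machine, while
one that requires "kvm" is only eligible for the second.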
the hook every time we want to ask whether we can run a remote build
(which can be very often), we now reuse a hook process for answering
those queries until it accepts a build. So if there are N
derivations to be built, at most N hooks will be started.
faster than the old mode when fsyncs are enabled, because it only
performs an fsync() when doing a checkpoint, rather than at every
commit. Some timings for doing a "nix-instantiate /etc/nixos/nixos
-A system" after modifying the stdenv setup script:
  42.5s - SQLite 3.6.23 with truncate mode and fsync
   3.4s - SQLite 3.6.23 with truncate mode and no fsync
  32.1s - SQLite 3.7.0 with truncate mode and fsync
  16.8s - SQLite 3.7.0 with WAL mode and fsync, auto-checkpoint every 1000 pages
   8.3s - SQLite 3.7.0 with WAL mode and fsync, auto-checkpoint every 8192 pages
   1.7s - SQLite 3.7.0 with WAL mode and no fsync
The default is now to use WAL mode with fsyncs. Because WAL doesn't
work on remote filesystems such as NFS (as it uses shared memory),
truncate mode can be re-enabled by setting the "use-sqlite-wal"
option to false.
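
For illustration, this is how the two journal modes can be selected
through SQLite pragmas, shown with Python's sqlite3 module. The function
`open_db` and its parameters are hypothetical, and the checkpoint
interval of 8192 pages is simply taken from the timings above, not
necessarily what Nix configures.

```python
import os
import sqlite3
import tempfile


def open_db(path, use_wal=True, checkpoint_pages=8192):
    """Open a database in WAL or truncate mode; return (connection, actual mode)."""
    db = sqlite3.connect(path)
    mode = "wal" if use_wal else "truncate"
    # SQLite answers with the journal mode that is actually in effect.
    actual = db.execute(f"PRAGMA journal_mode = {mode}").fetchone()[0]
    if use_wal and actual == "wal":
        db.execute(f"PRAGMA wal_autocheckpoint = {int(checkpoint_pages)}")
    return db, actual
```

Checking the returned mode matters because SQLite keeps the previous
mode if WAL cannot be enabled for a database.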
using the build hook mechanism, by setting the derivation attribute
"preferLocalBuild" to true. This has a few use cases:
- The user environment builder. Since it just creates a bunch of
symlinks without much computation, there is no reason to do it
remotely. In fact, doing it remotely requires the entire closure
of the user environment to be copied to the remote machine, which
is extremely wasteful.
- `fetchurl'. Performing the download on a remote machine and then
copying it to the local machine involves twice as much network
traffic as performing the download locally, and doesn't save any
CPU cycles on the local machine.
A "using namespace std" was added locally in those functions that refer to
names from <cstring>. That is not pretty, but it's a very portable solution,
because strcpy() and friends will be found in both the 'std' and in the global
namespace.
This patch adds the configuration file variable "build-cores" and the
command line argument "--cores". These settings specify the number of
CPU cores to utilize for parallel building within a job, i.e. by passing
an appropriate "-j" flag to GNU Make. The default value is 1, which
means that parallel building is *disabled*. If the number of build cores
is specified as 0 (synonymously: "guess" or "auto"), then the actual
value is supposed to be auto-detected by builders at run-time, e.g. by
calling the nproc(1) utility from coreutils.
The environment variable $NIX_BUILD_CORES is available to builders, but
the content of that variable does *not* influence the hash that goes
into the $out store path, i.e. the number of build cores to be utilized
can be changed at will without requiring any re-builds.
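
A builder might interpret $NIX_BUILD_CORES roughly like this. The sketch
is in Python; the function names are hypothetical, and os.cpu_count() is
used as a stand-in for calling nproc(1).

```python
import os


def effective_build_cores(env=None):
    """Return the number of cores to use, auto-detecting when the value is 0."""
    env = os.environ if env is None else env
    cores = int(env.get("NIX_BUILD_CORES", "1"))  # unset: no parallel building
    if cores == 0:
        cores = os.cpu_count() or 1  # stand-in for nproc(1)
    return cores


def make_command(env=None):
    """Build a GNU Make invocation with a matching -j flag."""
    return ["make", f"-j{effective_build_cores(env)}"]
```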
doesn't work because the garbage collector doesn't actually look at
locks. So r22253 was stupid. Use addTempRoot() instead. Also,
locking the temporary directory in exportPath() was silly because it
isn't even in the store.
changed. This prevents corrupt paths from spreading to other
machines. Note that checking the hash is cheap because we're
hashing anyway (because of the --sign feature).
to make the Refs table more space-efficient. For instance, this
reduces the size of the database on my laptop from 93 MiB to 18
MiB. (It was 72 MiB with the old schema on an ext3 disk with a 1
KiB block size.)
This prevents remote builders from being killed by the
`max-silent-time' inactivity monitor while they are waiting for a
long garbage collection to finish. This happens fairly often in the
Hydra build farm.
_FILE_OFFSET_BITS=64. Without it, functions like stat() fail on
large file sizes. This happened with a Nix store on squashfs:
$ nix-store --dump /tmp/mnt/46wzqnk4cbdwh1dclhrpqnnz1icak6n7-local-net-cmds > /dev/null
error: getting attributes of path `/tmp/mnt/46wzqnk4cbdwh1dclhrpqnnz1icak6n7-local-net-cmds': Value too large for defined data type
$ stat /tmp/mnt/46wzqnk4cbdwh1dclhrpqnnz1icak6n7-local-net-cmds
File: `/tmp/mnt/46wzqnk4cbdwh1dclhrpqnnz1icak6n7-local-net-cmds'
Size: 0 Blocks: 36028797018963968 IO Block: 1024 regular empty file
(This is a bug in squashfs or mksquashfs, but it shouldn't cause Nix
to fail.)
complete set of live and dead paths before starting the actual
deletion, but determines liveness on demand. I.e. for any path in
the store, it first tries to delete all the referrers, and then the
path itself. This means that the collector can start deleting paths
almost immediately.
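
The on-demand strategy can be sketched as a small recursive function.
This is an illustration in Python, not the real collector: the in-memory
`store`, `referrers` and `roots` structures are hypothetical, and the
sketch assumes that references form no cycles apart from
self-references.

```python
def collect_garbage(store, referrers, roots):
    """Delete dead paths from `store`, deciding liveness on demand.

    store: set of paths; referrers: path -> set of paths that refer to it;
    roots: paths that must be kept.  Returns deleted paths in deletion order.
    """
    alive = set(roots)
    deleted = []

    def try_delete(path):
        if path in alive:
            return False
        if path not in store:
            return True  # already deleted
        # First try to get rid of everything that refers to this path.
        for ref in sorted(referrers.get(path, ())):
            if ref != path and not try_delete(ref):
                alive.add(path)  # remember, so the path is not checked again
                return False
        store.discard(path)
        deleted.append(path)
        return True

    for path in sorted(store):
        try_delete(path)
    return deleted
```

No complete set of live and dead paths is computed up front; a path is
deleted as soon as all of its referrers turn out to be gone.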
(Linux) machines no longer maintain the atime because it's too
expensive, and on the machines where --use-atime is useful (like the
buildfarm), reading the atimes on the entire Nix store takes way too
much time to make it practical.
UTC) rather than 0 (00:00:00). 1 is a better choice because some
programs use 0 as a special value. For instance, the Template
Toolkit uses a timestamp of 0 to denote the non-existence of a file,
so it barfs on files in the Nix store (see
template-toolkit-nix-store.patch in Nixpkgs). Similarly, Maya 2008
fails to load script directories with a timestamp of 0 and can't be
patched because it's closed source.
This will also shut up those "implausibly old time stamp" GNU tar
warnings.
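
Setting all timestamps in a tree to 1 could be done along these lines.
This is a Python sketch with a hypothetical function name; symlinks are
simply skipped here, which is a simplification.

```python
import os

CANONICAL_MTIME = 1  # one second after the epoch, instead of 0


def canonicalise_timestamps(root):
    """Set atime and mtime of `root` and everything below it to 1."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            p = os.path.join(dirpath, name)
            if not os.path.islink(p):  # os.utime would follow the link
                os.utime(p, (CANONICAL_MTIME, CANONICAL_MTIME))
    os.utime(root, (CANONICAL_MTIME, CANONICAL_MTIME))
```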
(that is, call the build hook with a certain interval until it
accepts the build).
* build-remote.pl was totally broken: for all system types other than
the local system type, it would send all builds to the *first*
machine of the appropriate type.
poll for it (i.e. if we can't acquire the lock, then let the main
select() loop wait for at most a few seconds and then try again).
This improves parallelism: if two nix-store processes are both
trying to build a path at the same time, the second one shouldn't
block; it should first see if it can build other goals. Also, it
prevents the deadlocks that have been occurring in Hydra lately,
where a process waits for a lock held by another process that's
waiting for a lock held by the first.
The downside is that polling isn't really elegant, but POSIX doesn't
provide a way to wait for locks in a select() loop. The only
solution would be to spawn a thread for each lock to do a blocking
fcntl() and then signal the main thread, but that would require
pthreads.
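
The polling approach can be sketched with a non-blocking fcntl() lock.
This is an illustration in Python for POSIX systems, not the actual
code: `try_lock`, `acquire_with_polling` and the `do_other_work`
callback are hypothetical, and time.sleep() stands in for the timeout of
the main select() loop.

```python
import errno
import fcntl
import time


def try_lock(fd):
    """Try to take an exclusive fcntl() lock without blocking."""
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError as e:
        if e.errno in (errno.EACCES, errno.EAGAIN):
            return False  # another process holds the lock
        raise


def acquire_with_polling(fd, do_other_work, interval=0.1, max_polls=50):
    """Poll for the lock, letting the caller make progress in between."""
    for _ in range(max_polls):
        if try_lock(fd):
            return True
        do_other_work()       # e.g. work on other goals
        time.sleep(interval)  # stand-in for the select() timeout
    return False
```

Because the process never blocks inside fcntl(), two processes that each
wait for a lock held by the other can still make progress on other work.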
would just silently store only (fileSize % 2^32) bytes.
* Use posix_fallocate if available when unpacking archives.
* Provide a better error message when trying to unpack something that
isn't a NAR archive.
sure that it works as expected when you pass it a derivation. That
is, we have to make sure that all build-time dependencies are built,
and that they are all in the input closure (otherwise remote builds
might fail, for example). This is ensured at instantiation time by
adding all derivations and their sources to inputDrvs and inputSrcs.
hook. This fixes a problem with log files being partially or
completely filled with 0's because another nix-store process
truncates the log file. It should also be more efficient.
the DerivationGoal runs. Otherwise, if a goal is a top-level goal,
then the lock won't be released until nix-store finishes. With
--keep-going and lots of top-level goals, it's possible to run out
of file descriptors (this happened sometimes in the build farm for
Nixpkgs). Also, for a failed derivation, it won't be possible to
build it again until the lock is released.
* Idem for locks on build users: these weren't released in a timely
manner for failed top-level derivation goals. So if there were more
than (say) 10 such failed builds, you would get an error about
having run out of build users.
scan for runtime dependencies (i.e. the local machine shouldn't do a
scan that the remote machine has already done). Also pipe directly
into `nix-store --import': don't use a temporary file.
(e.g. an SSH connection problem) and permanent failures (i.e. the
builder failed). This matters to Hydra (it wants to know whether it
makes sense to retry a build).