add a reset() method to close the wrapped fd instead of assigning magic
constants. also make the from-fd constructor explicit so you can't
accidentally assign the *wrong* magic constant, or even an unrelated
integer that also just happens to be an fd by pure chance.
Change-Id: I51311b0f6e040240886b5103d39d1794a6acc325
The garbage collector no longer blocks other processes from
adding/building store paths or adding GC roots. To prevent the
collector from deleting store paths just added by another process,
processes need to connect to the garbage collector via a Unix domain
socket to register new temporary roots.
POSIX file locks are essentially incompatible with multithreading. BSD
locks have much saner semantics. We need this now that there can be
multiple concurrent LocalStore::buildPaths() invocations.
This was caused by derivations with 'allowSubstitutes = false'. Such
derivations will be built locally. However, if there is another
SubstitionGoal that has the output of the first derivation in its
closure, then the path will be simultaneously built and substituted.
There was a check to catch this situation (via pathIsLockedByMe()),
but it no longer worked reliably because substitutions are now done in
another thread. (Thus the comment 'It can't happen between here and
the lockPaths() call below because we're not allowing multi-threading'
was no longer valid.)
The fix is to handle the path already being locked in both
SubstitutionGoal and DerivationGoal.
Otherwise, since the call to write a "d" character to the lock file
can fail with ENOSPC, we can get an unhandled exception resulting in a
call to terminate().
This prevents remote builders from being killed by the
`max-silent-time' inactivity monitor while they are waiting for a
long garbage collection to finish. This happens fairly often in the
Hydra build farm.
poll for it (i.e. if we can't acquire the lock, then let the main
select() loop wait for at most a few seconds and then try again).
This improves parallelism: if two nix-store processes are both
trying to build a path at the same time, the second one shouldn't
block; it should first see if it can build other goals. Also, it
prevents the deadlocks that have been occuring in Hydra lately,
where a process waits for a lock held by another process that's
waiting for a lock held by the first.
The downside is that polling isn't really elegant, but POSIX doesn't
provide a way to wait for locks in a select() loop. The only
solution would be to spawn a thread for each lock to do a blocking
fcntl() and then signal the main thread, but that would require
pthreads.
the DerivationGoal runs. Otherwise, if a goal is a top-level goal,
then the lock won't be released until nix-store finishes. With
--keep-going and lots of top-level goals, it's possible to run out
of file descriptors (this happened sometimes in the build farm for
Nixpkgs). Also, for failed derivation, it won't be possible to
build it again until the lock is released.
* Idem for locks on build users: these weren't released in a timely
manner for failed top-level derivation goals. So if there were more
than (say) 10 such failed builds, you would get an error about
having run out of build users.
fixed-output derivations or substitutions try to build the same
store path at the same time. Locking generally catches this, but
not between multiple goals in the same process. This happened
especially often (actually, only) in the build farm with fetchurl
downloads of the same file being executed on multiple machines and
then copied back to the main machine where they would clobber each
other (NIXBF-13).
Solution: if a goal notices that the output path is already locked,
then go to sleep until another goal finishes (hopefully the one
locking the path) and try again.
parallel as possible (similar to GNU Make's `-j' switch). This is
useful on SMP systems, but it is especially useful for doing builds
on multiple machines. The idea is that a large derivation is
initiated on one master machine, which then distributes
sub-derivations to any number of slave machines. This should not
happen synchronously or in lock-step, so the master must be capable
of dealing with multiple parallel build jobs. We now have the
infrastructure to support this.
TODO: substitutes are currently broken.
Nix. This is to prevent Berkeley DB from becoming wedged.
Unfortunately it is not possible to throw C++ exceptions from a
signal handler. In fact, you can't do much of anything except
change variables of type `volatile sig_atomic_t'. So we set an
interrupt flag in the signal handler and check it at various
strategic locations in the code (by calling checkInterrupt()).
Since this is unlikely to cover all cases (e.g., (semi-)infinite
loops), sometimes SIGTERM may now be required to kill Nix.