It turns out that using clone() to start a child process is unsafe in
a multithreaded program. It can cause the initialisation of a build
child process to hang in setgroups(), as seen several times in the
build farm:
The reason is that Glibc thinks that the other threads of the parent
exist in the child, so in setxid_mark_thread() it tries to get a futex
that has been acquired by another thread just before the clone(). With
fork(), Glibc runs pthread_atfork() handlers that take care of this
(in particular, __reclaim_stacks()). But clone() doesn't do that.
Fortunately, we can use fork()+unshare() instead of clone() to set up
private namespaces.
See also https://www.mail-archive.com/lxc-devel@lists.linuxcontainers.org/msg03434.html.
The Nixpkgs stdenv prints some custom escape sequences to denote
nesting and stuff like that. Most terminals (e.g. xterm, konsole)
ignore them, but some do not (e.g. xfce4-terminal). So for the benefit
of the latter, filter them out.
If a root is a regular file, then its name must denote a store
path. For instance, the existence of the file
/nix/var/nix/gcroots/per-user/eelco/hydra-roots/wzc3cy1wwwd6d0dgxpa77ijr1yp50s6v-libxml2-2.7.7
would cause
/nix/store/wzc3cy1wwwd6d0dgxpa77ijr1yp50s6v-libxml2-2.7.7
to be a root.
This is useful because it involves less I/O (no need for a readlink()
call) and takes up less disk space (the symlink target typically takes
up a full disk block, while directory entries are packed more
efficiently). This is particularly important for hydra.nixos.org,
which has hundreds of thousands of roots, and where reading the roots
can take 25 minutes.
Other operations cannot hang indefinitely (except when we're reading
from stdin, in which case we'll notice a client disconnect). But
monitoring works badly during compressed imports, since there the
client can close the connection before we've sent an ack.
http://hydra.nixos.org/build/12711638
Signal handlers are process-wide, so sending SIGINT to the monitor
thread will cause the normal SIGINT handler to run. This sets the
isInterrupted flag, which is not what we want. So use pthread_cancel
instead.