Note that adding --show-trace prevents functions calls from being
tail-recursive, so an expression that evaluates without --show-trace
may fail with a stack overflow if --show-trace is given.
It kept temporary data in STL containers that were not scanned by
Boehm GC, so Nix programs using genericClosure could randomly crash if
the garbage collector kicked in at a bad time.
Also make it a bit more efficient by copying points to values rather
than values.
We already have some primops for determining the type of a value, such
as isString, but they're incomplete: for instance, there is no isPath.
Rather than adding more isBla functions, the generic typeOf function
returns a string representing the type of the argument (e.g. "int").
I.e. "nix-store -q --roots" will now show (for example)
/home/eelco/Dev/nixpkgs/result
rather than
/nix/var/nix/gcroots/auto/53222qsppi12s2hkap8dm2lg8xhhyk6v
Combined with the previous changes, stack traces involving derivations
are now much less verbose, since something like
while evaluating the builtin function `getAttr':
while evaluating the builtin function `derivationStrict':
while instantiating the derivation named `gtk+-2.24.20' at `/home/eelco/Dev/nixpkgs/pkgs/development/libraries/gtk+/2.x.nix:11:3':
while evaluating the derivation attribute `propagatedNativeBuildInputs' at `/home/eelco/Dev/nixpkgs/pkgs/stdenv/generic/default.nix:78:17':
while evaluating the attribute `outPath' at `/nix/store/212ngf4ph63mp6p1np2bapkfikpakfv7-nix-1.6/share/nix/corepkgs/derivation.nix:18:9':
...
now reads
while evaluating the attribute `propagatedNativeBuildInputs' of the derivation `gtk+-2.24.20' at `/home/eelco/Dev/nixpkgs/pkgs/development/libraries/gtk+/2.x.nix:11:3':
...
Messages like
while evaluating the attribute `outPath' at `/nix/store/212ngf4ph63mp6p1np2bapkfikpakfv7-nix-1.6/share/nix/corepkgs/derivation.nix:18:9':
are redundant, because Nix already shows that it's evaluating a derivation:
while instantiating the derivation named `firefox-24.0' at `/home/eelco/Dev/nixpkgs/pkgs/applications/networking/browsers/firefox/default.nix:131:5':
while evaluating the derivation attribute `nativeBuildInputs' at `/home/eelco/Dev/nixpkgs/pkgs/stdenv/generic/default.nix:76:17':
Commit 159e621d1a accidentally changed
the behaviour of antiquoted paths, e.g.
"${/foo}/bar"
used to evaluate to "/nix/store/<hash>-foo/bar" (where /foo gets
copied to the store), but in Nix 1.6 it evaluates to "/foo/bar". This
is inconsistent, since
" ${/foo}/bar"
evaluates to " /nix/store/<hash>-foo/bar". So revert to the old
behaviour.
There is no risk of getting an inconsistent result here: if the ID
returned by queryValidPathId() is deleted from the database
concurrently, subsequent queries involving that ID will simply fail
(since IDs are never reused).
In the Hydra build farm we fairly regularly get SQLITE_PROTOCOL errors
(e.g., "querying path in database: locking protocol"). The docs for
this error code say that it "is returned if some other process is
messing with file locks and has violated the file locking protocol
that SQLite uses on its rollback journal files." However, the SQLite
source code reveals that this error can also occur under high load:
if( cnt>5 ){
int nDelay = 1; /* Pause time in microseconds */
if( cnt>100 ){
VVA_ONLY( pWal->lockError = 1; )
return SQLITE_PROTOCOL;
}
if( cnt>=10 ) nDelay = (cnt-9)*238; /* Max delay 21ms. Total delay 996ms */
sqlite3OsSleep(pWal->pVfs, nDelay);
}
i.e. if certain locks cannot be not acquired, SQLite will retry a
number of times before giving up and returing SQLITE_PROTOCOL. The
comments say:
Circumstances that cause a RETRY should only last for the briefest
instances of time. No I/O or other system calls are done while the
locks are held, so the locks should not be held for very long. But
if we are unlucky, another process that is holding a lock might get
paged out or take a page-fault that is time-consuming to resolve,
during the few nanoseconds that it is holding the lock. In that case,
it might take longer than normal for the lock to free.
...
The total delay time before giving up is less than 1 second.
On a heavily loaded machine like lucifer (the main Hydra server),
which often has dozens of processes waiting for I/O, it seems to me
that a page fault could easily take more than a second to resolve.
So, let's treat SQLITE_PROTOCOL as SQLITE_BUSY and retry the
transaction.
Issue NixOS/hydra#14.
Previously, a undefined variable inside a "with" caused an EvalError
(which can be caught), while outside, it caused a ParseError (which
cannot be caught). Now both cause an UndefinedVarError (which cannot
be caught).
Since they don't have location information, they just give you crap
like:
while evaluating the builtin function `getAttr':
while evaluating the builtin function `derivationStrict':
...
If a "with" attribute set fails to evaluate, we have to make sure its
Env record remains unchanged. Otherwise, repeated evaluation gives a
segfault:
nix-repl> :a with 0; { a = x; b = x; }
Added 2 variables.
nix-repl> a
error: value is an integer while an attribute set was expected
nix-repl> b
Segmentation fault
As discovered by Todd Veldhuizen, the shell started by nix-shell has
its affinity set to a single CPU. This is because nix-shell connects
to the Nix daemon, which causes the affinity hack to be applied. So
we turn this off for Perl programs.
This is equivalent to running ‘nix-env -e '*'’ first, except that it
happens in a single transaction. Thus, ‘nix-env -i pkgs...’ replaces
the profile with the specified set of packages.
The main motivation is to support declarative package management
(similar to environment.systemPackages in NixOS). That is, if you
have a specification ‘profile.nix’ like this:
with import <nixpkgs> {};
[ thunderbird
geeqie
...
]
then after any change to ‘profile.nix’, you can run:
$ nix-env -f profile.nix -ir
to update the profile to match the specification. (Without the ‘-r’
flag, if you remove a package from ‘profile.nix’, it won't be removed
from the actual profile.)
Suggested by @zefhemel.
This prevents some duplicate evaluation in nix-env and
nix-instantiate.
Also, when traversing ~/.nix-defexpr, only read regular files with the
extension .nix. Previously it was reading files like
.../channels/binary-caches/<name>. The only reason this didn't cause
problems is pure luck (namely, <name> shadows an actual Nix
expression, the binary-caches files happen to be syntactically valid
Nix expressions, and we iterate over the directory contents in just
the right order).
Since we already cache files in normal form (fileEvalCache), caching
parse trees is redundant.
Note that getting rid of this cache doesn't actually save much memory
at the moment, because parse trees are currently not freed / GC'ed.
This reduces the difference between inherited and non-inherited
attribute handling to the choice of which env to use (in recs and lets)
by setting the AttrDef::e to a new ExprVar in the parser rather than
carrying a separate AttrDef::v VarRef member.
As an added bonus, this allows inherited attributes that inherit from a
with to delay forcing evaluation of the with's attributes.
Signed-off-by: Shea Levy <shea@shealevy.com>
On Linux, Nix can build i686 packages even on x86_64 systems. It's not
enough to recognize this situation by settings.thisSystem, we also have
to consult uname(). E.g. we can be running on a i686 Debian with an
amd64 kernel. In that situation settings.thisSystem is i686-linux, but
we still need to change personality to i686 to make builds consistent.
On a system with multiple CPUs, running Nix operations through the
daemon is significantly slower than "direct" mode:
$ NIX_REMOTE= nix-instantiate '<nixos>' -A system
real 0m0.974s
user 0m0.875s
sys 0m0.088s
$ NIX_REMOTE=daemon nix-instantiate '<nixos>' -A system
real 0m2.118s
user 0m1.463s
sys 0m0.218s
The main reason seems to be that the client and the worker get moved
to a different CPU after every call to the worker. This patch adds a
hack to lock them to the same CPU. With this, the overhead of going
through the daemon is very small:
$ NIX_REMOTE=daemon nix-instantiate '<nixos>' -A system
real 0m1.074s
user 0m0.809s
sys 0m0.098s
Commit 20866a7031 added a ‘withAttrs’
field to Env, which is annoying because it makes every Env structure
bigger and we allocate millions of them. E.g. NixOS evaluation took
18 MiB more. So this commit squeezes ‘withAttrs’ into values[0].
Probably should use a union...
Evaluation of attribute sets is strict in the attribute names, which
means immediate evaluation of `with` attribute sets rules out some
potentially interesting use cases (e.g. where the attribute names of one
set depend in some way on another but we want to bring those names into
scope for some values in the second set).
The major example of this is overridable self-referential package sets
(e.g. all-packages.nix). With immediate `with` evaluation, the only
options for such sets are to either make them non-recursive and
explicitly use the name of the overridden set in non-overridden one
every time you want to reference another package, or make the set
recursive and use the `__overrides` hack. As shown in the test case that
comes with this commit, though, delayed `with` evaluation allows a nicer
third alternative.
Signed-off-by: Shea Levy <shea@shealevy.com>
Previously, if the Nix evaluator gets a stack overflow due to a deep
or infinite recursion in the Nix expression, the user gets an
unhelpful message ("Segmentation fault") that doesn't indicate that
the problem is in the user's code rather than Nix itself. Now it
prints:
error: stack overflow (possible infinite recursion)
This only works on x86_64-linux and i686-linux.
Fixes#35.
The kill(2) in Apple's libc follows POSIX semantics, which means that
kill(-1, SIGKILL) will kill the calling process too. Since nix has no
way to distinguish between the process successfully killing everything
and the process being killed by a rogue builder in that case, it can't
safely conclude that killUser was successful.
Luckily, the actual kill syscall takes a parameter that determines
whether POSIX semantics are followed, so we can call that syscall
directly and avoid the issue on Apple.
Signed-off-by: Shea Levy <shea@shealevy.com>
This reverts commit 69b8f9980f.
The timeout should be enforced remotely. Otherwise, if the garbage
collector is running either locally or remotely, if will block the
build or closure copying for some time. If the garbage collector
takes too long, the build may time out, which is not what we want.
Also, on heavily loaded systems, copying large paths to and from the
remote machine can take a long time, also potentially resulting in a
timeout.
mount(2) with MS_BIND allows mounting a regular file on top of a regular
file, so there's no reason to only bind directories. This allows finer
control over just which files are and aren't included in the chroot
without having to build symlink trees or the like.
Signed-off-by: Shea Levy <shea@shealevy.com>
With C++ std::map, doing a comparison like ‘map["foo"] == ...’ has the
side-effect of adding a mapping from "foo" to the empty string if
"foo" doesn't exist in the map. So we ended up setting some
environment variables by accident.
In particular this means that "trivial" derivations such as writeText
are not substituted, reducing the number of GET requests to the binary
cache by about 200 on a typical NixOS configuration.
This substituter basically cannot work reliably since we switched to
SQLite, since SQLite databases may need write access to open them even
just for reading (and in WAL mode they always do).
For instance, it's pointless to keep copy-from-other-stores running if
there are no other stores, or download-using-manifests if there are no
manifests. This also speeds things up because we don't send queries
to those substituters.
Before calling dumpPath(), we have to make sure the files are owned by
the build user. Otherwise, the build could contain a hard link to
(say) /etc/shadow, which would then be read by the daemon and
rewritten as a world-readable file.
This only affects systems that don't have hard link restrictions
enabled.
The assertion in canonicalisePathMetaData() failed because the
ownership of the path already changed due to the hash rewriting. The
solution is not to check the ownership of rewritten paths.
Issue #122.
Otherwise subsequent invocations of "--repair" will keep rebuilding
the path. This only happens if the path content differs between
builds (e.g. due to timestamps).
Functions in Nix are anonymous, but if they're assigned to a
variable/attribute, we can use the variable/attribute name in error
messages, e.g.
while evaluating `concatMapStrings' at `/nix/var/nix/profiles/per-user/root/channels/nixos/nixpkgs/pkgs/lib/strings.nix:18:25':
...
Don't pass --timeout / --max-silent-time to the remote builder.
Instead, let the local Nix process terminate the build if it exceeds a
timeout. The remote builder will be killed as a side-effect. This
gives better error reporting (since the timeout message from the
remote side wasn't properly propagated) and handles non-Nix problems
like SSH hangs.
I'm not sure if it has ever worked correctly. The line "lastWait =
after;" seems to mean that the timer was reset every time a build
produced log output.
Note that the timeout is now per build, as documented ("the maximum
number of seconds that a builder can run").
It is surprisingly impossible to check if a mountpoint is a bind mount
on Linux, and in my previous commit I forgot to check if /nix/store was
even a mountpoint at all. statvfs.f_flag is not populated with MS_BIND
(and even if it were, my check was wrong in the previous commit).
Luckily, the semantics of mount with MS_REMOUNT | MS_BIND make both
checks unnecessary: if /nix/store is not a mountpoint, then mount will
fail with EINVAL, and if /nix/store is not a bind-mount, then it will
not be made writable. Thus, if /nix/store is not a mountpoint, we fail
immediately (since we don't know how to make it writable), and if
/nix/store IS a mountpoint but not a bind-mount, we fail at first write
(see below for why we can't check and fail immediately).
Note that, due to what is IMO buggy behavior in Linux, calling mount
with MS_REMOUNT | MS_BIND on a non-bind readonly mount makes the
mountpoint appear writable in two places: In the sixth (but not the
10th!) column of mountinfo, and in the f_flags member of struct statfs.
All other syscalls behave as if the mount point were still readonly (at
least for Linux 3.9-rc1, but I don't think this has changed recently or
is expected to soon). My preferred semantics would be for MS_REMOUNT |
MS_BIND to fail on a non-bind mount, as it doesn't make sense to remount
a non bind-mount as a bind mount.
/nix/store could be a read-only bind mount even if it is / in its own filesystem, so checking the 4th field in mountinfo is insufficient.
Signed-off-by: Shea Levy <shea@shealevy.com>
It turns out that in multi-user Nix, a builder may be able to do
ln /etc/shadow $out/foo
Afterwards, canonicalisePathMetaData() will be applied to $out/foo,
causing /etc/shadow's mode to be set to 444 (readable by everybody but
writable by nobody). That's obviously Very Bad.
Fortunately, this fails in NixOS's default configuration because
/nix/store is a bind mount, so "ln" will fail with "Invalid
cross-device link". It also fails if hard-link restrictions are
enabled, so a workaround is:
echo 1 > /proc/sys/fs/protected_hardlinks
The solution is to check that all files in $out are owned by the build
user. This means that innocuous operations like "ln
${pkgs.foo}/some-file $out/" are now rejected, but that already failed
in chroot builds anyway.
Wacky string coercion semantics caused expressions like
exec = "${./my-script} params...";
to evaluate to a path (‘/path/my-script params’), because
anti-quotations are desuged to string concatenation:
exec = ./my-script + " params...";
By constrast, adding a space at the start would yield a string as
expected:
exec = " ${./my-script} params...";
Now the first example also evaluates to a string.
...where <XX> is the first two characters of the derivation.
Otherwise /nix/var/log/nix/drvs may become so large that we run into
all sorts of weird filesystem limits/inefficiences. For instance,
ext3/ext4 filesystems will barf with "ext4_dx_add_entry:1551:
Directory index full!" once you hit a few million files.
So if a path is not garbage solely because it's reachable from a root
due to the gc-keep-outputs or gc-keep-derivations settings, ‘nix-store
-q --roots’ now shows that root.
But this time it's *obviously* correct! No more segfaults due to
infinite recursions for sure, etc.
Also, move directories to /nix/store/trash instead of renaming them to
/nix/store/bla-gc-<pid>. Then we can just delete /nix/store/trash at
the end.
This prevents zillions of derivations from being kept, and fixes an
infinite recursion in the garbage collector (due to an obscure cycle
that can occur with fixed-output derivations).
We now print all output paths of a package, e.g.
openssl-1.0.0i bin=/nix/store/gq2mvh0wb9l90djvsagln3aqywqmr6vl-openssl-1.0.0i-bin;man=/nix/store/7zwf5r5hsdarl3n86dasvb4chm2xzw9n-openssl-1.0.0i-man;/nix/store/cj7xvk7fjp9q887359j75pw3pzjfmqf1-openssl-1.0.0i
or (in XML mode)
<item attrPath="openssl" name="openssl-1.0.0i" system="x86_64-linux">
<output name="bin" path="/nix/store/gq2mvh0wb9l90djvsagln3aqywqmr6vl-openssl-1.0.0i-bin" />
<output name="man" path="/nix/store/7zwf5r5hsdarl3n86dasvb4chm2xzw9n-openssl-1.0.0i-man" />
<output name="out" path="/nix/store/cj7xvk7fjp9q887359j75pw3pzjfmqf1-openssl-1.0.0i" />
</item>
This allows adding attributes like
attr = if stdenv.system == "bla" then something else null;
without changing the resulting derivation on non-<bla> platforms.
We once considered adding a special "ignore" value for this purpose,
but using null seems more elegant.
The integer constant ‘langVersion’ denotes the current language
version. It gets increased every time a language feature is
added/changed/removed. It's currently 1.
The string constant ‘nixVersion’ contains the current Nix version,
e.g. "1.2pre2980_9de6bc5".
If a derivation has multiple outputs, then we only want to download
those outputs that are actuallty needed. So if we do "nix-build -A
openssl.man", then only the "man" output should be downloaded.
Likewise if another package depends on ${openssl.man}.
The tricky part is that different derivations can depend on different
outputs of a given derivation, so we may need to restart the
corresponding derivation goal if that happens.
For example, given a derivation with outputs "out", "man" and "bin":
$ nix-build -A pkg
produces ./result pointing to the "out" output;
$ nix-build -A pkg.man
produces ./result-man pointing to the "man" output;
$ nix-build -A pkg.all
produces ./result, ./result-man and ./result-bin;
$ nix-build -A pkg.all -A pkg2
produces ./result, ./result-man, ./result-bin and ./result-2.
This flag causes paths that do not have a known substitute to be
quietly ignored. This is mostly useful for Charon, allowing it to
speed up deployment by letting a machine use substitutes for all
substitutable paths, instead of uploading them. The latter is
frequently faster, e.g. if the target machine has a fast Internet
connection while the source machine is on a slow ADSL line.
This reverts commit 2980d1fba9. It
causes a regression in NixOS evaluation:
string `/nix/store/ya3s5gmj3b28170fpbjhgsk8wzymkpa1-pommed-1.39/etc/pommed.conf' cannot refer to other paths
vfork() is just too weird. For instance, in this build:
http://hydra.nixos.org/build/3330487
the value fromHook.writeSide becomes corrupted in the parent, even
though the child only reads from it. At -O0 the problem goes away.
Probably the child is overriding some spilled temporary variable.
If I get bored I may implement using posix_spawn() instead.
I.e. do what git does. I'm too lazy to keep the builtin help text up
to date :-)
Also add ‘--help’ to various commands that lacked it
(e.g. nix-collect-garbage).
With this flag, if any valid derivation output is missing or corrupt,
it will be recreated by using a substitute if available, or by
rebuilding the derivation. The latter may use hash rewriting if
chroots are not available.
This operation allows fixing corrupted or accidentally deleted store
paths by redownloading them using substituters, if available.
Since the corrupted path cannot be replaced atomically, there is a
very small time window (one system call) during which neither the old
(corrupted) nor the new (repaired) contents are available. So
repairing should be used with some care on critical packages like
Glibc.
In Nixpkgs, the attribute in all-packages.nix corresponding to a
package is usually equal to the package name. However, this doesn't
work if the package contains a dash, which is fairly common. The
convention is to replace the dash with an underscore (e.g. "dbus-lib"
becomes "dbus_glib"), but that's annoying. So now dashes are valid in
variable / attribute names, allowing you to write:
dbus-glib = callPackage ../development/libraries/dbus-glib { };
and
buildInputs = [ dbus-glib ];
Since we don't have a negation or subtraction operation in Nix, this
is unambiguous.
Using the immutable bit is problematic, especially in conjunction with
store optimisation. For instance, if the garbage collector deletes a
file, it has to clear its immutable bit, but if the file has
additional hard links, we can't set the bit afterwards because we
don't know the remaining paths.
So now that we support having the entire Nix store as a read-only
mount, we may as well drop the immutable bit. Unfortunately, we have
to keep the code to clear the immutable bit for backwards
compatibility.
It turns out that the immutable bit doesn't work all that well. A
better way is to make the entire Nix store a read-only bind mount,
i.e. by doing
$ mount --bind /nix/store /nix/store
$ mount -o remount,ro,bind /nix/store
(This would typically done in an early boot script, before anything
from /nix/store is used.)
Since Nix needs to be able to write to the Nix store, it now detects
if /nix/store is a read-only bind mount and then makes it writable in
a private mount namespace.
The outputs of a derivation can refer to each other (even though they
cannot have cycles), so they have to be deleted in the right order.
http://hydra.nixos.org/build/3026118