Ludo reported this error:
unexpected Nix daemon error: boost::too_few_args: format-string refered to more arguments than were passed
coming from this line:
printMsg(lvlError, run.program + ": " + string(err, 0, p));
The problem here is that the string ends up implicitly converted to a
Boost format() object, so % characters are treated specially. I
always assumed (wrongly) that strings are converted to a format object
that outputs the string as-is.
Since this assumption appears in several places that may be hard to
grep for, I've added some C++ type hackery to ensures that the right
thing happens. So you don't have to worry about % in statements like
printMsg(lvlError, "foo: " + s);
or
throw Error("foo: " + s);
The daemon now creates /dev deterministically (thanks!). However, it
expects /dev/kvm to be present.
The patch below restricts that requirement (1) to Linux-based systems,
and (2) to systems where /dev/kvm already exists.
I’m not sure about the way to handle (2). We could special-case
/dev/kvm and create it (instead of bind-mounting it) in the chroot, so
it’s always available; however, it wouldn’t help much since most likely,
if /dev/kvm missing, then KVM support is missing.
Currently, clients cannot recover from an isValidPath RPC with an
invalid path parameter because the daemon closes the connection when
that happens.
More precisely:
1. in performOp, wopIsValidPath case, ‘readStorePath’ raises an
‘Error’ exception;
2. that exception is caught by the handler in ‘processConnection’;
3. the handler determines errorAllowed == false, and thus exits after
sending the message.
This last part is fixed by calling ‘startWork’ early on, as in the patch
below.
The same reasoning could be applied to all the RPCs that take one or
more store paths as inputs, but isValidPath is, by definition, likely to
be passed invalid paths in the first place, so it’s important for this
one to allow recovery.
Since the meta attributes were not sorted, attribute lookup could
fail, leading to package priorities and active flags not working
correctly.
Broken since 0f24400d90.
If we're evaluating some application ‘v = f x’, we can't store ‘f’
temporarily in ‘v’, because if ‘f x’ refers to ‘v’, it will get ‘f’
rather than an infinite recursion error.
Unfortunately, this breaks the tail call optimisation introduced in
c897bac549.
Fixes#217.
We were relying on SubstitutionGoal's destructor releasing the lock,
but if a goal is a top-level goal, the destructor won't run in a
timely manner since its reference count won't drop to zero. So
release it explicitly.
Fixes#178.
Fixes#121. Note that we don't warn about missing $NIX_PATH entries
because it's intended that some may be missing (cf. the default
$NIX_PATH on NixOS, which includes paths like /etc/nixos/nixpkgs for
backward compatibility).
The flag ‘--check’ to ‘nix-store -r’ or ‘nix-build’ will cause Nix to
redo the build of a derivation whose output paths are already valid.
If the new output differs from the original output, an error is
printed. This makes it easier to test if a build is deterministic.
(Obviously this cannot catch all sources of non-determinism, but it
catches the most common one, namely the current time.)
For example:
$ nix-build '<nixpkgs>' -A patchelf
...
$ nix-build '<nixpkgs>' -A patchelf --check
error: derivation `/nix/store/1ipvxsdnbhl1rw6siz6x92s7sc8nwkkb-patchelf-0.6' may not be deterministic: hash mismatch in output `/nix/store/4pc1dmw5xkwmc6q3gdc9i5nbjl4dkjpp-patchelf-0.6.drv'
The --check build fails if not all outputs are valid. Thus the first
call to nix-build is necessary to ensure that all outputs are valid.
The current outputs are left untouched: the new outputs are either put
in a chroot or diverted to a different location in the store using
hash rewriting.
This substituter connects to a remote host, runs nix-store --serve
there, and then forwards substituter commands on to the remote host and
sends their results to the calling program. The ssh-substituter-hosts
option can be specified as a list of hosts to try.
This is an initial implementation and, while it works, it has some
limitations:
* Only the first host is used
* There is no caching of query results (all queries are sent to the
remote machine)
* There is no informative output (such as progress bars)
* Some failure modes may cause unhelpful error messages
* There is no concept of trusted-ssh-substituter-hosts
Signed-off-by: Shea Levy <shea@shealevy.com>
nix-store --export takes a tmproot, which can only release by exiting.
Substituters don't currently work in a way that could take advantage of
the looping, anyway.
Signed-off-by: Shea Levy <shea@shealevy.com>
This is essentially the substituter API operating on the local store,
which will be used by the ssh substituter. It runs in a loop rather than
just taking one command so that in the future nix will be able to keep
one connection open for multiple instances of the substituter.
Signed-off-by: Shea Levy <shea@shealevy.com>
This allows running nix-instantiate --eval-only without performing the
evaluation in readonly mode, letting features like import from
derivation and automatic substitution of builtins.storePath paths work.
Signed-off-by: Shea Levy <shea@shealevy.com>
Namely:
nix-store: derivations.cc:242: nix::Hash nix::hashDerivationModulo(nix::StoreAPI&, nix::Derivation): Assertion `store.isValidPath(i->first)' failed.
This happened because of the derivation output correctness check being
applied before the references of a derivation are valid.
Now, in addition to a."${b}".c, you can write a.${b}.c (applicable
wherever dynamic attributes are valid).
Signed-off-by: Shea Levy <shea@shealevy.com>
*headdesk*
*headdesk*
*headdesk*
So since commit 22144afa8d, Nix hasn't
actually checked whether the content of a downloaded NAR matches the
hash specified in the manifest / NAR info file. Urghhh...
This doesn't change any functionality but moves some behavior out of the
parser and into the evaluator in order to simplify the code.
Signed-off-by: Shea Levy <shea@shealevy.com>
Since addAttr has to iterate through the AttrPath we pass it, it makes
more sense to just iterate through the AttrNames in addAttr instead. As
an added bonus, this allows attrsets where two dynamic attribute paths
have the same static leading part (see added test case for an example
that failed previously).
Signed-off-by: Shea Levy <shea@shealevy.com>
This adds new syntax for attribute names:
* attrs."${name}" => getAttr name attrs
* attrs ? "${name}" => isAttrs attrs && hasAttr attrs name
* attrs."${name}" or def => if attrs ? "${name}" then attrs."${name}" else def
* { "${name}" = value; } => listToAttrs [{ inherit name value; }]
Of course, it's a bit more complicated than that. The attribute chains
can be arbitrarily long and contain combinations of static and dynamic
parts (e.g. attrs."${foo}".bar."${baz}" or qux), which is relatively
straightforward for the getAttrs/hasAttrs cases but is more complex for
the listToAttrs case due to rules about duplicate attribute definitions.
For attribute sets with dynamic attribute names, duplicate static
attributes are detected at parse time while duplicate dynamic attributes
are detected when the attribute set is forced. So, for example, { a =
null; a.b = null; "${"c"}" = true; } will be a parse-time error, while
{ a = {}; "${"a"}".b = null; c = true; } will be an eval-time error
(technically that case could theoretically be detected at parse time,
but the general case would require full evaluation). Moreover, duplicate
dynamic attributes are not allowed even in cases where they would be
with static attributes ({ a.b.d = true; a.b.c = false; } is legal, but {
a."${"b"}".d = true; a."${"b"}".c = false; } is not). This restriction
might be relaxed in the future in cases where the static variant would
not be an error, but it is not obvious that that is desirable.
Finally, recursive attribute sets with dynamic attributes have the
static attributes in scope but not the dynamic ones. So rec { a = true;
"${"b"}" = a; } is equivalent to { a = true; b = true; } but rec {
"${"a"}" = true; b = a; } would be an error or use a from the
surrounding scope if it exists.
Note that the getAttr, getAttr or default, and hasAttr are all
implemented purely in the parser as syntactic sugar, while attribute
sets with dynamic attribute names required changes to the AST to be
implemented cleanly.
This is an alternative solution to and closes#167
Signed-off-by: Shea Levy <shea@shealevy.com>
Certain desugaring schemes may require the parser to use some builtin
function to do some of the work (e.g. currently `throw` is used to
lazily cause an error if a `<>`-style path is not in the search path)
Unfortunately, these names are not reserved keywords, so an expression
that uses such a syntactic sugar will not see the expected behavior
(see tests/lang/eval-okay-redefine-builtin.nix for an example).
This adds the ExprBuiltin AST type, which when evaluated uses the value
from the rootmost variable scope (which of course is initialized
internally and can't shadow any of the builtins).
Signed-off-by: Shea Levy <shea@shealevy.com>
This will allow e.g. channel expressions to use builtins.storePath IFF
it is safe to do so without knowing if the path is valid yet.
Signed-off-by: Shea Levy <shea@shealevy.com>
In particular "libutil" was always a problem because it collides with
Glibc's libutil. Even if we install into $(libdir)/nix, the linker
sometimes got confused (e.g. if a program links against libstore but
not libutil, then ld would report undefined symbols in libstore
because it was looking at Glibc's libutil).
This is requires if you have attribute names with dots in them. So
you can now say:
$ nix-instantiate '<nixos>' -A 'config.systemd.units."postgresql.service".text' --eval-only
Fixes#151.
Note that adding --show-trace prevents functions calls from being
tail-recursive, so an expression that evaluates without --show-trace
may fail with a stack overflow if --show-trace is given.
It kept temporary data in STL containers that were not scanned by
Boehm GC, so Nix programs using genericClosure could randomly crash if
the garbage collector kicked in at a bad time.
Also make it a bit more efficient by copying points to values rather
than values.
We already have some primops for determining the type of a value, such
as isString, but they're incomplete: for instance, there is no isPath.
Rather than adding more isBla functions, the generic typeOf function
returns a string representing the type of the argument (e.g. "int").
I.e. "nix-store -q --roots" will now show (for example)
/home/eelco/Dev/nixpkgs/result
rather than
/nix/var/nix/gcroots/auto/53222qsppi12s2hkap8dm2lg8xhhyk6v
Combined with the previous changes, stack traces involving derivations
are now much less verbose, since something like
while evaluating the builtin function `getAttr':
while evaluating the builtin function `derivationStrict':
while instantiating the derivation named `gtk+-2.24.20' at `/home/eelco/Dev/nixpkgs/pkgs/development/libraries/gtk+/2.x.nix:11:3':
while evaluating the derivation attribute `propagatedNativeBuildInputs' at `/home/eelco/Dev/nixpkgs/pkgs/stdenv/generic/default.nix:78:17':
while evaluating the attribute `outPath' at `/nix/store/212ngf4ph63mp6p1np2bapkfikpakfv7-nix-1.6/share/nix/corepkgs/derivation.nix:18:9':
...
now reads
while evaluating the attribute `propagatedNativeBuildInputs' of the derivation `gtk+-2.24.20' at `/home/eelco/Dev/nixpkgs/pkgs/development/libraries/gtk+/2.x.nix:11:3':
...
Messages like
while evaluating the attribute `outPath' at `/nix/store/212ngf4ph63mp6p1np2bapkfikpakfv7-nix-1.6/share/nix/corepkgs/derivation.nix:18:9':
are redundant, because Nix already shows that it's evaluating a derivation:
while instantiating the derivation named `firefox-24.0' at `/home/eelco/Dev/nixpkgs/pkgs/applications/networking/browsers/firefox/default.nix:131:5':
while evaluating the derivation attribute `nativeBuildInputs' at `/home/eelco/Dev/nixpkgs/pkgs/stdenv/generic/default.nix:76:17':
Commit 159e621d1a accidentally changed
the behaviour of antiquoted paths, e.g.
"${/foo}/bar"
used to evaluate to "/nix/store/<hash>-foo/bar" (where /foo gets
copied to the store), but in Nix 1.6 it evaluates to "/foo/bar". This
is inconsistent, since
" ${/foo}/bar"
evaluates to " /nix/store/<hash>-foo/bar". So revert to the old
behaviour.
There is no risk of getting an inconsistent result here: if the ID
returned by queryValidPathId() is deleted from the database
concurrently, subsequent queries involving that ID will simply fail
(since IDs are never reused).
In the Hydra build farm we fairly regularly get SQLITE_PROTOCOL errors
(e.g., "querying path in database: locking protocol"). The docs for
this error code say that it "is returned if some other process is
messing with file locks and has violated the file locking protocol
that SQLite uses on its rollback journal files." However, the SQLite
source code reveals that this error can also occur under high load:
if( cnt>5 ){
int nDelay = 1; /* Pause time in microseconds */
if( cnt>100 ){
VVA_ONLY( pWal->lockError = 1; )
return SQLITE_PROTOCOL;
}
if( cnt>=10 ) nDelay = (cnt-9)*238; /* Max delay 21ms. Total delay 996ms */
sqlite3OsSleep(pWal->pVfs, nDelay);
}
i.e. if certain locks cannot be not acquired, SQLite will retry a
number of times before giving up and returing SQLITE_PROTOCOL. The
comments say:
Circumstances that cause a RETRY should only last for the briefest
instances of time. No I/O or other system calls are done while the
locks are held, so the locks should not be held for very long. But
if we are unlucky, another process that is holding a lock might get
paged out or take a page-fault that is time-consuming to resolve,
during the few nanoseconds that it is holding the lock. In that case,
it might take longer than normal for the lock to free.
...
The total delay time before giving up is less than 1 second.
On a heavily loaded machine like lucifer (the main Hydra server),
which often has dozens of processes waiting for I/O, it seems to me
that a page fault could easily take more than a second to resolve.
So, let's treat SQLITE_PROTOCOL as SQLITE_BUSY and retry the
transaction.
Issue NixOS/hydra#14.
Previously, a undefined variable inside a "with" caused an EvalError
(which can be caught), while outside, it caused a ParseError (which
cannot be caught). Now both cause an UndefinedVarError (which cannot
be caught).
Since they don't have location information, they just give you crap
like:
while evaluating the builtin function `getAttr':
while evaluating the builtin function `derivationStrict':
...
If a "with" attribute set fails to evaluate, we have to make sure its
Env record remains unchanged. Otherwise, repeated evaluation gives a
segfault:
nix-repl> :a with 0; { a = x; b = x; }
Added 2 variables.
nix-repl> a
error: value is an integer while an attribute set was expected
nix-repl> b
Segmentation fault
As discovered by Todd Veldhuizen, the shell started by nix-shell has
its affinity set to a single CPU. This is because nix-shell connects
to the Nix daemon, which causes the affinity hack to be applied. So
we turn this off for Perl programs.
This is equivalent to running ‘nix-env -e '*'’ first, except that it
happens in a single transaction. Thus, ‘nix-env -i pkgs...’ replaces
the profile with the specified set of packages.
The main motivation is to support declarative package management
(similar to environment.systemPackages in NixOS). That is, if you
have a specification ‘profile.nix’ like this:
with import <nixpkgs> {};
[ thunderbird
geeqie
...
]
then after any change to ‘profile.nix’, you can run:
$ nix-env -f profile.nix -ir
to update the profile to match the specification. (Without the ‘-r’
flag, if you remove a package from ‘profile.nix’, it won't be removed
from the actual profile.)
Suggested by @zefhemel.
This prevents some duplicate evaluation in nix-env and
nix-instantiate.
Also, when traversing ~/.nix-defexpr, only read regular files with the
extension .nix. Previously it was reading files like
.../channels/binary-caches/<name>. The only reason this didn't cause
problems is pure luck (namely, <name> shadows an actual Nix
expression, the binary-caches files happen to be syntactically valid
Nix expressions, and we iterate over the directory contents in just
the right order).
Since we already cache files in normal form (fileEvalCache), caching
parse trees is redundant.
Note that getting rid of this cache doesn't actually save much memory
at the moment, because parse trees are currently not freed / GC'ed.
This reduces the difference between inherited and non-inherited
attribute handling to the choice of which env to use (in recs and lets)
by setting the AttrDef::e to a new ExprVar in the parser rather than
carrying a separate AttrDef::v VarRef member.
As an added bonus, this allows inherited attributes that inherit from a
with to delay forcing evaluation of the with's attributes.
Signed-off-by: Shea Levy <shea@shealevy.com>
On Linux, Nix can build i686 packages even on x86_64 systems. It's not
enough to recognize this situation by settings.thisSystem, we also have
to consult uname(). E.g. we can be running on a i686 Debian with an
amd64 kernel. In that situation settings.thisSystem is i686-linux, but
we still need to change personality to i686 to make builds consistent.
On a system with multiple CPUs, running Nix operations through the
daemon is significantly slower than "direct" mode:
$ NIX_REMOTE= nix-instantiate '<nixos>' -A system
real 0m0.974s
user 0m0.875s
sys 0m0.088s
$ NIX_REMOTE=daemon nix-instantiate '<nixos>' -A system
real 0m2.118s
user 0m1.463s
sys 0m0.218s
The main reason seems to be that the client and the worker get moved
to a different CPU after every call to the worker. This patch adds a
hack to lock them to the same CPU. With this, the overhead of going
through the daemon is very small:
$ NIX_REMOTE=daemon nix-instantiate '<nixos>' -A system
real 0m1.074s
user 0m0.809s
sys 0m0.098s
Commit 20866a7031 added a ‘withAttrs’
field to Env, which is annoying because it makes every Env structure
bigger and we allocate millions of them. E.g. NixOS evaluation took
18 MiB more. So this commit squeezes ‘withAttrs’ into values[0].
Probably should use a union...
Evaluation of attribute sets is strict in the attribute names, which
means immediate evaluation of `with` attribute sets rules out some
potentially interesting use cases (e.g. where the attribute names of one
set depend in some way on another but we want to bring those names into
scope for some values in the second set).
The major example of this is overridable self-referential package sets
(e.g. all-packages.nix). With immediate `with` evaluation, the only
options for such sets are to either make them non-recursive and
explicitly use the name of the overridden set in non-overridden one
every time you want to reference another package, or make the set
recursive and use the `__overrides` hack. As shown in the test case that
comes with this commit, though, delayed `with` evaluation allows a nicer
third alternative.
Signed-off-by: Shea Levy <shea@shealevy.com>
Previously, if the Nix evaluator gets a stack overflow due to a deep
or infinite recursion in the Nix expression, the user gets an
unhelpful message ("Segmentation fault") that doesn't indicate that
the problem is in the user's code rather than Nix itself. Now it
prints:
error: stack overflow (possible infinite recursion)
This only works on x86_64-linux and i686-linux.
Fixes#35.
The kill(2) in Apple's libc follows POSIX semantics, which means that
kill(-1, SIGKILL) will kill the calling process too. Since nix has no
way to distinguish between the process successfully killing everything
and the process being killed by a rogue builder in that case, it can't
safely conclude that killUser was successful.
Luckily, the actual kill syscall takes a parameter that determines
whether POSIX semantics are followed, so we can call that syscall
directly and avoid the issue on Apple.
Signed-off-by: Shea Levy <shea@shealevy.com>
This reverts commit 69b8f9980f.
The timeout should be enforced remotely. Otherwise, if the garbage
collector is running either locally or remotely, if will block the
build or closure copying for some time. If the garbage collector
takes too long, the build may time out, which is not what we want.
Also, on heavily loaded systems, copying large paths to and from the
remote machine can take a long time, also potentially resulting in a
timeout.
mount(2) with MS_BIND allows mounting a regular file on top of a regular
file, so there's no reason to only bind directories. This allows finer
control over just which files are and aren't included in the chroot
without having to build symlink trees or the like.
Signed-off-by: Shea Levy <shea@shealevy.com>
With C++ std::map, doing a comparison like ‘map["foo"] == ...’ has the
side-effect of adding a mapping from "foo" to the empty string if
"foo" doesn't exist in the map. So we ended up setting some
environment variables by accident.
In particular this means that "trivial" derivations such as writeText
are not substituted, reducing the number of GET requests to the binary
cache by about 200 on a typical NixOS configuration.
This substituter basically cannot work reliably since we switched to
SQLite, since SQLite databases may need write access to open them even
just for reading (and in WAL mode they always do).
For instance, it's pointless to keep copy-from-other-stores running if
there are no other stores, or download-using-manifests if there are no
manifests. This also speeds things up because we don't send queries
to those substituters.
Before calling dumpPath(), we have to make sure the files are owned by
the build user. Otherwise, the build could contain a hard link to
(say) /etc/shadow, which would then be read by the daemon and
rewritten as a world-readable file.
This only affects systems that don't have hard link restrictions
enabled.
The assertion in canonicalisePathMetaData() failed because the
ownership of the path already changed due to the hash rewriting. The
solution is not to check the ownership of rewritten paths.
Issue #122.