Currently machine specification (`/etc/nix/machine`) parser fails
with a vague exception if the file had incorrect format.
This commit adds verbose exceptions and unit-tests for the parser.
This ensures any started processes can't write to /nix/store (except
during builds). This partially reverts 01d07b1e, which happened because
of #2646.
The problem was only happening after nix downloads anything, causing
me to suspect the download thread. The problem turns out to be:
"A process can't join a new mount namespace if it is sharing
filesystem-related attributes with another process", in this case this
process is the curl thread.
Ideally, we might kill it before spawning the shell process, but it's
inside a static variable in the getFileTransfer() function. So
instead, stop it from sharing FS state using unshare(). A strategy
such as the one from #5057 (single-threaded chroot helper binary) is
also very much on the table.
Fixes#4337.
This fixes a bug in the garbage collector where if a path
/nix/store/abcd-foo is valid, but we do a
isValidPath("/nix/store/abcd-foo.lock") first, then a negative entry
for /nix/store/abcd is added to pathInfoCache, so /nix/store/abcd-foo
is subsequently considered invalid and deleted.
(where "referrers" includes the reverse of derivation outputs and
derivers). Now we do a full traversal to look if we can reach any
root. If not, all paths reached can be deleted.
The garbage collector no longer blocks other processes from
adding/building store paths or adding GC roots. To prevent the
collector from deleting store paths just added by another process,
processes need to connect to the garbage collector via a Unix domain
socket to register new temporary roots.
This reverts some parts of commit
8430a8f086 which was trying to rethrow
some exceptions while we weren’t in the context of a `catch` block,
causing some weird “terminate called without an active exception”
errors.
Fix#5368
In https://github.com/NixOS/nix/pull/5350 we noticed link failures
pkgsStatic.nixUnstable. Adding explicit dependency on libutil fixes
libstore-tests linking.
When I stop a download with Ctrl-C in a `nix repl` of a flake, the REPL
refuses to do any other downloads:
nix-repl> builtins.getFlake "nix-serve"
[0.0 MiB DL] downloading 'https://api.github.com/repos/edolstra/nix-serve/tarball/e9828a9e01a14297d15ca41 error: download of 'e9828a9e01' was interrupted
[0.0 MiB DL]
nix-repl> builtins.getFlake "nix-serve"
error: interrupted by the user
[0.0 MiB DL]
To fix this issue, two changes were necessary:
* Reset the global `_isInterrupted` variable: only because a single
operation was aborted, it should still be possible to continue the
session.
* Recreate a `fileTransfer`-instance if the current one was shut down by
an abort.
9c766a40cb broke logging from the
daemon, because commonChildInit is called when starting the build hook
in a vfork, so it ends up resetting the parent's logger. So don't
vfork.
It might be best to get rid of vfork altogether, but that may cause
problems, e.g. when we call an external program like git from the
evaluator.
Before the changes when building the whole system with
`contentAddressedByDefault = true;` we get many noninformative messages:
$ nix build -f nixos system --keep-going
...
warning: rewriting hashes in '/nix/store/...-clang-11.1.0.drv.chroot/nix/store/...-11.1.0'; cross fingers
warning: rewriting hashes in '/nix/store/...-clang-11.1.0.drv.chroot/nix/store/...-11.1.0-dev'; cross fingers
warning: rewriting hashes in '/nix/store/...-clang-11.1.0.drv.chroot/nix/store/...-11.1.0-python'; cross fingers
error: 2 dependencies of derivation '/nix/store/...-hub-2.14.2.drv' failed to build
warning: rewriting hashes in '/nix/store/...-subversion-1.14.1.drv.chroot/nix/store/...-subversion-1.14.1-dev'; cross fingers
warning: rewriting hashes in '/nix/store/...-subversion-1.14.1.drv.chroot/nix/store/...-subversion-1.14.1-man'; cross fingers
...
Let's downgrade these messages down to debug().
I had started the trend of doing `std::visit` by value (because a type
error once mislead me into thinking that was the only form that
existed). While the optomizer in principle should be able to deal with
extra coppying or extra indirection once the lambdas inlined, sticking
with by reference is the conventional default. I hope this might even
improve performance.
This actually bit me quite recently in `nixpkgs` because I assumed that
`nix-build --check` would also error out if hashes don't match anymore[1]
and so I wrongly assumed that I couldn't reproduce the mismatch error.
The fix is rather simple, during the output registration a so-called
`delayedException` is instantiated e.g. if a FOD hash-mismatch occurs.
However, in case of `nix-build --check` (or `--rebuild` in case of `nix
build`), the code-path where this exception is thrown will never be
reached.
By adding that check to the if-clause that causes an early exit in case
of `bmCheck`, the issue is gone. Also added a (previously failing)
test-case to demonstrate the problem.
[1] https://github.com/NixOS/nixpkgs/pull/139238, the underlying issue
was that `nix-prefetch-git` returns different hashes than `fetchgit`
because the latter one fetches submodules by default.
This is important if the remote side *does* execute
nix-store/nix-daemon successfully, but stdout is polluted
(e.g. because the remote user's bashrc script prints something to
stdout). In that case we have to shutdown the write side to force the
remote nix process to exit.
Instead of
error: serialised integer 7161674624452356180 is too large for type 'j'
we now get
error: 'nix-store --serve' protocol mismatch from 'sshtest@localhost', got 'This account is currently not available.'
Fixes https://github.com/NixOS/nixpkgs/issues/37287.
Before this commit, the dns lookup in preloadNSS would still go through
nscd. This did not have the effect of loading the nss_dns.so as expected
(nss_dns.so being out of reach from within the sandbox).
Should LOCALDOMAIN environment variable be defined, nss will completely
avoid nscd and will do its dns resolution on its own.
By temporarly setting LOCALDOMAIN variable before calling in NSS, we can
force NSS to load the shared libraries as expected.
Fixes#5089
Signed-off-by: Arthur Gautier <baloo@superbaloo.net>
Store paths are only allowed to contain a limited subset of the
alphabet, which doesn’t include `!`. So don’t create lockfiles that
contain this `!` character as that would otherwise confuse (and break)
the gc.
Fix#5176
When doing e.g.
nix-build -A package --keep-failed --option \
builders \
'ssh://mfhydra?remote-store=/home/bosch/store x86_64-linux - 10 4 big-parallel'
this doesn't work properly because this build-setting is ignored.
I changed this behavior by passing the `settings.keepFailed` through the
serve-protocol to remote machines to make sure that I can introspect the
build-directory (which is particularly helpful when I have to look at a
`config.log` from a failed build for instance).
Use `$(libdir)` while installing .pc files looks like a more generic
solution. For example, it will work for distributions like RHEL or
Fedora where .pc files are installed in `/usr/lib64/pkgconfig`.
This replaces the O(n) search complexity in our insert code with a
lookup of O(log n). It also makes removing waitees easier as we can use
the extract method provided by the set class.
Previously the code ensures that the isBase32 array would only be
initialised once in a single-threaded context. If two threads happen to
call the function before the initialisation was completed both of them
would have completed the initialization step. This allowed for a
race-condition where one thread might be done with the initialization
but the other thread sets all the fields to false again. For a brief
moment the base32 detection would then produce false-negatives.
The experimental features are, well, experimental, and shouldn’t be
carelessly and transparently enabled.
Besides, some (`ca-derivations` at least) need to be enabled at startup
in order to work properly.
So it’s better to just require that daemon be started with the right
`experimental-features` option.
Fix#5017
This adds a new store operation 'addMultipleToStore' that reads a
number of NARs and ValidPathInfos from a Source, allowing any number
of store paths to be copied in a single call. This is much faster on
high-latency links when copying a lot of small files, like .drv
closures.
For example, on a connection with an 50 ms delay:
Before:
$ nix copy --to 'unix:///tmp/proxy-socket?root=/tmp/dest-chroot' \
/nix/store/90jjw94xiyg5drj70whm9yll6xjj0ca9-hello-2.10.drv \
--derivation --no-check-sigs
real 0m57.868s
user 0m0.103s
sys 0m0.056s
After:
real 0m0.690s
user 0m0.017s
sys 0m0.011s
With this, we don't have to copy the entire .drv closure to the
destination store ahead of time (or at all). Instead, buildPaths()
reads .drv files from the eval store and copies inputSrcs to the
destination store if it needs to build a derivation.
Issue #5025.
In particular, this now works:
$ nix path-info --eval-store auto --store https://cache.nixos.org nixpkgs#hello
Previously this would fail as it would try to upload the hello .drv to
cache.nixos.org. Now the .drv is instantiated in the local store, and
then we check for the existence of the outputs in cache.nixos.org.
Some people want to avoid using registries at all on their system; Instead
of having to add --no-registries to every command, this commit allows to
set use-registries = false in the config. --no-registries is still allowed
everywhere it was allowed previously, but is now deprecated.
Co-authored-by: Eelco Dolstra <edolstra@gmail.com>
- This can legitimately happen (for example because of a non-determinism
causing a build-time dependency to be kept or not as a runtime
reference)
- Because of older Nix versions, it can happen that we encounter a
realisation with an (erroneously) empty set of dependencies, in which
case we don’t want to fail, but just warn the user and try to fix it.
Fill `NIX_CONFIG` with the value of the current Nix configuration before
calling the post-build-hook.
That way the whole configuration (including the possible
`experimental-features`, a possibly `--store` option or whatever) will
be made available to the hook
'--delete-older-than 10' deletes the generations older than a single day, and '--delete-older-than 12m' deletes all generations older than 12 days.
This changes makes it throw on those invalid inputs, and gives an example of a valid input.
Add an access-control list to the realisations in recursive-nix (similar
to the already existing one for store paths), so that we can build
content-addressed derivations in the restricted store.
Fix#4353
Previously, the build system used uname(1) output when it wanted to
check the operating system it was being built for, which meant that it
didn't take into-account cross-compilation when the build and host
operating systems were different.
To fix this, instead of consulting uname output, we consult the host
triple, specifically the third "kernel" part.
For "kernel"s with stable ABIs, like Linux or Cygwin, we can use a
simple ifeq to test whether we're compiling for that system, but for
other platforms, like Darwin, FreeBSD, or Solaris, we have to use a
more complicated check to take into account the version numbers at the
end of the "kernel"s. I couldn't find a way to just strip these
version numbers in GNU Make without shelling out, which would be even
more ugly IMO. Because these checks differ between kernels, and the
patsubst ones are quite fiddly, I've added variables for each host OS
we might want to check to make them easier to reuse.
This way no derivation has to expect that these files are in the `cwd`
during the build. This is problematic for `nix-shell` where these files
would have to be inserted into the nix-shell's `cwd` which can become
problematic with e.g. recursive `nix-shell`.
To remain backwards-compatible, the location inside the build sandbox
will be kept, however using these files directly should be deprecated
from now on.
This is needed to push the adoption of structured attrs[1] forward. It's
now checked if a `__json` exists in the environment-map of the derivation
to be openend in a `nix-shell`.
Derivations with structured attributes enabled also make use of a file
named `.attrs.json` containing every environment variable represented as
JSON which is useful for e.g. `exportReferencesGraph`[2]. To
provide an environment similar to the build sandbox, `nix-shell` now
adds a `.attrs.json` to `cwd` (which is mostly equal to the one in the
build sandbox) and removes it using an exit hook when closing the shell.
To avoid leaking internals of the build-process to the `nix-shell`, the
entire logic to generate JSON and shell code for structured attrs was
moved into the `ParsedDerivation` class.
[1] https://nixos.mayflower.consulting/blog/2020/01/20/structured-attrs/
[2] https://nixos.org/manual/nix/unstable/expressions/advanced-attributes.html#advanced-attributes
Useful when we're using a daemon with a chroot store, e.g.
$ NIX_DAEMON_SOCKET_PATH=/tmp/chroot/nix/var/nix/daemon-socket/socket nix-daemon --store /tmp/chroot
Then the client can now connect with
$ nix build --store unix:///tmp/chroot/nix/var/nix/daemon-socket/socket?root=/tmp/chroot nixpkgs#hello
That doesn’t really make sense with CA derivations (and wasn’t even
really correct before because of FO derivations, though that probably
didn’t matter much in practice)
Make ca-derivations require a `ca-derivations` machine feature, and
ca-aware builders expose it.
That way, a network of builders can mix ca-aware and non-ca-aware
machines, and the scheduler will send them in the right place.
When adding a path to the local store (via `LocalStore::addToStore`),
ensure that the `ca` field of the provided `ValidPathInfo` does indeed
correspond to the content of the path.
Otherwise any untrusted user (or any binary cache) can add arbitrary
content-addressed paths to the store (as content-addressed paths don’t
need a signature).
Linux is (as far as I know) the only mainstream operating system that
requires linking with libdl for dlopen. On BSD, libdl doesn't exist,
so on non-FreeBSD BSDs linking will currently fail. On macOS, it's
apparently just a symlink to libSystem (macOS libc), presumably
present for compatibility with things that assume Linux.
So the right thing to do here is to only add -ldl on Linux, not to add
it for everything that isn't FreeBSD.
Only considers the closure in term of `Realisation`, ignores all the
opaque inputs.
Dunno whether that’s the nicest solution, need to think it through a bit
Align all the worker protocol with `buildDerivation` which inlines the
realisations as one opaque json blob.
That way we don’t have to bother changing the remote store protocol
when the definition of `Realisation` changes, as long as we keep the
json backwards-compatible
Align all the worker protocol with `buildDerivation` which inlines the
realisations as one opaque json blob.
That way we don’t have to bother changing the remote store protocol
when the definition of `Realisation` changes, as long as we keep the
json backwards-compatible
Move the `closure` logic of `computeFSClosure` to its own (templated) function.
This doesn’t bring much by itself (except for the ability to properly
test the “closure” functionality independently from the rest), but it
allows reusing it (in particular for the realisations which will require
a very similar closure computation)
For whatever reason, many programs trying to access SystemVersion.plist
also open SystemVersionCompat.plist; this includes Python code and
coreutils’ `cat(1)` (but not the native macOS `/bin/cat`). Illustratory
`dtruss(1m)` output:
open("/System/Library/CoreServices/SystemVersion.plist\0", 0x0, 0x0) = 3 0
open("/System/Library/CoreServices/SystemVersionCompat.plist\0", 0x0, 0x0) = 4 0
I assume this is a Big Sur change relating to the 10.16.x/11.x
version compatibility divide and that it’s something along the lines of
a hook inside libSystem.
Fixes a lot of sandboxed package builds under Big Sur.
When we don’t have enough free job slots to run a goal, we put it in
the waitForBuildSlot list & unlock its output locks. This will
continue from where we left off (tryLocalBuild). However, we need the
locks to get reacquired when/if the goal ever restarts. So, we need to
send it back through tryToBuild to get reqacquire those locks.
I think this bug was introduced in
https://github.com/NixOS/nix/pull/4570. It leads to some builds
starting without proper locks.
Similar to the nar-info disk cache (and using the same db).
This makes rebuilds muuch faster.
- This works regardless of the ca-derivations experimental feature.
I could modify the logic to not touch the db if the flag isn’t there,
but given that this is a trash-able local cache, it doesn’t seem to be
really worth it.
- We could unify the `NARs` and `Realisation` tables to only have one
generic kv table. This is left as an exercise to the reader.
- I didn’t update the cache db version number as the new schema just
adds a new table to the previous one, so the db will be transparently
migrated and is backwards-compatible.
Fix#4746
I guess I misunderstood John's initial explanation about why wildcards
for outputs are sent to older stores[1]. My `nix-daemon` from 2021-03-26
also has version 1.29, but misses the wildcard[2]. So bumping seems to
be the right call.
[1] https://github.com/NixOS/nix/pull/4759#issuecomment-830812464
[2] 255d145ba7
Starting in macOS 11, the on-disk dylib bundles are no longer available,
but nixpkgs needs to be able to keep compatibility with older versions
that require `/usr/lib/libSystem.B.dylib` in `__impureHostDeps`. Allow
it to keep backwards compatibility with these versions by marking these
dependencies as optional.
Fixes#4658.
They are equivalent according to
<https://spec.commonmark.org/0.29/#hard-line-breaks>,
and the trailing spaces tend to be a pain (because the make git
complain, editors tend to want to remove them − the `.editorconfig`
actually specifies that − etc..).
If there were many top-level goals (which are not destroyed until the
very end), commands like
$ nix copy --to 'ssh://localhost?remote-store=/tmp/nix' \
/run/current-system --no-check-sigs --substitute-on-destination
could fail with "Too many open files". So now we do some explicit
cleanup from amDone(). It would be cleaner to separate goals from
their temporary internal state, but that would be a bigger refactor.
This avoids an ambiguity where the `StorePathWithOutputs { drvPath, {}
}` could mean "build `brvPath`" or "substitute `drvPath`" depending on
context.
It also brings the internals closer in line to the new CLI, by
generalizing the `Buildable` type is used there and makes that
distinction already.
In doing so, relegate `StorePathWithOutputs` to being a type just for
backwards compatibility (CLI and RPC).
These are by no means part of the notion of a store, but rather are
things that happen to use stores. (Or put another way, there's no way
we'd make them virtual methods any time soon.) It's better to move them
out of that too-big class then.
Also, this helps us remove StorePathWithOutputs from the Store interface
altogether next commit.
A few versioning mistakes were corrected:
- In 27b5747ca7, Daemon protocol had some
version `>= 0xc` that should have been `>= 0x1c`, or `28` since the
other conditions used decimal.
- In a2b69660a9, legacy SSH gated new CAS
info on version 6, but version 5 in the server. It is now 6
everywhere.
Additionally, legacy ssh was sending over more metadata than the daemon
one was. The daemon now sends that data too.
CC @regnat
Co-authored-by: Cole Helbling <cole.e.helbling@outlook.com>
I guess the rationale behind the old name wath that
`pathInfoIsTrusted(info)` returns `true` iff we would need to `blindly`
trust the path (because it has no valid signature and `requireSigs` is
set), but I find it to be a really confusing footgun because it's quite
natural to give it the opposite meaning.
What happened was that Nix was trying to unconditionally mount these
paths in fixed-output derivations, but since the outer derivation was
pure, those paths did not exist. The solution is to only mount those
paths when they exist.
This separates the scheduling logic (including simple hook pathway) from
the local-store needing code.
This should be the final split for now. I'm reasonably happy with how
it's turning out, even before I'm done moving code into
`local-derivation-goal`. Benefits:
1. This will help "witness" that the hook case is indeed a lot simpler,
and also compensate for the increased complexity that comes from
content-addressed derivation outputs.
2. It also moves us ever so slightly towards a world where we could use
off-the-shelf storage or sandboxing, since `local-derivation-goal`
would be gutted in those cases, but `derivation-goal` should remain
nearly the same.
The new `#if 0` in the new files will be deleted in the following
commit. I keep it here so if it turns out more stuff can be moved over,
it's easy to do so in a way that preserves ordering --- and thus
prevents conflicts.
N.B.
```sh
git diff HEAD^^ --color-moved --find-copies-harder --patience --stat
```
makes nicer output.
This is already used by Hydra, and is very useful when materializing
a remote builder list from service discovery. This allows the service
discovery tool to only sync one file instead of two.
This is technically a breaking change, since attempting to set plugin
files after the first non-flag argument will now throw an error. This
is acceptable given the relative lack of stability in a plugin
interface and the need to tie the knot somewhere once plugins can
actually define new subcommands.
This field used to be a `BasicDerivation`, but this `BasicDerivation`
was downcasted to a `Derivation` when needed (implicitely or not), so we
might as well make it a full `Derivation` and upcast it when needed.
This also allows getting rid of a weird duplication in the way we
compute the static output hashes for the derivation. We had to
do it differently and in a different place depending on whether the
derivation was a full derivation or just a basic drv, but we can now do
it unconditionally on the full derivation.
Fix#4559
- Pass it the name of the outputs rather than their output paths (as
these don't exist for ca derivations)
- Get the built output paths from the remote builder
- Register the new received realisations
When performing distributed builds of machine learning packages, it
would be nice if builders without the required SIMD instructions can
be excluded as build nodes.
Since x86_64 has accumulated a large number of different instruction
set extensions, listing all possible extensions would be unwieldy.
AMD, Intel, Red Hat, and SUSE have recently defined four different
microarchitecture levels that are now part of the x86-64 psABI
supplement and will be used in glibc 2.33:
https://gitlab.com/x86-psABIs/x86-64-ABIhttps://lwn.net/Articles/844831/
This change uses libcpuid to detect CPU features and then uses them to
add the supported x86_64 levels to the additional system types. For
example on a Ryzen 3700X:
$ ~/aps/bin/nix -vv --version | grep "Additional system"
Additional system types: i686-linux, x86_64-v1-linux, x86_64-v2-linux, x86_64-v3-linux