For a typical desktop system (~2K packages) we can easily get 100K
entries in RealisationsRefs. Without indices query for RealisationsRefs
requires linear scan.
RealisationsRefs(referrer)
--------------------------
Inefficiency is seen as a 100% CPU load of nix-daemon for the following
scenario:
$ nix edit -f . bash # add unused environment variable, like FOO="1"
# populate RealisationsRefs, build fresh system
$ nix build -f nixos system --arg config '{ contentAddressedByDefault = true; }'
$ nix edit -f . bash # add unused environment variable, like FOO="2"
$ time nix build -f nixos system --arg config '{ contentAddressedByDefault = true; }'
In this case `bash `will be rebuilt a few times and then rest of CPU
time is spent on scanning RealisationsRefs table (about 5 CPU-minutes
on my machine).
Before the change:
$ time nix build -f nixos system ... # step 4 above
real 34m3,613s
user 0m5,232s
sys 0m0,758s
Of all this time about 29.5 minutes are taken by nix-daemon's CPU time.
After the change:
$ time nix build -f nixos system ... # step 4 above
real 4m50,061s
user 0m5,038s
sys 0m0,677s
Of all this time about 1 minute is taken by nix-daemon's CPU time.
Most of the time is spent polling for non-existent realisations on
cache-nixos.org.
Realisations(outputPath)
------------------------
After running CA system for two weeks I got ~1M entries in Realisations
table. `nix-collect-garbage` became very slow (seemingly 100 path deletions
per second). It happens due to a slow cascading delete from Realisations
triggered by deletion from ValidPaths.
The fix is to add an index on primary key from ValidPaths(id) that
triggers cascading deletions.
Before the change:
$ time nix-collect-garbage -d --max-freed 100G
<interrupted before finish, took too long>
real 23m32.411s
user 17m49.679s
sys 4m50.609s
Most of time was spent in re-scanning Realisations table on each path deletion.
After the change:
$ time nix-collect-garbage -d --max-freed 100G
real 8m43.226s
user 6m16.317s
sys 1m40.188s
Time is spent scanning sqlite indices and in kernel when unlinking directories.
Doing it as a side-effect of calling LocalStore::makeStoreWritable()
is very ugly.
Also, make sure that stopping the progress bar joins the update
thread, otherwise that thread should be unshared as well.
Since 4806f2f6b0, we can't have paths with
references passed to builtins.{path,filterSource}. This prevents many cases
of those functions called on IFD outputs from working. Resolve this by
passing the references found in the original path to the added path.
When setting flake-local options (with the `nixConfig` field), forward
these options to the daemon in case we’re using one.
This is necessary in particular for options like `binary-caches` or
`post-build-hook` to make sense.
Fix <343239fc8a (r44356843)>
Rather than having them plain strings scattered through the whole
codebase, create an enum containing all the known experimental features.
This means that
- Nix can now `warn` when an unkwown experimental feature is passed
(making it much nicer to spot typos and spot deprecated features)
- It’s now easy to remove a feature altogether (once the feature isn’t
experimental anymore or is dropped) by just removing the field for the
enum and letting the compiler point us to all the now invalid usages
of it.
Currently machine specification (`/etc/nix/machine`) parser fails
with a vague exception if the file had incorrect format.
This commit adds verbose exceptions and unit-tests for the parser.
This ensures any started processes can't write to /nix/store (except
during builds). This partially reverts 01d07b1e, which happened because
of #2646.
The problem was only happening after nix downloads anything, causing
me to suspect the download thread. The problem turns out to be:
"A process can't join a new mount namespace if it is sharing
filesystem-related attributes with another process", in this case this
process is the curl thread.
Ideally, we might kill it before spawning the shell process, but it's
inside a static variable in the getFileTransfer() function. So
instead, stop it from sharing FS state using unshare(). A strategy
such as the one from #5057 (single-threaded chroot helper binary) is
also very much on the table.
Fixes#4337.
This fixes a bug in the garbage collector where if a path
/nix/store/abcd-foo is valid, but we do a
isValidPath("/nix/store/abcd-foo.lock") first, then a negative entry
for /nix/store/abcd is added to pathInfoCache, so /nix/store/abcd-foo
is subsequently considered invalid and deleted.
(where "referrers" includes the reverse of derivation outputs and
derivers). Now we do a full traversal to look if we can reach any
root. If not, all paths reached can be deleted.
The garbage collector no longer blocks other processes from
adding/building store paths or adding GC roots. To prevent the
collector from deleting store paths just added by another process,
processes need to connect to the garbage collector via a Unix domain
socket to register new temporary roots.
This reverts some parts of commit
8430a8f086 which was trying to rethrow
some exceptions while we weren’t in the context of a `catch` block,
causing some weird “terminate called without an active exception”
errors.
Fix#5368
In https://github.com/NixOS/nix/pull/5350 we noticed link failures
pkgsStatic.nixUnstable. Adding explicit dependency on libutil fixes
libstore-tests linking.
When I stop a download with Ctrl-C in a `nix repl` of a flake, the REPL
refuses to do any other downloads:
nix-repl> builtins.getFlake "nix-serve"
[0.0 MiB DL] downloading 'https://api.github.com/repos/edolstra/nix-serve/tarball/e9828a9e01a14297d15ca41 error: download of 'e9828a9e01' was interrupted
[0.0 MiB DL]
nix-repl> builtins.getFlake "nix-serve"
error: interrupted by the user
[0.0 MiB DL]
To fix this issue, two changes were necessary:
* Reset the global `_isInterrupted` variable: only because a single
operation was aborted, it should still be possible to continue the
session.
* Recreate a `fileTransfer`-instance if the current one was shut down by
an abort.
9c766a40cb broke logging from the
daemon, because commonChildInit is called when starting the build hook
in a vfork, so it ends up resetting the parent's logger. So don't
vfork.
It might be best to get rid of vfork altogether, but that may cause
problems, e.g. when we call an external program like git from the
evaluator.
Before the changes when building the whole system with
`contentAddressedByDefault = true;` we get many noninformative messages:
$ nix build -f nixos system --keep-going
...
warning: rewriting hashes in '/nix/store/...-clang-11.1.0.drv.chroot/nix/store/...-11.1.0'; cross fingers
warning: rewriting hashes in '/nix/store/...-clang-11.1.0.drv.chroot/nix/store/...-11.1.0-dev'; cross fingers
warning: rewriting hashes in '/nix/store/...-clang-11.1.0.drv.chroot/nix/store/...-11.1.0-python'; cross fingers
error: 2 dependencies of derivation '/nix/store/...-hub-2.14.2.drv' failed to build
warning: rewriting hashes in '/nix/store/...-subversion-1.14.1.drv.chroot/nix/store/...-subversion-1.14.1-dev'; cross fingers
warning: rewriting hashes in '/nix/store/...-subversion-1.14.1.drv.chroot/nix/store/...-subversion-1.14.1-man'; cross fingers
...
Let's downgrade these messages down to debug().
I had started the trend of doing `std::visit` by value (because a type
error once mislead me into thinking that was the only form that
existed). While the optomizer in principle should be able to deal with
extra coppying or extra indirection once the lambdas inlined, sticking
with by reference is the conventional default. I hope this might even
improve performance.
This actually bit me quite recently in `nixpkgs` because I assumed that
`nix-build --check` would also error out if hashes don't match anymore[1]
and so I wrongly assumed that I couldn't reproduce the mismatch error.
The fix is rather simple, during the output registration a so-called
`delayedException` is instantiated e.g. if a FOD hash-mismatch occurs.
However, in case of `nix-build --check` (or `--rebuild` in case of `nix
build`), the code-path where this exception is thrown will never be
reached.
By adding that check to the if-clause that causes an early exit in case
of `bmCheck`, the issue is gone. Also added a (previously failing)
test-case to demonstrate the problem.
[1] https://github.com/NixOS/nixpkgs/pull/139238, the underlying issue
was that `nix-prefetch-git` returns different hashes than `fetchgit`
because the latter one fetches submodules by default.
This is important if the remote side *does* execute
nix-store/nix-daemon successfully, but stdout is polluted
(e.g. because the remote user's bashrc script prints something to
stdout). In that case we have to shutdown the write side to force the
remote nix process to exit.
Instead of
error: serialised integer 7161674624452356180 is too large for type 'j'
we now get
error: 'nix-store --serve' protocol mismatch from 'sshtest@localhost', got 'This account is currently not available.'
Fixes https://github.com/NixOS/nixpkgs/issues/37287.
Before this commit, the dns lookup in preloadNSS would still go through
nscd. This did not have the effect of loading the nss_dns.so as expected
(nss_dns.so being out of reach from within the sandbox).
Should LOCALDOMAIN environment variable be defined, nss will completely
avoid nscd and will do its dns resolution on its own.
By temporarly setting LOCALDOMAIN variable before calling in NSS, we can
force NSS to load the shared libraries as expected.
Fixes#5089
Signed-off-by: Arthur Gautier <baloo@superbaloo.net>
Store paths are only allowed to contain a limited subset of the
alphabet, which doesn’t include `!`. So don’t create lockfiles that
contain this `!` character as that would otherwise confuse (and break)
the gc.
Fix#5176
When doing e.g.
nix-build -A package --keep-failed --option \
builders \
'ssh://mfhydra?remote-store=/home/bosch/store x86_64-linux - 10 4 big-parallel'
this doesn't work properly because this build-setting is ignored.
I changed this behavior by passing the `settings.keepFailed` through the
serve-protocol to remote machines to make sure that I can introspect the
build-directory (which is particularly helpful when I have to look at a
`config.log` from a failed build for instance).
Use `$(libdir)` while installing .pc files looks like a more generic
solution. For example, it will work for distributions like RHEL or
Fedora where .pc files are installed in `/usr/lib64/pkgconfig`.
This replaces the O(n) search complexity in our insert code with a
lookup of O(log n). It also makes removing waitees easier as we can use
the extract method provided by the set class.
Previously the code ensures that the isBase32 array would only be
initialised once in a single-threaded context. If two threads happen to
call the function before the initialisation was completed both of them
would have completed the initialization step. This allowed for a
race-condition where one thread might be done with the initialization
but the other thread sets all the fields to false again. For a brief
moment the base32 detection would then produce false-negatives.
The experimental features are, well, experimental, and shouldn’t be
carelessly and transparently enabled.
Besides, some (`ca-derivations` at least) need to be enabled at startup
in order to work properly.
So it’s better to just require that daemon be started with the right
`experimental-features` option.
Fix#5017
This adds a new store operation 'addMultipleToStore' that reads a
number of NARs and ValidPathInfos from a Source, allowing any number
of store paths to be copied in a single call. This is much faster on
high-latency links when copying a lot of small files, like .drv
closures.
For example, on a connection with an 50 ms delay:
Before:
$ nix copy --to 'unix:///tmp/proxy-socket?root=/tmp/dest-chroot' \
/nix/store/90jjw94xiyg5drj70whm9yll6xjj0ca9-hello-2.10.drv \
--derivation --no-check-sigs
real 0m57.868s
user 0m0.103s
sys 0m0.056s
After:
real 0m0.690s
user 0m0.017s
sys 0m0.011s
With this, we don't have to copy the entire .drv closure to the
destination store ahead of time (or at all). Instead, buildPaths()
reads .drv files from the eval store and copies inputSrcs to the
destination store if it needs to build a derivation.
Issue #5025.
In particular, this now works:
$ nix path-info --eval-store auto --store https://cache.nixos.org nixpkgs#hello
Previously this would fail as it would try to upload the hello .drv to
cache.nixos.org. Now the .drv is instantiated in the local store, and
then we check for the existence of the outputs in cache.nixos.org.
Some people want to avoid using registries at all on their system; Instead
of having to add --no-registries to every command, this commit allows to
set use-registries = false in the config. --no-registries is still allowed
everywhere it was allowed previously, but is now deprecated.
Co-authored-by: Eelco Dolstra <edolstra@gmail.com>
- This can legitimately happen (for example because of a non-determinism
causing a build-time dependency to be kept or not as a runtime
reference)
- Because of older Nix versions, it can happen that we encounter a
realisation with an (erroneously) empty set of dependencies, in which
case we don’t want to fail, but just warn the user and try to fix it.
Fill `NIX_CONFIG` with the value of the current Nix configuration before
calling the post-build-hook.
That way the whole configuration (including the possible
`experimental-features`, a possibly `--store` option or whatever) will
be made available to the hook
'--delete-older-than 10' deletes the generations older than a single day, and '--delete-older-than 12m' deletes all generations older than 12 days.
This changes makes it throw on those invalid inputs, and gives an example of a valid input.
Add an access-control list to the realisations in recursive-nix (similar
to the already existing one for store paths), so that we can build
content-addressed derivations in the restricted store.
Fix#4353
Previously, the build system used uname(1) output when it wanted to
check the operating system it was being built for, which meant that it
didn't take into-account cross-compilation when the build and host
operating systems were different.
To fix this, instead of consulting uname output, we consult the host
triple, specifically the third "kernel" part.
For "kernel"s with stable ABIs, like Linux or Cygwin, we can use a
simple ifeq to test whether we're compiling for that system, but for
other platforms, like Darwin, FreeBSD, or Solaris, we have to use a
more complicated check to take into account the version numbers at the
end of the "kernel"s. I couldn't find a way to just strip these
version numbers in GNU Make without shelling out, which would be even
more ugly IMO. Because these checks differ between kernels, and the
patsubst ones are quite fiddly, I've added variables for each host OS
we might want to check to make them easier to reuse.
This way no derivation has to expect that these files are in the `cwd`
during the build. This is problematic for `nix-shell` where these files
would have to be inserted into the nix-shell's `cwd` which can become
problematic with e.g. recursive `nix-shell`.
To remain backwards-compatible, the location inside the build sandbox
will be kept, however using these files directly should be deprecated
from now on.
This is needed to push the adoption of structured attrs[1] forward. It's
now checked if a `__json` exists in the environment-map of the derivation
to be openend in a `nix-shell`.
Derivations with structured attributes enabled also make use of a file
named `.attrs.json` containing every environment variable represented as
JSON which is useful for e.g. `exportReferencesGraph`[2]. To
provide an environment similar to the build sandbox, `nix-shell` now
adds a `.attrs.json` to `cwd` (which is mostly equal to the one in the
build sandbox) and removes it using an exit hook when closing the shell.
To avoid leaking internals of the build-process to the `nix-shell`, the
entire logic to generate JSON and shell code for structured attrs was
moved into the `ParsedDerivation` class.
[1] https://nixos.mayflower.consulting/blog/2020/01/20/structured-attrs/
[2] https://nixos.org/manual/nix/unstable/expressions/advanced-attributes.html#advanced-attributes
Useful when we're using a daemon with a chroot store, e.g.
$ NIX_DAEMON_SOCKET_PATH=/tmp/chroot/nix/var/nix/daemon-socket/socket nix-daemon --store /tmp/chroot
Then the client can now connect with
$ nix build --store unix:///tmp/chroot/nix/var/nix/daemon-socket/socket?root=/tmp/chroot nixpkgs#hello
That doesn’t really make sense with CA derivations (and wasn’t even
really correct before because of FO derivations, though that probably
didn’t matter much in practice)
Make ca-derivations require a `ca-derivations` machine feature, and
ca-aware builders expose it.
That way, a network of builders can mix ca-aware and non-ca-aware
machines, and the scheduler will send them in the right place.
When adding a path to the local store (via `LocalStore::addToStore`),
ensure that the `ca` field of the provided `ValidPathInfo` does indeed
correspond to the content of the path.
Otherwise any untrusted user (or any binary cache) can add arbitrary
content-addressed paths to the store (as content-addressed paths don’t
need a signature).
Linux is (as far as I know) the only mainstream operating system that
requires linking with libdl for dlopen. On BSD, libdl doesn't exist,
so on non-FreeBSD BSDs linking will currently fail. On macOS, it's
apparently just a symlink to libSystem (macOS libc), presumably
present for compatibility with things that assume Linux.
So the right thing to do here is to only add -ldl on Linux, not to add
it for everything that isn't FreeBSD.
Only considers the closure in term of `Realisation`, ignores all the
opaque inputs.
Dunno whether that’s the nicest solution, need to think it through a bit
Align all the worker protocol with `buildDerivation` which inlines the
realisations as one opaque json blob.
That way we don’t have to bother changing the remote store protocol
when the definition of `Realisation` changes, as long as we keep the
json backwards-compatible
Align all the worker protocol with `buildDerivation` which inlines the
realisations as one opaque json blob.
That way we don’t have to bother changing the remote store protocol
when the definition of `Realisation` changes, as long as we keep the
json backwards-compatible
Move the `closure` logic of `computeFSClosure` to its own (templated) function.
This doesn’t bring much by itself (except for the ability to properly
test the “closure” functionality independently from the rest), but it
allows reusing it (in particular for the realisations which will require
a very similar closure computation)
For whatever reason, many programs trying to access SystemVersion.plist
also open SystemVersionCompat.plist; this includes Python code and
coreutils’ `cat(1)` (but not the native macOS `/bin/cat`). Illustratory
`dtruss(1m)` output:
open("/System/Library/CoreServices/SystemVersion.plist\0", 0x0, 0x0) = 3 0
open("/System/Library/CoreServices/SystemVersionCompat.plist\0", 0x0, 0x0) = 4 0
I assume this is a Big Sur change relating to the 10.16.x/11.x
version compatibility divide and that it’s something along the lines of
a hook inside libSystem.
Fixes a lot of sandboxed package builds under Big Sur.
When we don’t have enough free job slots to run a goal, we put it in
the waitForBuildSlot list & unlock its output locks. This will
continue from where we left off (tryLocalBuild). However, we need the
locks to get reacquired when/if the goal ever restarts. So, we need to
send it back through tryToBuild to get reqacquire those locks.
I think this bug was introduced in
https://github.com/NixOS/nix/pull/4570. It leads to some builds
starting without proper locks.
Similar to the nar-info disk cache (and using the same db).
This makes rebuilds muuch faster.
- This works regardless of the ca-derivations experimental feature.
I could modify the logic to not touch the db if the flag isn’t there,
but given that this is a trash-able local cache, it doesn’t seem to be
really worth it.
- We could unify the `NARs` and `Realisation` tables to only have one
generic kv table. This is left as an exercise to the reader.
- I didn’t update the cache db version number as the new schema just
adds a new table to the previous one, so the db will be transparently
migrated and is backwards-compatible.
Fix#4746
I guess I misunderstood John's initial explanation about why wildcards
for outputs are sent to older stores[1]. My `nix-daemon` from 2021-03-26
also has version 1.29, but misses the wildcard[2]. So bumping seems to
be the right call.
[1] https://github.com/NixOS/nix/pull/4759#issuecomment-830812464
[2] 255d145ba7
Starting in macOS 11, the on-disk dylib bundles are no longer available,
but nixpkgs needs to be able to keep compatibility with older versions
that require `/usr/lib/libSystem.B.dylib` in `__impureHostDeps`. Allow
it to keep backwards compatibility with these versions by marking these
dependencies as optional.
Fixes#4658.
They are equivalent according to
<https://spec.commonmark.org/0.29/#hard-line-breaks>,
and the trailing spaces tend to be a pain (because the make git
complain, editors tend to want to remove them − the `.editorconfig`
actually specifies that − etc..).
If there were many top-level goals (which are not destroyed until the
very end), commands like
$ nix copy --to 'ssh://localhost?remote-store=/tmp/nix' \
/run/current-system --no-check-sigs --substitute-on-destination
could fail with "Too many open files". So now we do some explicit
cleanup from amDone(). It would be cleaner to separate goals from
their temporary internal state, but that would be a bigger refactor.
This avoids an ambiguity where the `StorePathWithOutputs { drvPath, {}
}` could mean "build `brvPath`" or "substitute `drvPath`" depending on
context.
It also brings the internals closer in line to the new CLI, by
generalizing the `Buildable` type is used there and makes that
distinction already.
In doing so, relegate `StorePathWithOutputs` to being a type just for
backwards compatibility (CLI and RPC).
These are by no means part of the notion of a store, but rather are
things that happen to use stores. (Or put another way, there's no way
we'd make them virtual methods any time soon.) It's better to move them
out of that too-big class then.
Also, this helps us remove StorePathWithOutputs from the Store interface
altogether next commit.
A few versioning mistakes were corrected:
- In 27b5747ca7, Daemon protocol had some
version `>= 0xc` that should have been `>= 0x1c`, or `28` since the
other conditions used decimal.
- In a2b69660a9, legacy SSH gated new CAS
info on version 6, but version 5 in the server. It is now 6
everywhere.
Additionally, legacy ssh was sending over more metadata than the daemon
one was. The daemon now sends that data too.
CC @regnat
Co-authored-by: Cole Helbling <cole.e.helbling@outlook.com>
I guess the rationale behind the old name wath that
`pathInfoIsTrusted(info)` returns `true` iff we would need to `blindly`
trust the path (because it has no valid signature and `requireSigs` is
set), but I find it to be a really confusing footgun because it's quite
natural to give it the opposite meaning.