The failure modes for nix's sandboxing setup are pretty complicated.
When nix is unable to set up the sandbox, let's provide more detail
about what went wrong. Specifically:
* Make sure the error message includes the word "sandbox" so the user
knows that the failure was related to sandboxing.
* If `--option sandbox-fallback false` was provided, and removing it
would have allowed further attempts to make progress, let the user
know.
Specifically, if we're not root and the daemon socket does not exist,
then we use ~/.local/share/nix/root as a chroot store. This enables
non-root users to download nix-static and have it work out of the box,
e.g.
ubuntu@ip-10-13-1-146:~$ ~/nix run nixpkgs#hello
warning: '/nix' does not exists, so Nix will use '/home/ubuntu/.local/share/nix/root' as a chroot store
Hello, world!
With this, Nix will write a copy of the sandbox shell to /bin/sh in
the sandbox rather than bind-mounting it from the host filesystem.
This makes /bin/sh work out of the box with nix-static, i.e. you no
longer get
/nix/store/qa36xhc5gpf42l3z1a8m1lysi40l9p7s-bootstrap-stage4-stdenv-linux/setup: ./configure: /bin/sh: bad interpreter: No such file or directory
This allows changes to nix-cache-info to be picked up by existing
clients. Previously, the only way for this to happen would be for
clients to delete binary-cache-v6.sqlite, which is quite awkward for
users.
On the other hand, updates to nix-cache-info should be pretty rare,
hence the choice of a fairly long TTL. Configurability is probably not
useful enough to warrant implementing it.
The manpage for `getgrouplist` says:
> If the number of groups of which user is a member is less than or
> equal to *ngroups, then the value *ngroups is returned.
>
> If the user is a member of more than *ngroups groups, then
> getgrouplist() returns -1. In this case, the value returned in
> *ngroups can be used to resize the buffer passed to a further
> call getgrouplist().
In our original code, however, we allocated a list of size `10` and, if
`getgrouplist` returned `-1` threw an exception. In practice, this
caused the code to fail for any user belonging to more than 10 groups.
While unusual for single-user systems, large companies commonly have a
huge number of POSIX groups users belong to, causing this issue to crop
up and make multi-user Nix unusable in such settings.
The fix is relatively simple, when `getgrouplist` fails, it stores the
real number of GIDs in `ngroups`, so we must resize our list and retry.
Only then, if it errors once more, we can raise an exception.
This should be backported to, at least, 2.9.x.
Bring back the possibility to copy CA paths with no reference (like the
outputs of FO derivations or stuff imported at eval time) between stores
that have a different prefix.
A mips64el Linux MIPS kernel can execute userspace code using any of
three ABIs:
mips64el-linux-*abin64
mips64el-linux-*abin32
mipsel-linux-*
The first of these is the native 64-bit ABI, and the only ABI with
64-bit pointers; this is sometimes called "n64". The last of these is
the old legacy 32-bit ABI, whose binaries can execute natively on
32-bit MIPS hardware; this is sometimes called "o32".
The second ABI, "n32" is essentially the 64-bit ABI with 32-bit
pointers and address space. Hardware 64-bit integer/floating
arithmetic is still allowed, as well as the much larger mips64
register set and more-efficient calling convention.
Let's enable seccomp filters for all of these. Likewise for big
endian (mips64-linux-*).
Without the change any CA deletion triggers linear scan on large
RealisationsRefs table:
sqlite>.eqp full
sqlite> delete from RealisationsRefs where realisationReference IN ( select id from Realisations where outputPath = 1234567890 );
QUERY PLAN
|--SCAN RealisationsRefs
`--LIST SUBQUERY 1
`--SEARCH Realisations USING COVERING INDEX IndexRealisationsRefsOnOutputPath (outputPath=?)
With the change it gets turned into a lookup:
sqlite> CREATE INDEX IndexRealisationsRefsRealisationReference on RealisationsRefs(realisationReference);
sqlite> delete from RealisationsRefs where realisationReference IN ( select id from Realisations where outputPath = 1234567890 );
QUERY PLAN
|--SEARCH RealisationsRefs USING INDEX IndexRealisationsRefsRealisationReference (realisationReference=?)
`--LIST SUBQUERY 1
`--SEARCH Realisations USING COVERING INDEX IndexRealisationsRefsOnOutputPath (outputPath=?)
If the derivation `foo` depends on `bar`, and they both have the same
output path (because they are CA derivations), then this output path
will depend both on the realisation of `foo` and of `bar`, which
themselves depend on each other.
This confuses SQLite which isn’t able to automatically solve this
diamond dependency scheme.
Help it by adding a trigger to delete all the references between the
relevant realisations.
Fix#5320
Otherwise the clang builds fail because the constructor of `SQLiteBusy`
inherits it, `SQLiteError::_throw` tries to call it, which fails.
Strangely, gcc works fine with it. Not sure what the correct behavior is
and who is buggy here, but either way, making it public is at the worst
a reasonable workaround
This ensures that use-sites properly trigger new monomorphisations on
one hand, and on the other hand keeps the main `sqlite.hh` clean and
interface-only. I think that is good practice in general, but in this
situation in particular we do indeed have `sqlite.hh` users that don't
need the `throw_` function.
Previously it only logged the builder's path, this changes it to log the
arguments at the same log level, and the environment variables at the
vomit level.
This helped me debug https://github.com/svanderburg/node2nix/issues/75
This was caused by SubstitutionGoal not setting the errorMsg field in
its BuildResult. We now get a more descriptive message than in 2.7.0, e.g.
error: path '/nix/store/13mh...' is required, but there is no substituter that can build it
instead of the misleading (since there was no build)
error: build of '/nix/store/13mh...' failed
Fixes#6295.
Impure derivations are derivations that can produce a different result
every time they're built. Example:
stdenv.mkDerivation {
name = "impure";
__impure = true; # marks this derivation as impure
outputHashAlgo = "sha256";
outputHashMode = "recursive";
buildCommand = "date > $out";
};
Some important characteristics:
* This requires the 'impure-derivations' experimental feature.
* Impure derivations are not "cached". Thus, running "nix-build" on
the example above multiple times will cause a rebuild every time.
* They are implemented similar to CA derivations, i.e. the output is
moved to a content-addressed path in the store. The difference is
that we don't register a realisation in the Nix database.
* Pure derivations are not allowed to depend on impure derivations. In
the future fixed-output derivations will be allowed to depend on
impure derivations, thus forming an "impurity barrier" in the
dependency graph.
* When sandboxing is enabled, impure derivations can access the
network in the same way as fixed-output derivations. In relaxed
sandboxing mode, they can access the local filesystem.
Rather than having four different but very similar types of hashes, make
only one, with a tag indicating whether it corresponds to a regular of
deferred derivation.
This implies a slight logical change: The original Nix+multiple-outputs
model assumed only one hash-modulo per derivation. Adding
multiple-outputs CA derivations changed this as these have one
hash-modulo per output. This change is now treating each derivation as
having one hash modulo per output.
This obviously means that we internally loose the guaranty that
all the outputs of input-addressed derivations have the same hash
modulo. But it turns out that it doesn’t matter because there’s nothing
in the code taking advantage of that fact (and it probably shouldn’t
anyways).
The upside is that it is now much easier to work with these hashes, and
we can get rid of a lot of useless `std::visit{ overloaded`.
Co-authored-by: John Ericson <John.Ericson@Obsidian.Systems>
This avoids an infinite loop in the final test in
tests/binary-cache.sh. I think this was only not triggered previously
by accident (because we were clearing wantedOutputs in between).
LocalStore::addToStore() since
79ae9e4558 expects a regular NAR hash,
rather than a NAR hash modulo self-references. Fixes#6300.
Also, makeContentAddressed() now rewrites the entire closure (so 'nix
store make-content-addressable' no longer needs '-r'). See #6301.
1. `DerivationOutput` now as the `std::variant` as a base class. And the
variants are given hierarchical names under `DerivationOutput`.
In 8e0d0689be @matthewbauer and I
didn't know a better idiom, and so we made it a field. But this sort
of "newtype" is anoying for literals downstream.
Since then we leaned the base class, inherit the constructors trick,
e.g. used in `DerivedPath`. Switching to use that makes this more
ergonomic, and consistent.
2. `store-api.hh` and `derivations.hh` are now independent.
In bcde5456cc I swapped the dependency,
but I now know it is better to just keep on using incomplete types as
much as possible for faster compilation and good separation of
concerns.
Before the change garbage collector was not considering
`.drv` and outputs as alive even if configuration says otherwise.
As a result `nix store gc --dry-run` could visit (and parse)
`.drv` files multiple times (worst case it's quadratic).
It happens because `alive` set was populating only runtime closure
without regard for actual configuration. The change fixes it.
Benchmark: my system has about 139MB, 40K `.drv` files.
Performance before the change:
$ time nix store gc --dry-run
real 4m22,148s
Performance after the change:
$ time nix store gc --dry-run
real 0m14,178s
Don’t try and assume that we know the output paths when we’ve just built
with `--dry-run`. Instead make `--dry-run` follow a different code path
that won’t assume the knowledge of the output paths at all.
Fix#6275
Before the change on a system with `auto-optimise-store = true`:
$ nix store gc --verbose --max 1
deleted all the paths instead of one path (we requested 1 byte limit).
It happens because every file in `auto-optimise-store = true` has at
least 2 links: file itself and a link in /nix/store/.links/ directory.
The change conservatively assumes that any file that has one (as before)
or two links (assume auto-potimise mode) will free space.
Co-authored-by: Sandro <sandro.jaeckel@gmail.com>
This changes was taken from dynamic derivation (#4628). It` somewhat
undoes the refactors I first did for floating CA derivations, as the
benefit of hindsight + requirements of dynamic derivations made me
reconsider some things.
They aren't to consequential, but I figured they might be good to land
first, before the more profound changes @thufschmitt has in the works.
Continue progress on #5729.
Just as I hoped, this uncovered an issue: the daemon protocol is missing
a way to query build logs. This doesn't effect `unix://`, but does
effect `ssh://`. A FIXME is left for this, so we come back to it later.
This function is like buildPaths(), except that it returns a vector of
BuildResults containing the exact statuses and output paths of each
derivation / substitution. This is convenient for functions like
Installable::build(), because they then don't need to do another
series of calls to get the outputs of CA derivations. It's also a
precondition to impure derivations, where we *can't* query the output
of those derivations since they're not stored in the Nix database.
Note that PathSubstitutionGoal can now also return a BuildStatus.
Starts progress on #5729.
The idea is that we should not have these default methods throwing
"unimplemented". This is a small step in that direction.
I kept `addTempRoot` because it is a no-op, rather than failure. Also,
as a practical matter, it is called all over the place, while doing
other tasks, so the downcasting would be annoying.
Maybe in the future I could move the "real" `addTempRoot` to `GcStore`,
and the existing usecases use a `tryAddTempRoot` wrapper to downcast or
do nothing, but I wasn't sure whether that was a good idea so with a
bias to less churn I didn't do it yet.
Setting the `_NIX_FORCE_HTTP` environment variable is supposed to force `file://` store urls to use the `HttpBinaryCacheStore` implementation rather than the `LocalBinaryCacheStore` one (very useful for testing).
However because of a name mismatch, the `LocalBinaryCacheStore` was still registering the `file` scheme when this variable was set, meaning that the actual store implementation picked up on `file://` uris was dependent on the registration order of the stores (itself dependent on the link order of the object files).
Fix this by making the `LocalBinaryCacheStore` gracefully not register the `file` uri scheme when the variable is set.
Starting work on #5638
The exact boundary between `FetchSettings` and `EvalSettings` is not
clear to me, but that's fine. First lets clean out `libstore`, and then
worry about what, if anything, should be the separation between those
two.
To avoid that JSON messages are parsed twice in case of
remote builds with `ssh-ng://`, I split up the original
`handleJSONLogMessage` into three parts:
* `parseJSONMessage(const std::string&)` checks if it's a message in the
form of `@nix {...}` and tries to parse it (and prints an error if the
parsing fails).
* `handleJSONLogMessage(nlohmann::json&, ...)` reads the fields from the
message and passes them to the logger.
* `handleJSONLogMessage(const std::string&, ...)` behaves as before, but
uses the two functions mentioned above as implementation.
In case of `ssh-ng://`-logs the first two methods are invoked manually.
Right now when building a derivation remotely via
$ nix build -j0 -f . hello -L --builders 'ssh://builder'
it's possible later to read through the entire build-log by running
`nix log -f . hello`. This isn't possible however when using `ssh-ng`
rather than `ssh`.
The reason for that is that there are two different ways to transfer
logs in Nix through e.g. an SSH tunnel (that are used by `ssh`/`ssh-ng`
respectively):
* `ssh://` receives its logs from the fd pointing to `builderOut`. This
is directly passed to the "log-sink" (and to the logger on each `\n`),
hence `nix log` works here.
* `ssh-ng://` however expects JSON-like messages (i.e. `@nix {log data
in here}`) and passes it directly to the logger without doing anything
with the `logSink`. However it's certainly possible to extract
log-lines from this format as these have their own message-type in the
JSON payload (i.e. `resBuildLogLine`).
This is basically what I changed in this patch: if the code-path for
`builderOut` is not reached and a `logSink` is initialized, the
message was successfully processed by the JSON logger (i.e. it's in
the expected format) and the line is of the expected type (i.e.
`resBuildLogLine`), the line will be written to the log-sink as well.
Closes#5079
If we want to be careful about hitting the stack protector page, we should use `-fstack-check` instead.
Co-authored-by: Eelco Dolstra <edolstra@gmail.com>
This was removed in 2e199673a5 when
`copyPath` transitioned to use `RealisedPath`. But then in
e9848beca7 we added it back just for
`realisedPath`.
I think it is a good utility function --- one can easily imagine it
becoming optimized in the future, and copying paths *violating* the
closure is a very niche feature.
So if we have `copyPaths` for both sorts of paths, I think we should
have `copyClosure` for both sorts too.
This removes a dynamic stack allocation, making the derivation
unparsing logic robust against overflows when large strings are
added to a derivation.
Overflow behavior depends on the platform and stack configuration.
For instance, x86_64-linux/glibc behaves as (somewhat) expected:
$ (ulimit -s 20000; nix-instantiate tests/lang/eval-okay-big-derivation-attr.nix)
error: stack overflow (possible infinite recursion)
$ (ulimit -s 40000; nix-instantiate tests/lang/eval-okay-big-derivation-attr.nix)
error: expression does not evaluate to a derivation (or a set or list of those)
However, on aarch64-darwin:
$ nix-instantiate big-attr.nix ~
zsh: segmentation fault nix-instantiate big-attr.nix
This indicates a slight flaw in the single stack protection page
approach that is not encountered with normal stack frames.
There already existed a smoke test for the link content length,
but it appears that there exists some corruptions pernicious enough
to replace the file content with zeros, and keeping the same length.
--repair-path now goes as far as checking the content of the link,
making it true to its name and actually repairing the path for such
coruption cases.
This was already accidentally disabled in ba87b08. It also no longer
appears to be beneficial, and in fact slow things down, e.g. when
evaluating a NixOS system configuration:
elapsed time: median = 3.8170 mean = 3.8202 stddev = 0.0195 min = 3.7894 max = 3.8600 [rejected, p=0.00000, Δ=0.36929±0.02513]
Add a `_NIX_TRACE_BUILT_OUTPUTS` environment variable that can be set to
a filename in which the result of each build will be logged.
This is intentionally crude and undocumented as it’s only meant to be a
temporary thing to assess the usefulness of CA derivations.
Any other use would need a cleaner re-implementation first.
Make the build of unresolved derivations return the same status as the
resolved one, except in the case of an `AlreadyValid` in which case it
will return `ResolvesToAlreadyValid` to mean that the outputs of the unresolved
derivation weren’t known, but the resolved one is.
I downloaded Nix tonight, and immediately broke it by accidentally removing the default binary caching.
After figuring this out, I also failed to fix it properly, due to using the wrong key for Nix's default binary cache
If the diagnostic message would have been clearer about what/where a "signature" for a "substituter" is + comes from, it probably would have saved me a few hours.
Maybe we can save other noobs the same pain?
Because the manual is generated from default values which are themselves
generated from various sources (cpuid, bios settings (kvm), number of
cores). This commit hides non-reproducible settings from the manual
output.
No matter what, we need to resize the buffer to not have any scratch
space after we do the `read`. In the end of file case, `got` will be 0
from it's initial value.
Before, we forgot to resize in the EOF case with the break. Yes, we know
we didn't recieve any data in that case, but we still have the scatch
space to undo.
Co-Authored-By: Will Fancher <Will.Fancher@Obsidian.Systems>
This doesn't fix the bug, but makes the code less difficult to read.
Also improve the comments, now that it is clear what part is needed in
each code path.
For a typical desktop system (~2K packages) we can easily get 100K
entries in RealisationsRefs. Without indices query for RealisationsRefs
requires linear scan.
RealisationsRefs(referrer)
--------------------------
Inefficiency is seen as a 100% CPU load of nix-daemon for the following
scenario:
$ nix edit -f . bash # add unused environment variable, like FOO="1"
# populate RealisationsRefs, build fresh system
$ nix build -f nixos system --arg config '{ contentAddressedByDefault = true; }'
$ nix edit -f . bash # add unused environment variable, like FOO="2"
$ time nix build -f nixos system --arg config '{ contentAddressedByDefault = true; }'
In this case `bash `will be rebuilt a few times and then rest of CPU
time is spent on scanning RealisationsRefs table (about 5 CPU-minutes
on my machine).
Before the change:
$ time nix build -f nixos system ... # step 4 above
real 34m3,613s
user 0m5,232s
sys 0m0,758s
Of all this time about 29.5 minutes are taken by nix-daemon's CPU time.
After the change:
$ time nix build -f nixos system ... # step 4 above
real 4m50,061s
user 0m5,038s
sys 0m0,677s
Of all this time about 1 minute is taken by nix-daemon's CPU time.
Most of the time is spent polling for non-existent realisations on
cache-nixos.org.
Realisations(outputPath)
------------------------
After running CA system for two weeks I got ~1M entries in Realisations
table. `nix-collect-garbage` became very slow (seemingly 100 path deletions
per second). It happens due to a slow cascading delete from Realisations
triggered by deletion from ValidPaths.
The fix is to add an index on primary key from ValidPaths(id) that
triggers cascading deletions.
Before the change:
$ time nix-collect-garbage -d --max-freed 100G
<interrupted before finish, took too long>
real 23m32.411s
user 17m49.679s
sys 4m50.609s
Most of time was spent in re-scanning Realisations table on each path deletion.
After the change:
$ time nix-collect-garbage -d --max-freed 100G
real 8m43.226s
user 6m16.317s
sys 1m40.188s
Time is spent scanning sqlite indices and in kernel when unlinking directories.
Doing it as a side-effect of calling LocalStore::makeStoreWritable()
is very ugly.
Also, make sure that stopping the progress bar joins the update
thread, otherwise that thread should be unshared as well.
Since 4806f2f6b0, we can't have paths with
references passed to builtins.{path,filterSource}. This prevents many cases
of those functions called on IFD outputs from working. Resolve this by
passing the references found in the original path to the added path.
When setting flake-local options (with the `nixConfig` field), forward
these options to the daemon in case we’re using one.
This is necessary in particular for options like `binary-caches` or
`post-build-hook` to make sense.
Fix <343239fc8a (r44356843)>
Rather than having them plain strings scattered through the whole
codebase, create an enum containing all the known experimental features.
This means that
- Nix can now `warn` when an unkwown experimental feature is passed
(making it much nicer to spot typos and spot deprecated features)
- It’s now easy to remove a feature altogether (once the feature isn’t
experimental anymore or is dropped) by just removing the field for the
enum and letting the compiler point us to all the now invalid usages
of it.
Currently machine specification (`/etc/nix/machine`) parser fails
with a vague exception if the file had incorrect format.
This commit adds verbose exceptions and unit-tests for the parser.
This ensures any started processes can't write to /nix/store (except
during builds). This partially reverts 01d07b1e, which happened because
of #2646.
The problem was only happening after nix downloads anything, causing
me to suspect the download thread. The problem turns out to be:
"A process can't join a new mount namespace if it is sharing
filesystem-related attributes with another process", in this case this
process is the curl thread.
Ideally, we might kill it before spawning the shell process, but it's
inside a static variable in the getFileTransfer() function. So
instead, stop it from sharing FS state using unshare(). A strategy
such as the one from #5057 (single-threaded chroot helper binary) is
also very much on the table.
Fixes#4337.
This fixes a bug in the garbage collector where if a path
/nix/store/abcd-foo is valid, but we do a
isValidPath("/nix/store/abcd-foo.lock") first, then a negative entry
for /nix/store/abcd is added to pathInfoCache, so /nix/store/abcd-foo
is subsequently considered invalid and deleted.
(where "referrers" includes the reverse of derivation outputs and
derivers). Now we do a full traversal to look if we can reach any
root. If not, all paths reached can be deleted.
The garbage collector no longer blocks other processes from
adding/building store paths or adding GC roots. To prevent the
collector from deleting store paths just added by another process,
processes need to connect to the garbage collector via a Unix domain
socket to register new temporary roots.
This reverts some parts of commit
8430a8f086 which was trying to rethrow
some exceptions while we weren’t in the context of a `catch` block,
causing some weird “terminate called without an active exception”
errors.
Fix#5368
In https://github.com/NixOS/nix/pull/5350 we noticed link failures
pkgsStatic.nixUnstable. Adding explicit dependency on libutil fixes
libstore-tests linking.
When I stop a download with Ctrl-C in a `nix repl` of a flake, the REPL
refuses to do any other downloads:
nix-repl> builtins.getFlake "nix-serve"
[0.0 MiB DL] downloading 'https://api.github.com/repos/edolstra/nix-serve/tarball/e9828a9e01a14297d15ca41 error: download of 'e9828a9e01' was interrupted
[0.0 MiB DL]
nix-repl> builtins.getFlake "nix-serve"
error: interrupted by the user
[0.0 MiB DL]
To fix this issue, two changes were necessary:
* Reset the global `_isInterrupted` variable: only because a single
operation was aborted, it should still be possible to continue the
session.
* Recreate a `fileTransfer`-instance if the current one was shut down by
an abort.
9c766a40cb broke logging from the
daemon, because commonChildInit is called when starting the build hook
in a vfork, so it ends up resetting the parent's logger. So don't
vfork.
It might be best to get rid of vfork altogether, but that may cause
problems, e.g. when we call an external program like git from the
evaluator.
Before the changes when building the whole system with
`contentAddressedByDefault = true;` we get many noninformative messages:
$ nix build -f nixos system --keep-going
...
warning: rewriting hashes in '/nix/store/...-clang-11.1.0.drv.chroot/nix/store/...-11.1.0'; cross fingers
warning: rewriting hashes in '/nix/store/...-clang-11.1.0.drv.chroot/nix/store/...-11.1.0-dev'; cross fingers
warning: rewriting hashes in '/nix/store/...-clang-11.1.0.drv.chroot/nix/store/...-11.1.0-python'; cross fingers
error: 2 dependencies of derivation '/nix/store/...-hub-2.14.2.drv' failed to build
warning: rewriting hashes in '/nix/store/...-subversion-1.14.1.drv.chroot/nix/store/...-subversion-1.14.1-dev'; cross fingers
warning: rewriting hashes in '/nix/store/...-subversion-1.14.1.drv.chroot/nix/store/...-subversion-1.14.1-man'; cross fingers
...
Let's downgrade these messages down to debug().