check goals for timeouts first, and their activity fds only if no
timeout has occurred. checking for timeouts *after* activity sets
us up for assertion failures by running multiple build completion
notifiers, the first of which will kill/reap the the goal process
and consuming the Pid instance. when the second notifier attempts
to do the same it will core dump with an assertion failure in Pid
and take down not only the single goal, but the entire daemon and
all goals it was building. luckily this is rare in practice since
it requires a build to both finish and time out at the same time.
writing a test for this is not feasible due to how much it relies
on scheduling to actually trigger the underlying bug, but on idle
machines it can usually be triggered by running multiple sleeping
builds with timeout set to the sleep duration and `--keep-going`:
nix-build --timeout 10 --builders '' --keep-going -E '
with import <nixpkgs> {};
builtins.genList
(i: runCommand "foo-${toString i}" {} "sleep 10")
100
'
Change-Id: I394d36b2e5ffb909cf8a19977d569bbdb71cb67b
The `builder` local variable and duplicate `args.push_back` are no
longer required since the Darwin sandbox stopped using `sandbox-exec`.
The `drv->isBuiltin` check is not required either, as args are not
accessed when the builder is builtin.
Change-Id: I80b939bbd6f727b01793809921810ff09b579d54
Seccomp filtering and the no-new-privileges functionality improve the security
of the sandbox, and have been enabled by default for a long time. In
lix-project/lix#265 it was decided that they
should be enabled unconditionally. Accordingly, remove the allow-new-privileges
(which had weird behavior anyway) and filter-syscall settings, and force the
security features on. Syscall filtering can still be enabled at build time to
support building on architectures libseccomp doesn't support.
Change-Id: Iedbfa18d720ae557dee07a24f69b2520f30119cb
This breaks downstreams linking to us on purpose to make sure that if
someone is linking to Lix they're doing it on purpose and crucially not
mixing up Nix and Lix versions in compatibility code.
We still need to fix the internal includes to follow the same schema so
we can drop the single-level include system entirely. However, this
requires a little more effort.
This adds pkg-config for libfetchers and config.h.
Migration path:
expr.hh -> lix/libexpr/expr.hh
nix/config.h -> lix/config.h
To apply this migration automatically, remove all `<nix/>` from
includes, so: `#include <nix/expr.hh>` -> `#include <expr.hh>`. Then,
the correct paths will be resolved from the tangled mess, and the
clang-tidy automated fix will work.
Then run the following for out of tree projects:
```
lix_root=$HOME/lix
(cd $lix_root/clang-tidy && nix develop -c 'meson setup build && ninja -C build')
run-clang-tidy -checks='-*,lix-fixincludes' -load=$lix_root/clang-tidy/build/liblix-clang-tidy.so -p build/ -fix src
```
Related: lix-project/nix-eval-jobs#5
Fixes: lix-project/lix#279
Change-Id: I7498e903afa6850a731ef8ce77a70da6b2b46966
Move the identical static `chmod_` functions in libstore to
libutil. the function is called `chmodPath` instead of `chmod`
as otherwise it will shadow the standard library chmod in the nix
namespace, which is somewhat confusing.
Change-Id: I7b5ce379c6c602e3d3a1bbc49dbb70b1ae8f7bad
having the serializer write into `*conn` is not legal because we are
in a sinkToSource that will be drained by the remote we're connected
to. writing into `*conn` directly can break the framing protocol. it
is unlikely this code was ever run: to protocol it caters to is from
2016(!) and thoroughly untested in-tree, and since it's been present
since nix 2.17 and the 1.18 protocol broken here is nix 2.0 we might
safely assume that daemons older than nix 2.1 are no longer used now
see also #325 (though that wants <2.3 gone, this is sadly only <2.1)
Change-Id: I9d674c18f6d802f61c5d85dfd9608587b73e70a5
On several occasions I've found myself confused when trying to delete
a store path, because I am told it's still alive, but
nix-store --query --roots doesn't show anything. Let's save future
users this confusion by mentioning that a path might be alive due to
having referrers, not just roots.
(cherry picked from commit 979a019014569eee7d0071605f6ff500b544f6ac)
Upstream-PR: https://github.com/NixOS/nix/pull/10733
Change-Id: I54ae839a85f3de3393493fba27fd40d7d3af0516
Example: /nix/store/dr53sp25hyfsnzjpm8mh3r3y36vrw3ng-neovim-0.9.5^out
This is nonsensical since selecting outputs can only be done for a
buildable derivation, not for a realised store path. The build worker
side of things ends up crashing with an assertion when trying to handle
such malformed paths.
Change-Id: Ia3587c71fe3da5bea45d4e506e1be4dd62291ddf
it's no longer used. it really shouldn't have existed this long since it
was just a mashup of both std::promise and std::packaged_task in a shape
that makes composition unnecessarily difficult. all but a single case of
Callback pattern calls were fully synchronous anyway, and even this sole
outlier was by far not important enough to justify the extra complexity.
Change-Id: I208aec4572bf2501cdbd0f331f27d505fca3a62f
also add a few more tests for exception propagation behavior. using
packaged_tasks and futures (which only allow a single call to a few
of their methods) introduces error paths that weren't there before.
Change-Id: I42ca5236f156fefec17df972f6e9be45989cf805
this is the *only* real user of file transfer download completion
callbacks, and a pretty spurious user at that (seeing how nothing
here is even turned on by default and indeed a dependency of path
substitution which *isn't* async, and concurrency-limited). it'll
be a real pain to keep this around, and realistically it would be
a lot better to overhaul substitution in general to be *actually*
async. that requires a proper async framework footing though, and
we don't have anything of the sort, but it's also blocking *that*
Change-Id: I1bf671f217c654a67377087607bf608728cbfc83
The fix for the Darwin vulnerability in ecdbc3b207
also broke setting `__sandboxProfile` when `sandbox=relaxed` or
`sandbox=false`. This cppnix change fixes `sandbox=relaxed` and
adds a suitable test.
Co-Authored-By: Artemis Tosini <lix@artem.ist>
Co-Authored-By: Eelco Dolstra <edolstra@gmail.com>
Change-Id: I40190f44f3e1d61846df1c7b89677c20a1488522
Because of an objc quirk[1], calling curl_global_init for the first time
after fork() will always result in a crash.
Up until now the solution has been to set
OBJC_DISABLE_INITIALIZE_FORK_SAFETY for every nix process to ignore
that error.
This is less than ideal because we were setting it in package.nix,
which meant that running nix tests locally would fail because
that variable was not set.
Instead of working around that error we address it at the core -
by calling curl_global_init inside initLibStore, which should mean
curl will already have been initialized by the time we try to do so in
a forked process.
[1] 01edf1705f/runtime/objc-initialize.mm (L614-L636)
Change-Id: Icf26010a8be655127cc130efb9c77b603a6660d0
only two users of this function exist. only one used it in a way that
even bears resemblance to asynchronicity, and even that one didn't do
it right. fully async and parallel computation would have only worked
if any getEdgesAsync never calls the continuation it receives itself,
only from more derived callbacks running on other threads. calling it
directly would cause the decoupling promise to be awaited immediately
*on the original thread*, completely negating all nice async effects.
Change-Id: I0aa640950cf327533a32dee410105efdabb448df
this seems to be an oversight, considering that regular substitutions
are concurrency-limited. while not particularly necessary at present,
once we've removed the `Callback` based interfaces it will be needed.
Change-Id: Ide2d08169fcc24752cbd07a1d33fb8482f7034f5
When /nix/var (or, more precisely, NIX_STATE_DIR) does not exist at all,
Lix falls back to creating an adhoc chroot store in XDG_DATA_HOME.
b247ef72d[1] changed the way Store classes are initialized, and in the
migration, a `params2` was accidentally changed to `params`. This commit
restores the correct behavior, and in lieu of a single *character* fix,
this commit also changes the variable name to something more reasonable.
Fixes#274.
[1]: b247ef72dc
n.b., this code might deserve some more looking at anyway. this fallback
store creation throws away *all* Store params passed to
openFromNonUri() in favor of an entirely new set which only contains
the `root` param, which may or may not be the correct behavior
Change-Id: Ibea559b88a50e6d6e75a1f87d9d7816cabb2a8f3
returning 0 from the callback for errors signals successful transfer if
the source returned no data even though the exception we've just caught
clearly disagrees. while this is not all that important (since the only
viable cause of such errors will be dataCallback, and the sole instance
of it being used already takes care of exceptions) we can just do this.
Change-Id: I2bb150eff447121d82e8e3aa4e00057c40523ac6
this will be necessary if we want download() to return a source instead
of consuming a sink, which will in turn be needed to remove coroutines.
Change-Id: I34ec241e9bbc5d32fbcd243b244e29c3757533aa
not doing this will cause transfers that had their readers disappear to
linger. with lingering transfers the curl thread can't shut down, which
will cause nix itself to not shut down until the transfer finishes some
other way (most likely network timeouts). also add a new test for this.
Change-Id: Id2401b3ac85731c824db05918d4079125be25b57
This was found when `logrotate.conf` failed to build in a NixOS system
with:
/nix/store/26zdl4pyw5qazppj8if5lm8bjzxlc07l-coreutils-9.3/bin/id: cannot find name for group ID 30000
This was surprising because it seemed to mean that /etc/group was busted
in the sandbox. Indeed it was:
root❌0:
nixbld:!💯
nogroup❌65534:
We diagnosed this to sandboxUid() being called before
usingUserNamespace() was called, in setting up /etc/group inside the
sandbox. This code desperately needs refactoring.
We also moved the /etc/group code to be with the /etc/passwd code, but
honestly this code is all spaghetti'd all over the place and needs some
more serious tidying than we did here.
We also moved some checks to be earlier to improve locality with where
the things they are checking come from.
Change-Id: Ie29798771f3593c46ec313a32960fa955054aceb
With Linux kernel >=6.6 & glibc 2.39 a `fchmodat2(2)` is available that
isn't filtered away by the libseccomp sandbox.
Being able to use this to bypass that restriction has surprising results
for some builds such as lxc[1]:
> With kernel ≥6.6 and glibc 2.39, lxc's install phase uses fchmodat2,
> which slips through 9b88e52846/src/libstore/build/local-derivation-goal.cc (L1650-L1663).
> The fixupPhase then uses fchmodat, which fails.
> With older kernel or glibc, setting the suid bit fails in the
> install phase, which is not treated as fatal, and then the
> fixup phase does not try to set it again.
Please note that there are still ways to bypass this sandbox[2] and this is
mostly a fix for the breaking builds.
This change works by creating a syscall filter for the `fchmodat2`
syscall (number 452 on most systems). The problem is that glibc 2.39
is needed to have the correct syscall number available via
`__NR_fchmodat2` / `__SNR_fchmodat2`, but this flake is still on
nixpkgs 23.11. To have this change everywhere and not dependent on the
glibc this package is built against, I added a header
"fchmodat2-compat.hh" that sets the syscall number based on the
architecture. On most platforms its 452 according to glibc with a few
exceptions:
$ rg --pcre2 'define __NR_fchmodat2 (?!452)'
sysdeps/unix/sysv/linux/x86_64/x32/arch-syscall.h
58:#define __NR_fchmodat2 1073742276
sysdeps/unix/sysv/linux/mips/mips64/n32/arch-syscall.h
67:#define __NR_fchmodat2 6452
sysdeps/unix/sysv/linux/mips/mips64/n64/arch-syscall.h
62:#define __NR_fchmodat2 5452
sysdeps/unix/sysv/linux/mips/mips32/arch-syscall.h
70:#define __NR_fchmodat2 4452
sysdeps/unix/sysv/linux/alpha/arch-syscall.h
59:#define __NR_fchmodat2 562
I added a small regression-test to the setuid integration-test that
attempts to set the suid bit on a file using the fchmodat2 syscall.
I confirmed that the test fails without the change in
local-derivation-goal.
Additionally, we require libseccomp 2.5.5 or greater now: as it turns
out, libseccomp maintains an internal syscall table and
validates each rule against it. This means that when using libseccomp
2.5.4 or older, one may pass `452` as syscall number against it, but
since it doesn't exist in the internal structure, `libseccomp` will refuse
to create a filter for that. This happens with nixpkgs-23.11, i.e. on
stable NixOS and when building Lix against the project's flake.
To work around that
* a backport of libseccomp 2.5.5 on upstream nixpkgs has been
scheduled[3].
* the package now uses libseccomp 2.5.5 on its own already. This is to
provide a quick fix since the correct fix for 23.11 is still a staging cycle
away.
We still need the compat header though since `SCMP_SYS(fchmodat2)`
internally transforms this into `__SNR_fchmodat2` which points to
`__NR_fchmodat2` from glibc 2.39, so it wouldn't build on glibc 2.38.
The updated syscall table from libseccomp 2.5.5 is NOT used for that
step, but used later, so we need both, our compat header and their
syscall table 🤷
Relevant PRs in CppNix:
* https://github.com/NixOS/nix/pull/10591
* https://github.com/NixOS/nix/pull/10501
[1] https://github.com/NixOS/nixpkgs/issues/300635#issuecomment-2031073804
[2] https://github.com/NixOS/nixpkgs/issues/300635#issuecomment-2030844251
[3] https://github.com/NixOS/nixpkgs/pull/306070
(cherry picked from commit ba6804518772e6afb403dd55478365d4b863c854)
Change-Id: I6921ab5a363188c6bff617750d00bb517276b7fe
Currently LocalDerivationGoal allows setting `__sandboxProfile`
to add sandbox parameters on Darwin when `sandbox=true`.
This was only supposed to have an effect when `sandbox=relaxed`
Change-Id: Ide44ee82d7e4d6b545285eab26547e7014817d3f
As discussed in the maintainer meeting on 2024-01-29.
Mainly this is to avoid a situation where the name is parsed and
treated as a file name, mostly to protect users.
.-* and ..-* are also considered invalid because they might strip
on that separator to remove versions. Doesn't really work, but that's
what we decided, and I won't argue with it, because .-* probably
doesn't seem to have a real world application anyway.
We do still permit a 1-character name that's just "-", which still
poses a similar risk in such a situation. We can't start disallowing
trailing -, because a non-zero number of users will need it and we've
seen how annoying and painful such a change is.
What matters most is preventing a situation where . or .. can be
injected, and to just get this done.
(cherry picked from commit f1b4663805a9dbcb1ace64ec110092d17c9155e0)
Change-Id: I900a8509933cee662f888c3c76fa8986b0058839
This replaces the external sandbox-exec call with direct calls into
libsandbox. This API is technically deprecated and is missing some
prototypes, but all major browsers depend on it, so it is unlikely to
materially change without warning.
This commit also ensures the netrc file is only written if the
derivation is in fact meant to be able to access the internet.
This change commits a sin of not actually actively declaring its
dependency on macOS's libsandbox.dylib; this is due to the dylib
cache in macOS making that explicit dependency unnecessary. In the
future this might become a problem, so this commit marks our sins.
Co-authored-by: Artemis Tosini <lix@artem.ist>
Co-authored-by: Lunaphied <lunaphied@lunaphied.me>
Change-Id: Ia302141a53ce7b0327c1aad86a117b6645fe1189
That's expected by `build-remote` and makes sure that errors are
correctly forwarded to the user. For instance, let's say that the
host-key of `example.org` is unknown and
nix-build ../nixpkgs -A hello -j0 --builders 'ssh-ng://example.org'
is issued, then you get the following output:
cannot build on 'ssh-ng://example.org?&': error: failed to start SSH connection to 'example.org'
Failed to find a machine for remote build!
derivation: yh46gakxq3kchrbihwxvpn5bmadcw90b-hello-2.12.1.drv
required (system, features): (x86_64-linux, [])
2 available machines:
[...]
The relevant information (`Host key verification failed`) ends up in the
daemon's log, but that's not very obvious considering that the daemon
isn't very chatty normally.
This can be fixed - the same way as its done for legacy-ssh - by passing
fd 4 to the SSH wrapper. Now you'd get the following error:
cannot build on 'ssh-ng://example.org': error: failed to start SSH connection to 'example.org': Host key verification failed.
Failed to find a machine for remote build!
[...]
...and now it's clear what's wrong.
Please note that this is won't end up in the derivation's log.
For previous discussion about this change see
https://github.com/NixOS/nix/pull/7659.
Change-Id: I5790856dbf58e53ea3e63238b015ea06c347cf92
only decompress the response once all data has been received (in the
fully buffered case), or at least outside of the curl wrapper itself
(in the receive-to-sink case). unfortunately this means we will have
to duplicate decompression logic for these two cases for time being,
but once the curl wrapper has been rewritten to return a real future
or Source we can deduplicate this logic again. the curl wrapper will
have to turn into a proper Source first and use decompression source
logic which also does not currently exist—only decompression *sinks*
Change-Id: I66bc692f07d9b9e69fe10689ee73a2de8d65e35c