Previously, system call filtering (to prevent builders from storing files with
setuid/setgid permission bits or extended attributes) was performed using a
blocklist. While this looks simple at first, it actually carries significant
security and maintainability risks: after all, the kernel may add new syscalls
to achieve the same functionality one is trying to block, and it can even be
hard to actually add the syscall to the blocklist when building against a C
library that doesn't know about it yet. For a recent demonstration of this
happening in practice to Nix, see the introduction of fchmodat2 [0] [1].
The allowlist approach does not share the same drawback. While it does require
a rather large list of harmless syscalls to be maintained in the codebase,
failing to update this list (and roll out the update to all users) in time has
rather benign effects; at worst, very recent programs that already rely on new
syscalls will fail with an error the same way they would on a slightly older
kernel that doesn't support them yet. Most importantly, no unintended new ways
of performing dangerous operations will be silently allowed.
Another possible drawback is reduced system call performance due to the larger
filter created by the allowlist requiring more computation [2]. However, this
issue has not convincingly been demonstrated yet in practice, for example in
systemd or various browsers.
This commit tries to keep the behavior as close to unchanged as possible. Only
newer syscalls that are not supported by glibc 2.38 (as found in NixOS 23.11)
are blocked. Since this includes fchmodat2, the compatibility code added for
handling this syscall can be removed too.
[0] https://github.com/NixOS/nixpkgs/issues/300635
[1] https://github.com/NixOS/nix/issues/10424
[2] https://github.com/flatpak/flatpak/pull/4462#issuecomment-1061690607
Change-Id: I541be3ea9b249bcceddfed6a5a13ac10b11e16ad
This reverts commit 35eec921af.
Reason for revert: Regressed nix-eval-jobs, and it appears to be this change is buggy/missing a case. It just needs another pass.
Code causing the problem in n-e-j, when invoked with `nix-eval-jobs --flake '.#hydraJobs'`:
```
n-e-j/tests/assets » ../../build/src/nix-eval-jobs --meta --workers 1 --flake .#hydraJobs
warning: unknown setting 'trusted-users'
warning: `--gc-roots-dir' not specified
error: unsupported Git input attribute 'dir'
error: worker error: error: unsupported Git input attribute 'dir'
```
```
nix::Value *vRoot = [&]() {
if (args.flake) {
auto [flakeRef, fragment, outputSpec] =
nix::parseFlakeRefWithFragmentAndExtendedOutputsSpec(
args.releaseExpr, nix::absPath("."));
nix::InstallableFlake flake{
{}, state, std::move(flakeRef), fragment, outputSpec,
{}, {}, args.lockFlags};
return flake.toValue(*state).first;
} else {
return releaseExprTopLevelValue(*state, autoArgs, args);
}
}();
```
Inspecting the program behaviour reveals that `dir` was in fact set in the URL going into the fetcher. This is in turn because unlike in the case changed in this commit, it was not erased before handing it to libfetchers, which is probably just a mistake.
```
(rr) up
3 0x00007ffff60262ae in nix::fetchers::Input::fromURL (url=..., requireTree=requireTree@entry=true) at src/libfetchers/fetchers.cc:39
warning: Source file is more recent than executable.
39 auto res = inputScheme->inputFromURL(url, requireTree);
(rr) p url
$1 = (const nix::ParsedURL &) @0x7fffdc874190: {url = "git+file:///home/jade/lix/nix-eval-jobs",
base = "git+file:///home/jade/lix/nix-eval-jobs", scheme = "git+file", authority = std::optional<std::string> = {[contained value] = ""},
path = "/home/jade/lix/nix-eval-jobs", query = std::map with 1 element = {["dir"] = "tests/assets"}, fragment = ""}
(rr) up
4 0x00007ffff789d904 in nix::parseFlakeRefWithFragment (url=".#hydraJobs", baseDir=std::optional<std::string> = {...},
allowMissing=allowMissing@entry=false, isFlake=isFlake@entry=true) at src/libexpr/flake/flakeref.cc:179
warning: Source file is more recent than executable.
179 FlakeRef(Input::fromURL(parsedURL, isFlake), getOr(parsedURL.query, "dir", "")),
(rr) p parsedURL
$2 = {url = "git+file:///home/jade/lix/nix-eval-jobs", base = "git+file:///home/jade/lix/nix-eval-jobs", scheme = "git+file",
authority = std::optional<std::string> = {[contained value] = ""}, path = "/home/jade/lix/nix-eval-jobs", query = std::map with 1 element = {
["dir"] = "tests/assets"}, fragment = ""}
(rr) list
174
175 if (pathExists(flakeRoot + "/.git/shallow"))
176 parsedURL.query.insert_or_assign("shallow", "1");
177
178 return std::make_pair(
179 FlakeRef(Input::fromURL(parsedURL, isFlake), getOr(parsedURL.query, "dir", "")),
180 fragment);
181 }
```
Change-Id: Ib55a882eaeb3e59228857761dc1e3b2e366b0f5e
The original idea was to fix lix#174, but for a user friendly solution,
I figured that we'd need more consistency:
* Invalid query params will cause an error, just like invalid
attributes. This has the following two consequences:
* The `?dir=`-param from flakes will be removed before the URL to be
fetched is passed to libfetchers.
* The tarball fetcher doesn't allow URLs with custom query params
anymore. I think this was questionable anyways given that an
arbitrary set of query params was silently removed from the URL you
wanted to fetch. The correct way is to use an attribute-set
with a key `url` that contains the tarball URL to fetch.
* Same for the git & mercurial fetchers: in that case it doesn't even
matter though: both fetchers added unused query params to the URL
that's passed from the input scheme to the fetcher (`url2` in the code).
It turns out that this was never used since the query parameters were
erased again in `getActualUrl`.
* Validation happens for both attributes and URLs. Previously, a lot of
fetchers validated e.g. refs/revs only when specified in a URL and
the validity of attribute names only in `inputFromAttrs`.
Now, all the validation is done in `inputFromAttrs` and `inputFromURL`
constructs attributes that will be passed to `inputFromAttrs`.
* Accept all attributes as URL query parameters. That also includes
lesser used ones such as `narHash`.
And "output" attributes like `lastModified`: these could be declared
already when declaring inputs as attribute rather than URL. Now the
behavior is at least consistent.
Personally, I think we should differentiate in the future between
"fetched input" (basically the attr-set that ends up in the lock-file)
and "unfetched input" earlier: both inputFrom{Attrs,URL} entrypoints
are probably OK for unfetched inputs, but for locked/fetched inputs
a custom entrypoint should be used. Then, the current entrypoints
wouldn't have to allow these attributes anymore.
Change-Id: I1be1992249f7af8287cfc37891ab505ddaa2e8cd
-- message from cl/1418 --
The boehmgc changes are bundled into this commit because doing otherwise
would require an annoying dance of "adding compatibility for < 8.2.6 and
>= 8.2.6" then updating the pin then removing the (now unneeded)
compatibility. It doesn't seem worth the trouble to me given the low
complexity of said changes.
Rebased coroutine-sp-fallback.diff patch taken from https://github.com/NixOS/nixpkgs/pull/317227
-- jade resubmit changes --
This is a resubmission of https://gerrit.lix.systems/c/lix/+/1418, which
was reverted in https://gerrit.lix.systems/c/lix/+/1432 for breaking CI
evaluation without being detected.
I have run `nix flake check -Lv` on this one before submission and it
passes on my machine and crucially without eval errors, so the CI result
should be accurate.
It seems like someone renamed forbiddenDependenciesRegex to
forbiddenDependenciesRegexes in nixpkgs and also changed the type
incompatibly. That's pretty silly, but at least it's just an eval error.
Also, `xonsh` regressed the availability of `xonsh-unwrapped`, but it
was fixed by us in https://github.com/NixOS/nixpkgs/pull/317636, which
is now in our channel, so we update nixpkgs compared to the original
iteration of this to simply get that.
We originally had a regression related to some reorganization of the
nixpkgs lib test suite in which there was broken parameter passing.
This, too, we got quickfixed in nixpkgs, so we don't need any changes
for it: https://github.com/NixOS/nixpkgs/pull/317772
Related: https://gerrit.lix.systems/c/lix/+/1428
Fixes: lix-project/lix#385
Change-Id: I26d41ea826fec900ebcad0f82a727feb6bcd28f3
4b128008c5d9fde881ce1b0a25e60ae0415a14d5 in nixpkgs introduced a default
hashedPasswordFile for root in NixOS tests, which takes precedence over
the password option set in the nix-copy test.
Change-Id: Iffaebec5992e50614b854033f0d14312c8d275b5
In most real world cases, the Link header is set on the redirect, not on
the final file. This regressed in Lix earlier and while new unit tests
were added to cover it, this integration test should probably have also
caught it.
Change-Id: I2a9d8d952fff36f2c22cfd751451c2b523f7045c
Seccomp filtering and the no-new-privileges functionality improve the security
of the sandbox, and have been enabled by default for a long time. In
lix-project/lix#265 it was decided that they
should be enabled unconditionally. Accordingly, remove the allow-new-privileges
(which had weird behavior anyway) and filter-syscall settings, and force the
security features on. Syscall filtering can still be enabled at build time to
support building on architectures libseccomp doesn't support.
Change-Id: Iedbfa18d720ae557dee07a24f69b2520f30119cb
Fixes#183, #110, #116.
The default flake-registry option becomes 'vendored', and refers
to a vendored flake-registry.json file in the install path.
Vendored copy of the flake-registry is from github:NixOS/flake-registry
at commit 9c69f7bd2363e71fe5cd7f608113290c7614dcdd.
Change-Id: I752b81c85ebeaab4e582ac01c239d69d65580f37
This was found when `logrotate.conf` failed to build in a NixOS system
with:
/nix/store/26zdl4pyw5qazppj8if5lm8bjzxlc07l-coreutils-9.3/bin/id: cannot find name for group ID 30000
This was surprising because it seemed to mean that /etc/group was busted
in the sandbox. Indeed it was:
root❌0:
nixbld:!💯
nogroup❌65534:
We diagnosed this to sandboxUid() being called before
usingUserNamespace() was called, in setting up /etc/group inside the
sandbox. This code desperately needs refactoring.
We also moved the /etc/group code to be with the /etc/passwd code, but
honestly this code is all spaghetti'd all over the place and needs some
more serious tidying than we did here.
We also moved some checks to be earlier to improve locality with where
the things they are checking come from.
Change-Id: Ie29798771f3593c46ec313a32960fa955054aceb
With Linux kernel >=6.6 & glibc 2.39 a `fchmodat2(2)` is available that
isn't filtered away by the libseccomp sandbox.
Being able to use this to bypass that restriction has surprising results
for some builds such as lxc[1]:
> With kernel ≥6.6 and glibc 2.39, lxc's install phase uses fchmodat2,
> which slips through 9b88e52846/src/libstore/build/local-derivation-goal.cc (L1650-L1663).
> The fixupPhase then uses fchmodat, which fails.
> With older kernel or glibc, setting the suid bit fails in the
> install phase, which is not treated as fatal, and then the
> fixup phase does not try to set it again.
Please note that there are still ways to bypass this sandbox[2] and this is
mostly a fix for the breaking builds.
This change works by creating a syscall filter for the `fchmodat2`
syscall (number 452 on most systems). The problem is that glibc 2.39
is needed to have the correct syscall number available via
`__NR_fchmodat2` / `__SNR_fchmodat2`, but this flake is still on
nixpkgs 23.11. To have this change everywhere and not dependent on the
glibc this package is built against, I added a header
"fchmodat2-compat.hh" that sets the syscall number based on the
architecture. On most platforms its 452 according to glibc with a few
exceptions:
$ rg --pcre2 'define __NR_fchmodat2 (?!452)'
sysdeps/unix/sysv/linux/x86_64/x32/arch-syscall.h
58:#define __NR_fchmodat2 1073742276
sysdeps/unix/sysv/linux/mips/mips64/n32/arch-syscall.h
67:#define __NR_fchmodat2 6452
sysdeps/unix/sysv/linux/mips/mips64/n64/arch-syscall.h
62:#define __NR_fchmodat2 5452
sysdeps/unix/sysv/linux/mips/mips32/arch-syscall.h
70:#define __NR_fchmodat2 4452
sysdeps/unix/sysv/linux/alpha/arch-syscall.h
59:#define __NR_fchmodat2 562
I added a small regression-test to the setuid integration-test that
attempts to set the suid bit on a file using the fchmodat2 syscall.
I confirmed that the test fails without the change in
local-derivation-goal.
Additionally, we require libseccomp 2.5.5 or greater now: as it turns
out, libseccomp maintains an internal syscall table and
validates each rule against it. This means that when using libseccomp
2.5.4 or older, one may pass `452` as syscall number against it, but
since it doesn't exist in the internal structure, `libseccomp` will refuse
to create a filter for that. This happens with nixpkgs-23.11, i.e. on
stable NixOS and when building Lix against the project's flake.
To work around that
* a backport of libseccomp 2.5.5 on upstream nixpkgs has been
scheduled[3].
* the package now uses libseccomp 2.5.5 on its own already. This is to
provide a quick fix since the correct fix for 23.11 is still a staging cycle
away.
We still need the compat header though since `SCMP_SYS(fchmodat2)`
internally transforms this into `__SNR_fchmodat2` which points to
`__NR_fchmodat2` from glibc 2.39, so it wouldn't build on glibc 2.38.
The updated syscall table from libseccomp 2.5.5 is NOT used for that
step, but used later, so we need both, our compat header and their
syscall table 🤷
Relevant PRs in CppNix:
* https://github.com/NixOS/nix/pull/10591
* https://github.com/NixOS/nix/pull/10501
[1] https://github.com/NixOS/nixpkgs/issues/300635#issuecomment-2031073804
[2] https://github.com/NixOS/nixpkgs/issues/300635#issuecomment-2030844251
[3] https://github.com/NixOS/nixpkgs/pull/306070
(cherry picked from commit ba6804518772e6afb403dd55478365d4b863c854)
Change-Id: I6921ab5a363188c6bff617750d00bb517276b7fe
This commit adds a new NixOS VM test, which tests that `nix upgrade-nix`
works on both kinds of profiles (manifest.nix and manifest.json).
Done as a separate commit from 831d18a13, since it relies on the
--store-path argument from 026c90e5f as well.
Change-Id: I5fc94b751d252862cb6cffb541a4c072faad9f3b
That's expected by `build-remote` and makes sure that errors are
correctly forwarded to the user. For instance, let's say that the
host-key of `example.org` is unknown and
nix-build ../nixpkgs -A hello -j0 --builders 'ssh-ng://example.org'
is issued, then you get the following output:
cannot build on 'ssh-ng://example.org?&': error: failed to start SSH connection to 'example.org'
Failed to find a machine for remote build!
derivation: yh46gakxq3kchrbihwxvpn5bmadcw90b-hello-2.12.1.drv
required (system, features): (x86_64-linux, [])
2 available machines:
[...]
The relevant information (`Host key verification failed`) ends up in the
daemon's log, but that's not very obvious considering that the daemon
isn't very chatty normally.
This can be fixed - the same way as its done for legacy-ssh - by passing
fd 4 to the SSH wrapper. Now you'd get the following error:
cannot build on 'ssh-ng://example.org': error: failed to start SSH connection to 'example.org': Host key verification failed.
Failed to find a machine for remote build!
[...]
...and now it's clear what's wrong.
Please note that this is won't end up in the derivation's log.
For previous discussion about this change see
https://github.com/NixOS/nix/pull/7659.
Change-Id: I5790856dbf58e53ea3e63238b015ea06c347cf92
In hopes of avoiding opaque error messages like the one in
https://buildbot.lix.systems/#/builders/49/builds/1054/steps/1/logs/stdio
Traceback (most recent call last):
File "/nix/store/wj6wh89jhd2492r781qsr09r9wydfs6m-nixos-test-driver-1.1/bin/.nixos-test-driver-wrapped", line 9, in <module>
sys.exit(main())
^^^^^^
File "/nix/store/wj6wh89jhd2492r781qsr09r9wydfs6m-nixos-test-driver-1.1/lib/python3.11/site-packages/test_driver/__init__.py", line 126, in main
driver.run_tests()
File "/nix/store/wj6wh89jhd2492r781qsr09r9wydfs6m-nixos-test-driver-1.1/lib/python3.11/site-packages/test_driver/driver.py", line 159, in run_tests
self.test_script()
File "/nix/store/wj6wh89jhd2492r781qsr09r9wydfs6m-nixos-test-driver-1.1/lib/python3.11/site-packages/test_driver/driver.py", line 151, in test_script
exec(self.tests, symbols, None)
File "<string>", line 13, in <module>
AssertionError
Change-Id: Idd2212a1c3714ce58c7c3a9f34c2ca4313eb6d55
Saves us a bunch of thinking about how to handle symlinks, and prevents
the DNS config from changing on the fly under the build, which may or may
not be a good thing?
Change-Id: I071e6ae7e220884690b788d94f480866f428db71
The big ones here are `trim-trailing-whitespace` and `end-of-file-fixer`
(which makes sure that every file ends with exactly one newline
character).
Change-Id: Idca73b640883188f068f9903e013cf0d82aa1123
without these changes the tests will very repeatably (although not very
reliably) wedge in our runs. the ssh command starts, opens a sessions,
does something, the session closes again, but the test does not move on.
adding *just* the redirect and not the unit waits is not sufficient
either, it needs both. this feels like a bug in the nixos testing
framework somewhere, but digging that far is not in the cards right now.
Change-Id: Idab577b83a36cc4899bb5ffbb3d9adc04e83e51c
Include phase reporting in log file for ssh-ng builds
(cherry picked from commit b1e7d7cad625095656fff05ac4aedeb12135110a)
Change-Id: I4076669b0ba160412f7c628ca9113f9abbc8c303
It is possible to exfiltrate a file descriptor out of the build sandbox
of FODs, and use it to modify the store path after it has been
registered. To avoid that issue, don't register the output of the build,
but a copy of it (that will be free of any leaked file descriptor).
Test that we can't leverage abstract unix domain sockets to leak file
descriptors out of the sandbox and modify the path after it has been
registered.
(cherry picked from commit 2dadfeb690e7f4b8f97298e29791d202fdba5ca6)
(tests cherry picked from commit c854ae5b3078ac5d99fa75fe148005044809e18c)
Co-authored-by: Valentin Gagarin <valentin.gagarin@tweag.io>
Co-authored-by: Theophane Hufschmitt <theophane.hufschmitt@tweag.io>
Co-authored-by: Tom Bereknyei <tomberek@gmail.com>
Change-Id: I87cd58f1c0a4f7b7a610d354206b33301e47b1a4
Define NixOS tests in `tests/nixos/default.nix` rather than `flake.nix`
(cherry picked from commit c29b8ba142a0650d1182ca838ddc1b2d273dcd2a)
Change-Id: Ieae1b6476d95024485df7067e008013bc5542039
It was disabled in c6953d1ff6 because
a recent Nixpkgs bump brought in a new systemd which changed how
systemd-nspawn worked.
As far as I can tell, the issue was caused by this upstream systemd
commit:
b71a0192c0
Bind-mounting the host's `/sys` and `/proc` into the container's
`/run/host/{sys,proc}` fixes the issue and allows the test to succeed.
(cherry picked from commit 883092e3f78d4efb1066a2e24e343b307035a04c)
https://hydra.nixos.org/build/235888160
This is needed because Nixpkgs now contains dangling symlinks
(pkgs/test/nixpkgs-check-by-name/tests/symlink-invalid/pkgs/by-name/fo/foo/foo.nix).
This is broken because of a change in systemd in NixOS 23.05. It fails
with
Failed to mount proc (type proc) on /proc (MS_NOSUID|MS_NODEV|MS_NOEXEC ""): Operation not permitted
Previously, for tarball flakes, we recorded the original URL of the
tarball flake, rather than the URL to which it ultimately
redirects. Thus, a flake URL like
http://example.org/patchelf-latest.tar that redirects to
http://example.org/patchelf-<revision>.tar was not really usable. We
couldn't record the redirected URL, because sites like GitHub redirect
to CDN URLs that we can't rely on to be stable.
So now we use the redirected URL only if the server returns the
`x-nix-is-immutable` or `x-amz-meta-nix-is-immutable` headers in its
response.