Compare commits

12 commits

Author SHA1 Message Date
alois31 2eac435bc7
libstore/build: use an allowlist approach to syscall filtering
Previously, system call filtering (to prevent builders from storing files with
setuid/setgid permission bits or extended attributes) was performed using a
blocklist. While this looks simple at first, it actually carries significant
security and maintainability risks: after all, the kernel may add new syscalls
to achieve the same functionality one is trying to block, and it can even be
hard to actually add the syscall to the blocklist when building against a C
library that doesn't know about it yet. For a recent demonstration of this
happening in practice to Nix, see the introduction of fchmodat2 [0] [1].

The allowlist approach does not share the same drawback. While it does require
a rather large list of harmless syscalls to be maintained in the codebase,
failing to update this list (and roll out the update to all users) in time has
rather benign effects; at worst, very recent programs that already rely on new
syscalls will fail with an error the same way they would on a slightly older
kernel that doesn't support them yet. Most importantly, no unintended new ways
of performing dangerous operations will be silently allowed.

Another possible drawback is reduced system call performance, because the larger
filter created by the allowlist requires more computation [2]. However, this
issue has not yet been convincingly demonstrated in practice, for example in
systemd or various browsers.

This commit tries to keep the behavior as close to unchanged as possible. Only
newer syscalls that are not supported by glibc 2.38 (as found in NixOS 23.11)
are blocked. Since this includes fchmodat2, the compatibility code added for
handling this syscall can be removed too.

[0] https://github.com/NixOS/nixpkgs/issues/300635
[1] https://github.com/NixOS/nix/issues/10424
[2] https://github.com/flatpak/flatpak/pull/4462#issuecomment-1061690607

Change-Id: I541be3ea9b249bcceddfed6a5a13ac10b11e16ad
2024-06-28 09:40:21 +02:00
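
The difference between the two policies described in the commit message above can be illustrated with a toy model. This is only a sketch: the real filter runs in the kernel through seccomp, and the two sets and both function names here are hypothetical, chosen for illustration. It shows what happens when a syscall that neither list knows about (such as `fchmodat2`) is requested.

```python
# Toy model only: real filtering happens in the kernel via seccomp, not in Python.
# Both sets are illustrative and far smaller than any real list.
KNOWN_SAFE = {"read", "open", "openat"}            # hypothetical allowlist
KNOWN_DANGEROUS = {"chmod", "fchmod", "fchmodat"}  # hypothetical blocklist


def blocklist_allows(syscall: str) -> bool:
    """Blocklist policy: everything is permitted unless explicitly listed."""
    return syscall not in KNOWN_DANGEROUS


def allowlist_allows(syscall: str) -> bool:
    """Allowlist policy: everything is refused unless explicitly listed."""
    return syscall in KNOWN_SAFE


if __name__ == "__main__":
    for name in ["read", "chmod", "fchmodat2"]:
        print(name, blocklist_allows(name), allowlist_allows(name))
```

In this model an unknown syscall passes the blocklist but is refused by the allowlist, which is the failure mode the commit message argues is the more benign one.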
alois31 51708e7433
libstore/build: always treat seccomp setup failures as fatal
In f047e4357b, I missed that when building without a dedicated build user
(i.e. in single-user setups), seccomp setup failures are silently ignored.
This behavior was introduced without explanation 7 years ago (ff6becafa8).
Hopefully its only remaining use-case nowadays is causing spurious test suite
successes when messing up the seccomp filter during development. Let's try
removing it.

Change-Id: Ibe51416d9c7a6dd635c2282990224861adf1ceab
2024-06-28 09:40:15 +02:00
jade 5dc85e8b72 Merge "packaging: make pegtl use the __forDefaults mechanism" into main 2024-06-26 22:11:52 +00:00
jade 77c5364596 Merge "doc/hacking: fix up some outdated info about cross, hydra links" into main 2024-06-26 22:11:36 +00:00
eldritch horrors 3dd7d023f4 libmain: don't print empty lines
this most notably affects `nix eval`: if there is no progress bar to be
shown and no activities going on we should not print anything at all. a
progress bar with no activities would print a bunch of terminal escapes
*and a space*, which is not helpful in simple cases like nix eval -E 1.
notably this does *not* affect nix eval called on non-terminal outputs,
but it is slightly confusing nevertheless (and not difficult to avoid).

fixes lix-project/lix#424

Change-Id: Iee793c79ba5a485d6606e0d292ed2eae6dfb7216
2024-06-26 17:44:04 +00:00
jade 9afb0fe41c Merge changes I476a2516,I8a274227 into main
* changes:
  doc/hacking: fix internal api docs section to say to enable it
  doc: Add more about the release note generator
2024-06-26 17:34:45 +00:00
alois31 30da1b17d9 Merge "libmain/progress-bar: move implementation out of the header" into main 2024-06-26 16:05:44 +00:00
jade f7d54cb6b1 packaging: make pegtl use the __forDefaults mechanism
This avoids needing to pass it in when callPackage'ing Lix from external
code.

Change-Id: Ie07e84a151e38614064609a2f6dbff165e193be7
2024-06-26 00:44:46 -07:00
jade 85c1241201 doc/hacking: fix up some outdated info about cross, hydra links
We would like to build these with Hydra but we do not currently have a
Hydra to build them with conveniently.

Change-Id: I0832a33881138dd1caab3805df7ad097db347e62
2024-06-25 21:46:26 -07:00
jade 33d53c4983 doc/hacking: fix internal api docs section to say to enable it
I filed a bug to build these in releng in the future:
lix-project/lix#422

Change-Id: I476a2516cc2be382d4b7c8529a02f9212a78fdb2
2024-06-25 21:26:18 -07:00
jade e537678f1e doc: Add more about the release note generator
Change-Id: I8a274227cb1b05d442d3f644603dd2844ecc9d05
2024-06-25 21:22:37 -07:00
alois31 206a5dbb8f
libmain/progress-bar: move implementation out of the header
Change-Id: Ib4b42ebea290ee575294df6b2f17a38a5d850b80
2024-06-24 20:39:50 +02:00
13 changed files with 659 additions and 150 deletions

@ -3,6 +3,10 @@
#
# It's used for crediting people accurately in release notes. The release notes
# script will link to forgejo, then to GitHub if forgejo is not present.
#
# When adding someone from outside the Lix project, you generally want to simply link their GitHub profile without adding a display name unless they are well-known in the community by that display name.
#
# See doc/manual/src/contributing/hacking.md for more documentation on this file's format and typical usage.
9999years:
display_name: wiggles
forgejo: rbt

@ -168,8 +168,26 @@ or for Nix with the [`flakes`] and [`nix-command`] experimental features enabled
$ nix build .#packages.aarch64-linux.default
```
Cross-compiled builds are available for ARMv6 (`armv6l-linux`) and ARMv7 (`armv7l-linux`).
Add more [system types](#system-type) to `crossSystems` in `flake.nix` to bootstrap Nix on unsupported platforms.
### Cross compiling using the Lix flake
Lix can also be easily cross compiled to the following arbitrarily-chosen system doubles, which can be useful for bootstrapping Lix on new platforms.
These are specified in `crossSystems` in `flake.nix`; feel free to submit changes to add new ones if they are useful to you.
- `armv6l-linux`
- `armv7l-linux`
- `riscv64-linux`
For example, to cross-compile Lix for `armv6l-linux` from another Linux, use the following:
```console
$ nix build .#nix-armv6l-linux
```
It's also possible to cross-compile a tarball of binaries suitable for the Lix installer, for example, for `riscv64-linux`:
```console
$ nix build .#nix-riscv64-linux.passthru.binaryTarball
```
### Building for multiple platforms at once
@ -282,9 +300,8 @@ Regular markdown files used for the manual have a base path of their own and the
## API documentation
Doxygen API documentation is [available
online](https://hydra.nixos.org/job/nix/master/internal-api-docs/latest/download-by-type/doc/internal-api-docs). You
can also build and view it yourself:
Doxygen API documentation will be available online in the future ([tracking issue](https://git.lix.systems/lix-project/lix/issues/422)).
You can also build and view it yourself:
```console
# nix build .#hydraJobs.internal-api-docs
@ -294,44 +311,50 @@ can also build and view it yourself:
or inside a `nix develop` shell by running:
```bash
$ meson configure build -Dinternal-api-docs=enabled
$ meson compile -C build internal-api-docs
$ xdg-open ./outputs/doc/share/doc/nix/internal-api/html/index.html
```
## Coverage analysis
A coverage analysis report is [available
online](https://hydra.nixos.org/job/nix/master/coverage/latest/download-by-type/report/coverage). You
can build it yourself:
A coverage analysis report will be available online in the future (FIXME(lix-hydra)).
You can build it yourself:
```
# nix build .#hydraJobs.coverage
# xdg-open ./result/coverage/index.html
```
Metrics about the change in line/function coverage over time are also
[available](https://hydra.nixos.org/job/nix/master/coverage#tabs-charts).
Metrics about the change in line/function coverage over time will be available in the future (FIXME(lix-hydra)).
## Add a release note
`doc/manual/rl-next` contains release notes entries for all unreleased changes.
User-visible changes should come with a release note.
Developer-facing changes should have a release note in the Development category if they are significant and if developers should know about them.
### Add an entry
Here's what a complete entry looks like. The file name is not incorporated in the document.
Here's what a complete entry looks like.
The file name is not incorporated in the final document, and is generally a super brief summary of the change synopsis.
```
```markdown
---
synopsis: Basically a title
# 1234 or gh#1234 will refer to CppNix GitHub, fj#1234 will refer to a Lix forgejo issue.
issues: [1234, fj#1234]
# Use this *only* if there is a CppNix pull request associated with this change
# Use this *only* if there is a CppNix pull request associated with this change.
prs: 1238
# List of Lix Gerrit changelist numbers; if there is an associated Lix GitHub
# PR, just put in the Gerrit CL number.
# List of Lix Gerrit changelist numbers.
# If there is an associated Lix GitHub PR, just put in the Gerrit CL number.
cls: [123]
# Heading that this release note will appear under.
category: Breaking Changes
# Add a credit mention in the bottom of the release note.
# your-name is used as a key into doc/manual/change-authors.yml for metadata
credits: [your-name]
---
Here's one or more paragraphs that describe the change.
@ -346,6 +369,31 @@ Significant changes should add the following header, which moves them to the top
significance: significant
```
The following categories of release notes are supported (see `maintainers/build-release-notes.py`):
- Breaking Changes
- Features
- Improvements
- Fixes
- Packaging
- Development
- Miscellany
The `credits` field, if present, gives credit to the author of the patch in the release notes with a message like "Many thanks to (your-name) for this" and linking to GitHub or Forgejo profiles if listed.
If you are forward-porting a change from CppNix, please credit the original author, and optionally credit yourself.
When adding credits metadata for people external to the project and deciding whether to put in a `display_name`, consider what they are generally known as in the community; even if you know their full name (e.g. from their GitHub profile), we suggest only adding it as a display name if that is what they go by in the community.
There are multiple reasons we follow this practice, but it boils down to privacy and consent: we would rather not capture full names that are not widely used in the community without the consent of the parties involved, even if they are publicly available.
As of this writing, the entries with full names as `display_name` are either members of the CppNix team or people who added them themselves.
The names specified in `credits` are used as keys to look up the authorship info in `doc/manual/change-authors.yml`.
The only mandatory part is that every key appearing in `credits` has an entry present in `change-authors.yml`.
All of the following properties are optional; you can specify `{}` as the metadata if you want a simple non-hyperlinked mention.
The following properties are supported:
- `display_name`: display name used in place of the key when showing names, if present.
- `forgejo`: Forgejo username. The name in the release notes will be a link to this, if present.
- `github`: GitHub username, used if `forgejo` is not set, again making a link.
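
The lookup rules described above can be sketched as follows. This is not the code in `maintainers/build-release-notes.py`; the function name `render_credit`, the Markdown link output, and the profile URL layout (`https://git.lix.systems/<user>` and `https://github.com/<user>`) are assumptions made for illustration. Apart from `9999years`, the entries in `AUTHORS` are made up.

```python
# Hypothetical stand-in for the parsed contents of doc/manual/change-authors.yml.
AUTHORS = {
    "9999years": {"display_name": "wiggles", "forgejo": "rbt"},
    "github-only": {"github": "octocat"},  # made-up entry
    "plain": {},                            # made-up entry: no link, no display name
}

FORGEJO_BASE = "https://git.lix.systems"  # assumed profile URL prefix
GITHUB_BASE = "https://github.com"        # assumed profile URL prefix


def render_credit(key: str, authors: dict) -> str:
    """Return the text used to mention one entry of `credits`."""
    if key not in authors:
        # Every key in `credits` must have an entry in change-authors.yml.
        raise KeyError(f"{key!r} is in credits but missing from change-authors.yml")
    meta = authors[key]
    name = meta.get("display_name", key)
    # Forgejo takes precedence; GitHub is only used when forgejo is not set.
    if "forgejo" in meta:
        return f"[{name}]({FORGEJO_BASE}/{meta['forgejo']})"
    if "github" in meta:
        return f"[{name}]({GITHUB_BASE}/{meta['github']})"
    return name


if __name__ == "__main__":
    for key in AUTHORS:
        print(render_credit(key, AUTHORS))
```

An entry with `{}` as its metadata falls through to a plain, non-hyperlinked mention of the key itself.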
### Build process
Releases have a precomputed `rl-MAJOR.MINOR.md`, and no `rl-next.md`.

@ -84,6 +84,8 @@
];
systems = linuxSystems ++ darwinSystems;
# If you add something here, please update the list in doc/manual/src/contributing/hacking.md.
# Thanks~
crossSystems = [
"armv6l-linux"
"armv7l-linux"
@ -164,6 +166,7 @@
nixUnstable = prev.nixUnstable;
check-headers = final.buildPackages.callPackage ./maintainers/check-headers.nix { };
check-syscalls = final.buildPackages.callPackage ./maintainers/check-syscalls.nix { };
default-busybox-sandbox-shell = final.busybox.override {
useMusl = true;
@ -195,16 +198,19 @@
busybox-sandbox-shell = final.busybox-sandbox-shell or final.default-busybox-sandbox-shell;
};
pegtl = final.callPackage ./misc/pegtl.nix { };
pegtl = final.nix.passthru.pegtl;
# Export the patched version of boehmgc that Lix uses into the overlay
# for consumers of this flake.
boehmgc-nix = final.nix.boehmgc-nix;
boehmgc-nix = final.nix.passthru.boehmgc-nix;
# And same thing for our build-release-notes package.
build-release-notes = final.nix.build-release-notes;
build-release-notes = final.nix.passthru.build-release-notes;
};
in
{
# for repl debugging
inherit self;
# A Nixpkgs overlay that overrides the 'nix' and
# 'nix.perl-bindings' packages.
overlays.default = overlayFor (p: p.stdenv);

@ -20,6 +20,8 @@ SIGNIFICANCECES = {
# This is just hardcoded for better validation. If you think there should be
# more of them, feel free to add more.
#
# Please update doc/manual/src/contributing/hacking.md if you do. Thanks~
CATEGORIES = [
'Breaking Changes',
'Features',

@ -0,0 +1,16 @@
{
runCommandNoCC,
lib,
libseccomp,
writeShellScriptBin,
}:
let
syscalls-csv = runCommandNoCC "syscalls.csv" { } ''
echo ${lib.escapeShellArg libseccomp.src}
tar -xf ${lib.escapeShellArg libseccomp.src} --strip-components=2 ${libseccomp.name}/src/syscalls.csv
mv syscalls.csv "$out"
'';
in
writeShellScriptBin "check-syscalls" ''
${./check-syscalls.sh} ${syscalls-csv}
''

maintainers/check-syscalls.sh Executable file
@ -0,0 +1,7 @@
#!/usr/bin/env bash
set -e
diff -u <(awk < src/libstore/build/local-derivation-goal.cc '/BEGIN extract-syscalls/ { extracting = 1; next }
match($0, /allowSyscall\(ctx, SCMP_SYS\(([^)]*)\)\);|\/\/ skip ([^ ]*)/, result) { print result[1] result[2] }
/END extract-syscalls/ { extracting = 0; next }') <(tail -n+2 "$1" | cut -d, -f 1)
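
For readers less familiar with awk, the comparison performed by the script above can be restated roughly as follows. This is a sketch, not part of the change: the function names are hypothetical, and it differs from the awk program in one respect, namely that it only matches lines between the BEGIN and END markers, whereas the awk program as shown sets `extracting` but does not test it in the match rule.

```python
import re

# Same two alternatives as the awk pattern: an allowSyscall(...) call,
# or a "// skip NAME" comment for a syscall that is deliberately left out.
PATTERN = re.compile(r"allowSyscall\(ctx, SCMP_SYS\(([^)]*)\)\);|// skip ([^ ]*)")


def extract_syscalls(source: str) -> list:
    """Collect syscall names listed between the BEGIN/END extract-syscalls markers."""
    names = []
    extracting = False
    for line in source.splitlines():
        if "BEGIN extract-syscalls" in line:
            extracting = True
            continue
        if "END extract-syscalls" in line:
            extracting = False
            continue
        if not extracting:
            continue
        m = PATTERN.search(line)
        if m:
            # Exactly one of the two groups is set, depending on the alternative.
            names.append(m.group(1) or m.group(2))
    return names


def csv_syscalls(csv_text: str) -> list:
    """First column of syscalls.csv without the header row (tail -n+2 | cut -d, -f 1)."""
    return [row.split(",")[0] for row in csv_text.splitlines()[1:] if row]
```

Because the two lists are compared with `diff`, both the set of names and their order have to match, which is why the source comment asks to keep the list's format and order.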

@ -14,6 +14,7 @@
boost,
brotli,
bzip2,
callPackage,
cmake,
curl,
doxygen,
@ -34,7 +35,7 @@
meson,
ninja,
openssl,
pegtl,
pegtl ? __forDefaults.pegtl,
pkg-config,
python3,
rapidcheck,
@ -75,8 +76,10 @@
configureFlags = prev.configureFlags or [ ] ++ [ (lib.enableFeature true "sigstop") ];
});
lix-doc = pkgs.callPackage ./lix-doc/package.nix { };
build-release-notes = pkgs.callPackage ./maintainers/build-release-notes.nix { };
lix-doc = callPackage ./lix-doc/package.nix { };
build-release-notes = callPackage ./maintainers/build-release-notes.nix { };
pegtl = callPackage ./misc/pegtl.nix { };
},
}:
let
@ -380,7 +383,12 @@ stdenv.mkDerivation (finalAttrs: {
# Export the patched version of boehmgc.
# flake.nix exports that into its overlay.
passthru = {
inherit (__forDefaults) boehmgc-nix editline-lix build-release-notes;
inherit (__forDefaults)
boehmgc-nix
editline-lix
build-release-notes
pegtl
;
inherit officialRelease;
@ -404,6 +412,7 @@ stdenv.mkDerivation (finalAttrs: {
# Lix specific packages
pre-commit-checks,
contribNotice,
check-syscalls,
}:
let
glibcFix = lib.optionalAttrs (buildPlatform.isLinux && glibcLocales != null) {
@ -457,6 +466,7 @@ stdenv.mkDerivation (finalAttrs: {
pythonEnv
# docker image tool
skopeo
check-syscalls
just
nixfmt
# Included above when internalApiDocs is true, but we set that to

@ -13,6 +13,11 @@
namespace nix {
// 100 years ought to be enough for anyone (yet sufficiently smaller than max() to not cause signed integer overflow).
constexpr const auto A_LONG_TIME = std::chrono::duration_cast<std::chrono::milliseconds>(
100 * 365 * std::chrono::seconds(86400)
);
using namespace std::literals::chrono_literals;
static std::string_view getS(const std::vector<Logger::Field> & fields, size_t n)
@ -36,6 +41,21 @@ static std::string_view storePathToName(std::string_view path)
return i == std::string::npos ? base.substr(0, 0) : base.substr(i + 1);
}
ProgressBar::ProgressBar(bool isTTY)
: isTTY(isTTY)
{
state_.lock()->active = isTTY;
updateThread = std::thread([&]() {
auto state(state_.lock());
auto nextWakeup = A_LONG_TIME;
while (state->active) {
if (!state->haveUpdate)
state.wait_for(updateCV, nextWakeup);
nextWakeup = draw(*state, {});
state.wait_for(quitCV, std::chrono::milliseconds(50));
}
});
}
ProgressBar::~ProgressBar()
{
@ -376,7 +396,7 @@ std::chrono::milliseconds ProgressBar::draw(State & state, const std::optional<s
if (printMultiline && moreActivities)
writeToStderr(fmt("And %d more...", moreActivities));
if (!printMultiline) {
if (!printMultiline && !line.empty()) {
line += " " + activity_line;
writeToStderr("\r" + filterANSIEscapes(line, false, width) + ANSI_NORMAL + "\e[K");
}
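
The comment on `A_LONG_TIME` in the first hunk of this file says the value is sufficiently smaller than `max()` to avoid signed integer overflow. The arithmetic behind that can be checked with a short calculation. The assumption here, which the source does not state, is that the duration may end up expressed as a signed 64-bit nanosecond count, as is common in standard library clock implementations.

```python
INT64_MAX = 2**63 - 1

# 100 * 365 * 86400 seconds, converted to milliseconds as in the duration_cast.
a_long_time_ms = 100 * 365 * 86400 * 1000

# The same span in nanoseconds (assumed worst-case representation).
a_long_time_ns = a_long_time_ms * 1_000_000

# Room left below the signed 64-bit maximum after storing the span in nanoseconds.
headroom_ns = INT64_MAX - a_long_time_ns

if __name__ == "__main__":
    print(a_long_time_ms, a_long_time_ns, headroom_ns > a_long_time_ns)
```

Under that assumption the span uses roughly a third of the signed 64-bit range, so adding it to a current-time value of similar magnitude still fits.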

@ -8,11 +8,6 @@
namespace nix {
// 100 years ought to be enough for anyone (yet sufficiently smaller than max() to not cause signed integer overflow).
constexpr const auto A_LONG_TIME = std::chrono::duration_cast<std::chrono::milliseconds>(
100 * 365 * std::chrono::seconds(86400)
);
struct ProgressBar : public Logger
{
struct ActInfo
@ -68,21 +63,7 @@ struct ProgressBar : public Logger
bool printMultiline = false;
bool isTTY;
ProgressBar(bool isTTY)
: isTTY(isTTY)
{
state_.lock()->active = isTTY;
updateThread = std::thread([&]() {
auto state(state_.lock());
auto nextWakeup = A_LONG_TIME;
while (state->active) {
if (!state->haveUpdate)
state.wait_for(updateCV, nextWakeup);
nextWakeup = draw(*state, {});
state.wait_for(quitCV, std::chrono::milliseconds(50));
}
});
}
ProgressBar(bool isTTY);
~ProgressBar();

@ -44,7 +44,6 @@
#include <sys/prctl.h>
#include <sys/syscall.h>
#if HAVE_SECCOMP
#include "linux/fchmodat2-compat.hh"
#include <seccomp.h>
#endif
#define pivot_root(new_root, put_old) (syscall(SYS_pivot_root, new_root, put_old))
@ -1602,6 +1601,12 @@ void LocalDerivationGoal::chownToBuilder(const Path & path)
throw SysError("cannot change ownership of '%1%'", path);
}
#if HAVE_SECCOMP
void allowSyscall(scmp_filter_ctx ctx, int syscall) {
if (seccomp_rule_add(ctx, SCMP_ACT_ALLOW, syscall, 0) != 0)
throw SysError("unable to add seccomp rule");
}
#endif
void setupSeccomp()
{
@ -1609,7 +1614,9 @@ void setupSeccomp()
#if HAVE_SECCOMP
scmp_filter_ctx ctx;
if (!(ctx = seccomp_init(SCMP_ACT_ALLOW)))
// Pretend that syscalls we don't yet know about don't exist.
// This is the best option for compatibility: after all, they did in fact not exist not too long ago.
if (!(ctx = seccomp_init(SCMP_ACT_ERRNO(ENOSYS))))
throw SysError("unable to initialize seccomp mode 2");
Finally cleanup([&]() {
@ -1644,28 +1651,520 @@ void setupSeccomp()
seccomp_arch_add(ctx, SCMP_ARCH_MIPSEL64N32) != 0)
printError("unable to add mips64el-*abin32 seccomp architecture");
/* Prevent builders from creating setuid/setgid binaries. */
for (int perm : { S_ISUID, S_ISGID }) {
if (seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(chmod), 1,
SCMP_A1(SCMP_CMP_MASKED_EQ, (scmp_datum_t) perm, (scmp_datum_t) perm)) != 0)
throw SysError("unable to add seccomp rule");
// This list is intended for machine consumption.
// Please keep its format, order and BEGIN/END markers.
//
// Currently, it is up to date with libseccomp 2.5.5 and glibc 2.38.
// Run check-syscalls to determine which new syscalls should be added.
// New syscalls must be audited and handled in a way that blocks the following dangerous operations:
// * Creation of non-empty setuid/setgid files
// * Creation of extended attributes (including ACLs)
//
// BEGIN extract-syscalls
allowSyscall(ctx, SCMP_SYS(accept));
allowSyscall(ctx, SCMP_SYS(accept4));
allowSyscall(ctx, SCMP_SYS(access));
allowSyscall(ctx, SCMP_SYS(acct));
allowSyscall(ctx, SCMP_SYS(add_key));
allowSyscall(ctx, SCMP_SYS(adjtimex));
allowSyscall(ctx, SCMP_SYS(afs_syscall));
allowSyscall(ctx, SCMP_SYS(alarm));
allowSyscall(ctx, SCMP_SYS(arch_prctl));
allowSyscall(ctx, SCMP_SYS(arm_fadvise64_64));
allowSyscall(ctx, SCMP_SYS(arm_sync_file_range));
allowSyscall(ctx, SCMP_SYS(bdflush));
allowSyscall(ctx, SCMP_SYS(bind));
allowSyscall(ctx, SCMP_SYS(bpf));
allowSyscall(ctx, SCMP_SYS(break));
allowSyscall(ctx, SCMP_SYS(breakpoint));
allowSyscall(ctx, SCMP_SYS(brk));
allowSyscall(ctx, SCMP_SYS(cachectl));
allowSyscall(ctx, SCMP_SYS(cacheflush));
allowSyscall(ctx, SCMP_SYS(cachestat));
allowSyscall(ctx, SCMP_SYS(capget));
allowSyscall(ctx, SCMP_SYS(capset));
allowSyscall(ctx, SCMP_SYS(chdir));
// skip chmod (dangerous)
allowSyscall(ctx, SCMP_SYS(chown));
allowSyscall(ctx, SCMP_SYS(chown32));
allowSyscall(ctx, SCMP_SYS(chroot));
allowSyscall(ctx, SCMP_SYS(clock_adjtime));
allowSyscall(ctx, SCMP_SYS(clock_adjtime64));
allowSyscall(ctx, SCMP_SYS(clock_getres));
allowSyscall(ctx, SCMP_SYS(clock_getres_time64));
allowSyscall(ctx, SCMP_SYS(clock_gettime));
allowSyscall(ctx, SCMP_SYS(clock_gettime64));
allowSyscall(ctx, SCMP_SYS(clock_nanosleep));
allowSyscall(ctx, SCMP_SYS(clock_nanosleep_time64));
allowSyscall(ctx, SCMP_SYS(clock_settime));
allowSyscall(ctx, SCMP_SYS(clock_settime64));
allowSyscall(ctx, SCMP_SYS(clone));
allowSyscall(ctx, SCMP_SYS(clone3));
allowSyscall(ctx, SCMP_SYS(close));
allowSyscall(ctx, SCMP_SYS(close_range));
allowSyscall(ctx, SCMP_SYS(connect));
allowSyscall(ctx, SCMP_SYS(copy_file_range));
allowSyscall(ctx, SCMP_SYS(creat));
allowSyscall(ctx, SCMP_SYS(create_module));
allowSyscall(ctx, SCMP_SYS(delete_module));
allowSyscall(ctx, SCMP_SYS(dup));
allowSyscall(ctx, SCMP_SYS(dup2));
allowSyscall(ctx, SCMP_SYS(dup3));
allowSyscall(ctx, SCMP_SYS(epoll_create));
allowSyscall(ctx, SCMP_SYS(epoll_create1));
allowSyscall(ctx, SCMP_SYS(epoll_ctl));
allowSyscall(ctx, SCMP_SYS(epoll_ctl_old));
allowSyscall(ctx, SCMP_SYS(epoll_pwait));
allowSyscall(ctx, SCMP_SYS(epoll_pwait2));
allowSyscall(ctx, SCMP_SYS(epoll_wait));
allowSyscall(ctx, SCMP_SYS(epoll_wait_old));
allowSyscall(ctx, SCMP_SYS(eventfd));
allowSyscall(ctx, SCMP_SYS(eventfd2));
allowSyscall(ctx, SCMP_SYS(execve));
allowSyscall(ctx, SCMP_SYS(execveat));
allowSyscall(ctx, SCMP_SYS(exit));
allowSyscall(ctx, SCMP_SYS(exit_group));
allowSyscall(ctx, SCMP_SYS(faccessat));
allowSyscall(ctx, SCMP_SYS(faccessat2));
allowSyscall(ctx, SCMP_SYS(fadvise64));
allowSyscall(ctx, SCMP_SYS(fadvise64_64));
allowSyscall(ctx, SCMP_SYS(fallocate));
allowSyscall(ctx, SCMP_SYS(fanotify_init));
allowSyscall(ctx, SCMP_SYS(fanotify_mark));
allowSyscall(ctx, SCMP_SYS(fchdir));
// skip fchmod (dangerous)
// skip fchmodat (dangerous)
// skip fchmodat2 (requires glibc 2.39, dangerous)
allowSyscall(ctx, SCMP_SYS(fchown));
allowSyscall(ctx, SCMP_SYS(fchown32));
allowSyscall(ctx, SCMP_SYS(fchownat));
allowSyscall(ctx, SCMP_SYS(fcntl));
allowSyscall(ctx, SCMP_SYS(fcntl64));
allowSyscall(ctx, SCMP_SYS(fdatasync));
allowSyscall(ctx, SCMP_SYS(fgetxattr));
allowSyscall(ctx, SCMP_SYS(finit_module));
allowSyscall(ctx, SCMP_SYS(flistxattr));
allowSyscall(ctx, SCMP_SYS(flock));
allowSyscall(ctx, SCMP_SYS(fork));
allowSyscall(ctx, SCMP_SYS(fremovexattr));
allowSyscall(ctx, SCMP_SYS(fsconfig));
// skip fsetxattr (dangerous)
allowSyscall(ctx, SCMP_SYS(fsmount));
allowSyscall(ctx, SCMP_SYS(fsopen));
allowSyscall(ctx, SCMP_SYS(fspick));
allowSyscall(ctx, SCMP_SYS(fstat));
allowSyscall(ctx, SCMP_SYS(fstat64));
allowSyscall(ctx, SCMP_SYS(fstatat64));
allowSyscall(ctx, SCMP_SYS(fstatfs));
allowSyscall(ctx, SCMP_SYS(fstatfs64));
allowSyscall(ctx, SCMP_SYS(fsync));
allowSyscall(ctx, SCMP_SYS(ftime));
allowSyscall(ctx, SCMP_SYS(ftruncate));
allowSyscall(ctx, SCMP_SYS(ftruncate64));
allowSyscall(ctx, SCMP_SYS(futex));
// skip futex_requeue (requires glibc 2.39)
allowSyscall(ctx, SCMP_SYS(futex_time64));
// skip futex_wait (requires glibc 2.39)
allowSyscall(ctx, SCMP_SYS(futex_waitv));
// skip futex_wake (requires glibc 2.39)
allowSyscall(ctx, SCMP_SYS(futimesat));
allowSyscall(ctx, SCMP_SYS(getcpu));
allowSyscall(ctx, SCMP_SYS(getcwd));
allowSyscall(ctx, SCMP_SYS(getdents));
allowSyscall(ctx, SCMP_SYS(getdents64));
allowSyscall(ctx, SCMP_SYS(getegid));
allowSyscall(ctx, SCMP_SYS(getegid32));
allowSyscall(ctx, SCMP_SYS(geteuid));
allowSyscall(ctx, SCMP_SYS(geteuid32));
allowSyscall(ctx, SCMP_SYS(getgid));
allowSyscall(ctx, SCMP_SYS(getgid32));
allowSyscall(ctx, SCMP_SYS(getgroups));
allowSyscall(ctx, SCMP_SYS(getgroups32));
allowSyscall(ctx, SCMP_SYS(getitimer));
allowSyscall(ctx, SCMP_SYS(get_kernel_syms));
allowSyscall(ctx, SCMP_SYS(get_mempolicy));
allowSyscall(ctx, SCMP_SYS(getpeername));
allowSyscall(ctx, SCMP_SYS(getpgid));
allowSyscall(ctx, SCMP_SYS(getpgrp));
allowSyscall(ctx, SCMP_SYS(getpid));
allowSyscall(ctx, SCMP_SYS(getpmsg));
allowSyscall(ctx, SCMP_SYS(getppid));
allowSyscall(ctx, SCMP_SYS(getpriority));
allowSyscall(ctx, SCMP_SYS(getrandom));
allowSyscall(ctx, SCMP_SYS(getresgid));
allowSyscall(ctx, SCMP_SYS(getresgid32));
allowSyscall(ctx, SCMP_SYS(getresuid));
allowSyscall(ctx, SCMP_SYS(getresuid32));
allowSyscall(ctx, SCMP_SYS(getrlimit));
allowSyscall(ctx, SCMP_SYS(get_robust_list));
allowSyscall(ctx, SCMP_SYS(getrusage));
allowSyscall(ctx, SCMP_SYS(getsid));
allowSyscall(ctx, SCMP_SYS(getsockname));
allowSyscall(ctx, SCMP_SYS(getsockopt));
allowSyscall(ctx, SCMP_SYS(get_thread_area));
allowSyscall(ctx, SCMP_SYS(gettid));
allowSyscall(ctx, SCMP_SYS(gettimeofday));
allowSyscall(ctx, SCMP_SYS(get_tls));
allowSyscall(ctx, SCMP_SYS(getuid));
allowSyscall(ctx, SCMP_SYS(getuid32));
allowSyscall(ctx, SCMP_SYS(getxattr));
allowSyscall(ctx, SCMP_SYS(gtty));
allowSyscall(ctx, SCMP_SYS(idle));
allowSyscall(ctx, SCMP_SYS(init_module));
allowSyscall(ctx, SCMP_SYS(inotify_add_watch));
allowSyscall(ctx, SCMP_SYS(inotify_init));
allowSyscall(ctx, SCMP_SYS(inotify_init1));
allowSyscall(ctx, SCMP_SYS(inotify_rm_watch));
allowSyscall(ctx, SCMP_SYS(io_cancel));
allowSyscall(ctx, SCMP_SYS(ioctl));
allowSyscall(ctx, SCMP_SYS(io_destroy));
allowSyscall(ctx, SCMP_SYS(io_getevents));
allowSyscall(ctx, SCMP_SYS(ioperm));
allowSyscall(ctx, SCMP_SYS(io_pgetevents));
allowSyscall(ctx, SCMP_SYS(io_pgetevents_time64));
allowSyscall(ctx, SCMP_SYS(iopl));
allowSyscall(ctx, SCMP_SYS(ioprio_get));
allowSyscall(ctx, SCMP_SYS(ioprio_set));
allowSyscall(ctx, SCMP_SYS(io_setup));
allowSyscall(ctx, SCMP_SYS(io_submit));
allowSyscall(ctx, SCMP_SYS(io_uring_enter));
allowSyscall(ctx, SCMP_SYS(io_uring_register));
allowSyscall(ctx, SCMP_SYS(io_uring_setup));
allowSyscall(ctx, SCMP_SYS(ipc));
allowSyscall(ctx, SCMP_SYS(kcmp));
allowSyscall(ctx, SCMP_SYS(kexec_file_load));
allowSyscall(ctx, SCMP_SYS(kexec_load));
allowSyscall(ctx, SCMP_SYS(keyctl));
allowSyscall(ctx, SCMP_SYS(kill));
allowSyscall(ctx, SCMP_SYS(landlock_add_rule));
allowSyscall(ctx, SCMP_SYS(landlock_create_ruleset));
allowSyscall(ctx, SCMP_SYS(landlock_restrict_self));
allowSyscall(ctx, SCMP_SYS(lchown));
allowSyscall(ctx, SCMP_SYS(lchown32));
allowSyscall(ctx, SCMP_SYS(lgetxattr));
allowSyscall(ctx, SCMP_SYS(link));
allowSyscall(ctx, SCMP_SYS(linkat));
allowSyscall(ctx, SCMP_SYS(listen));
allowSyscall(ctx, SCMP_SYS(listxattr));
allowSyscall(ctx, SCMP_SYS(llistxattr));
allowSyscall(ctx, SCMP_SYS(_llseek));
allowSyscall(ctx, SCMP_SYS(lock));
allowSyscall(ctx, SCMP_SYS(lookup_dcookie));
allowSyscall(ctx, SCMP_SYS(lremovexattr));
allowSyscall(ctx, SCMP_SYS(lseek));
// skip lsetxattr (dangerous)
allowSyscall(ctx, SCMP_SYS(lstat));
allowSyscall(ctx, SCMP_SYS(lstat64));
allowSyscall(ctx, SCMP_SYS(madvise));
// skip map_shadow_stack (requires glibc 2.39)
allowSyscall(ctx, SCMP_SYS(mbind));
allowSyscall(ctx, SCMP_SYS(membarrier));
allowSyscall(ctx, SCMP_SYS(memfd_create));
allowSyscall(ctx, SCMP_SYS(memfd_secret));
allowSyscall(ctx, SCMP_SYS(migrate_pages));
allowSyscall(ctx, SCMP_SYS(mincore));
allowSyscall(ctx, SCMP_SYS(mkdir));
allowSyscall(ctx, SCMP_SYS(mkdirat));
allowSyscall(ctx, SCMP_SYS(mknod));
allowSyscall(ctx, SCMP_SYS(mknodat));
allowSyscall(ctx, SCMP_SYS(mlock));
allowSyscall(ctx, SCMP_SYS(mlock2));
allowSyscall(ctx, SCMP_SYS(mlockall));
allowSyscall(ctx, SCMP_SYS(mmap));
allowSyscall(ctx, SCMP_SYS(mmap2));
allowSyscall(ctx, SCMP_SYS(modify_ldt));
allowSyscall(ctx, SCMP_SYS(mount));
allowSyscall(ctx, SCMP_SYS(mount_setattr));
allowSyscall(ctx, SCMP_SYS(move_mount));
allowSyscall(ctx, SCMP_SYS(move_pages));
allowSyscall(ctx, SCMP_SYS(mprotect));
allowSyscall(ctx, SCMP_SYS(mpx));
allowSyscall(ctx, SCMP_SYS(mq_getsetattr));
allowSyscall(ctx, SCMP_SYS(mq_notify));
allowSyscall(ctx, SCMP_SYS(mq_open));
allowSyscall(ctx, SCMP_SYS(mq_timedreceive));
allowSyscall(ctx, SCMP_SYS(mq_timedreceive_time64));
allowSyscall(ctx, SCMP_SYS(mq_timedsend));
allowSyscall(ctx, SCMP_SYS(mq_timedsend_time64));
allowSyscall(ctx, SCMP_SYS(mq_unlink));
allowSyscall(ctx, SCMP_SYS(mremap));
allowSyscall(ctx, SCMP_SYS(msgctl));
allowSyscall(ctx, SCMP_SYS(msgget));
allowSyscall(ctx, SCMP_SYS(msgrcv));
allowSyscall(ctx, SCMP_SYS(msgsnd));
allowSyscall(ctx, SCMP_SYS(msync));
allowSyscall(ctx, SCMP_SYS(multiplexer));
allowSyscall(ctx, SCMP_SYS(munlock));
allowSyscall(ctx, SCMP_SYS(munlockall));
allowSyscall(ctx, SCMP_SYS(munmap));
allowSyscall(ctx, SCMP_SYS(name_to_handle_at));
allowSyscall(ctx, SCMP_SYS(nanosleep));
allowSyscall(ctx, SCMP_SYS(newfstatat));
allowSyscall(ctx, SCMP_SYS(_newselect));
allowSyscall(ctx, SCMP_SYS(nfsservctl));
allowSyscall(ctx, SCMP_SYS(nice));
allowSyscall(ctx, SCMP_SYS(oldfstat));
allowSyscall(ctx, SCMP_SYS(oldlstat));
allowSyscall(ctx, SCMP_SYS(oldolduname));
allowSyscall(ctx, SCMP_SYS(oldstat));
allowSyscall(ctx, SCMP_SYS(olduname));
allowSyscall(ctx, SCMP_SYS(open));
allowSyscall(ctx, SCMP_SYS(openat));
allowSyscall(ctx, SCMP_SYS(openat2));
allowSyscall(ctx, SCMP_SYS(open_by_handle_at));
allowSyscall(ctx, SCMP_SYS(open_tree));
allowSyscall(ctx, SCMP_SYS(pause));
allowSyscall(ctx, SCMP_SYS(pciconfig_iobase));
allowSyscall(ctx, SCMP_SYS(pciconfig_read));
allowSyscall(ctx, SCMP_SYS(pciconfig_write));
allowSyscall(ctx, SCMP_SYS(perf_event_open));
allowSyscall(ctx, SCMP_SYS(personality));
allowSyscall(ctx, SCMP_SYS(pidfd_getfd));
allowSyscall(ctx, SCMP_SYS(pidfd_open));
allowSyscall(ctx, SCMP_SYS(pidfd_send_signal));
allowSyscall(ctx, SCMP_SYS(pipe));
allowSyscall(ctx, SCMP_SYS(pipe2));
allowSyscall(ctx, SCMP_SYS(pivot_root));
allowSyscall(ctx, SCMP_SYS(pkey_alloc));
allowSyscall(ctx, SCMP_SYS(pkey_free));
allowSyscall(ctx, SCMP_SYS(pkey_mprotect));
allowSyscall(ctx, SCMP_SYS(poll));
allowSyscall(ctx, SCMP_SYS(ppoll));
allowSyscall(ctx, SCMP_SYS(ppoll_time64));
allowSyscall(ctx, SCMP_SYS(prctl));
allowSyscall(ctx, SCMP_SYS(pread64));
allowSyscall(ctx, SCMP_SYS(preadv));
allowSyscall(ctx, SCMP_SYS(preadv2));
allowSyscall(ctx, SCMP_SYS(prlimit64));
allowSyscall(ctx, SCMP_SYS(process_madvise));
allowSyscall(ctx, SCMP_SYS(process_mrelease));
allowSyscall(ctx, SCMP_SYS(process_vm_readv));
allowSyscall(ctx, SCMP_SYS(process_vm_writev));
allowSyscall(ctx, SCMP_SYS(prof));
allowSyscall(ctx, SCMP_SYS(profil));
allowSyscall(ctx, SCMP_SYS(pselect6));
allowSyscall(ctx, SCMP_SYS(pselect6_time64));
allowSyscall(ctx, SCMP_SYS(ptrace));
allowSyscall(ctx, SCMP_SYS(putpmsg));
allowSyscall(ctx, SCMP_SYS(pwrite64));
allowSyscall(ctx, SCMP_SYS(pwritev));
allowSyscall(ctx, SCMP_SYS(pwritev2));
allowSyscall(ctx, SCMP_SYS(query_module));
allowSyscall(ctx, SCMP_SYS(quotactl));
allowSyscall(ctx, SCMP_SYS(quotactl_fd));
allowSyscall(ctx, SCMP_SYS(read));
allowSyscall(ctx, SCMP_SYS(readahead));
allowSyscall(ctx, SCMP_SYS(readdir));
allowSyscall(ctx, SCMP_SYS(readlink));
allowSyscall(ctx, SCMP_SYS(readlinkat));
allowSyscall(ctx, SCMP_SYS(readv));
allowSyscall(ctx, SCMP_SYS(reboot));
allowSyscall(ctx, SCMP_SYS(recv));
allowSyscall(ctx, SCMP_SYS(recvfrom));
allowSyscall(ctx, SCMP_SYS(recvmmsg));
allowSyscall(ctx, SCMP_SYS(recvmmsg_time64));
allowSyscall(ctx, SCMP_SYS(recvmsg));
allowSyscall(ctx, SCMP_SYS(remap_file_pages));
allowSyscall(ctx, SCMP_SYS(removexattr));
allowSyscall(ctx, SCMP_SYS(rename));
allowSyscall(ctx, SCMP_SYS(renameat));
allowSyscall(ctx, SCMP_SYS(renameat2));
allowSyscall(ctx, SCMP_SYS(request_key));
allowSyscall(ctx, SCMP_SYS(restart_syscall));
allowSyscall(ctx, SCMP_SYS(riscv_flush_icache));
allowSyscall(ctx, SCMP_SYS(rmdir));
allowSyscall(ctx, SCMP_SYS(rseq));
allowSyscall(ctx, SCMP_SYS(rtas));
allowSyscall(ctx, SCMP_SYS(rt_sigaction));
allowSyscall(ctx, SCMP_SYS(rt_sigpending));
allowSyscall(ctx, SCMP_SYS(rt_sigprocmask));
allowSyscall(ctx, SCMP_SYS(rt_sigqueueinfo));
allowSyscall(ctx, SCMP_SYS(rt_sigreturn));
allowSyscall(ctx, SCMP_SYS(rt_sigsuspend));
allowSyscall(ctx, SCMP_SYS(rt_sigtimedwait));
allowSyscall(ctx, SCMP_SYS(rt_sigtimedwait_time64));
allowSyscall(ctx, SCMP_SYS(rt_tgsigqueueinfo));
allowSyscall(ctx, SCMP_SYS(s390_guarded_storage));
allowSyscall(ctx, SCMP_SYS(s390_pci_mmio_read));
allowSyscall(ctx, SCMP_SYS(s390_pci_mmio_write));
allowSyscall(ctx, SCMP_SYS(s390_runtime_instr));
allowSyscall(ctx, SCMP_SYS(s390_sthyi));
allowSyscall(ctx, SCMP_SYS(sched_getaffinity));
allowSyscall(ctx, SCMP_SYS(sched_getattr));
allowSyscall(ctx, SCMP_SYS(sched_getparam));
allowSyscall(ctx, SCMP_SYS(sched_get_priority_max));
allowSyscall(ctx, SCMP_SYS(sched_get_priority_min));
allowSyscall(ctx, SCMP_SYS(sched_getscheduler));
allowSyscall(ctx, SCMP_SYS(sched_rr_get_interval));
allowSyscall(ctx, SCMP_SYS(sched_rr_get_interval_time64));
allowSyscall(ctx, SCMP_SYS(sched_setaffinity));
allowSyscall(ctx, SCMP_SYS(sched_setattr));
allowSyscall(ctx, SCMP_SYS(sched_setparam));
allowSyscall(ctx, SCMP_SYS(sched_setscheduler));
allowSyscall(ctx, SCMP_SYS(sched_yield));
allowSyscall(ctx, SCMP_SYS(seccomp));
allowSyscall(ctx, SCMP_SYS(security));
allowSyscall(ctx, SCMP_SYS(select));
allowSyscall(ctx, SCMP_SYS(semctl));
allowSyscall(ctx, SCMP_SYS(semget));
allowSyscall(ctx, SCMP_SYS(semop));
allowSyscall(ctx, SCMP_SYS(semtimedop));
allowSyscall(ctx, SCMP_SYS(semtimedop_time64));
allowSyscall(ctx, SCMP_SYS(send));
allowSyscall(ctx, SCMP_SYS(sendfile));
allowSyscall(ctx, SCMP_SYS(sendfile64));
allowSyscall(ctx, SCMP_SYS(sendmmsg));
allowSyscall(ctx, SCMP_SYS(sendmsg));
allowSyscall(ctx, SCMP_SYS(sendto));
allowSyscall(ctx, SCMP_SYS(setdomainname));
allowSyscall(ctx, SCMP_SYS(setfsgid));
allowSyscall(ctx, SCMP_SYS(setfsgid32));
allowSyscall(ctx, SCMP_SYS(setfsuid));
allowSyscall(ctx, SCMP_SYS(setfsuid32));
allowSyscall(ctx, SCMP_SYS(setgid));
allowSyscall(ctx, SCMP_SYS(setgid32));
allowSyscall(ctx, SCMP_SYS(setgroups));
allowSyscall(ctx, SCMP_SYS(setgroups32));
allowSyscall(ctx, SCMP_SYS(sethostname));
allowSyscall(ctx, SCMP_SYS(setitimer));
allowSyscall(ctx, SCMP_SYS(set_mempolicy));
allowSyscall(ctx, SCMP_SYS(set_mempolicy_home_node));
allowSyscall(ctx, SCMP_SYS(setns));
allowSyscall(ctx, SCMP_SYS(setpgid));
allowSyscall(ctx, SCMP_SYS(setpriority));
allowSyscall(ctx, SCMP_SYS(setregid));
allowSyscall(ctx, SCMP_SYS(setregid32));
allowSyscall(ctx, SCMP_SYS(setresgid));
allowSyscall(ctx, SCMP_SYS(setresgid32));
allowSyscall(ctx, SCMP_SYS(setresuid));
allowSyscall(ctx, SCMP_SYS(setresuid32));
allowSyscall(ctx, SCMP_SYS(setreuid));
allowSyscall(ctx, SCMP_SYS(setreuid32));
allowSyscall(ctx, SCMP_SYS(setrlimit));
allowSyscall(ctx, SCMP_SYS(set_robust_list));
allowSyscall(ctx, SCMP_SYS(setsid));
allowSyscall(ctx, SCMP_SYS(setsockopt));
allowSyscall(ctx, SCMP_SYS(set_thread_area));
allowSyscall(ctx, SCMP_SYS(set_tid_address));
allowSyscall(ctx, SCMP_SYS(settimeofday));
allowSyscall(ctx, SCMP_SYS(set_tls));
allowSyscall(ctx, SCMP_SYS(setuid));
allowSyscall(ctx, SCMP_SYS(setuid32));
// skip setxattr (dangerous)
allowSyscall(ctx, SCMP_SYS(sgetmask));
allowSyscall(ctx, SCMP_SYS(shmat));
allowSyscall(ctx, SCMP_SYS(shmctl));
allowSyscall(ctx, SCMP_SYS(shmdt));
allowSyscall(ctx, SCMP_SYS(shmget));
allowSyscall(ctx, SCMP_SYS(shutdown));
allowSyscall(ctx, SCMP_SYS(sigaction));
allowSyscall(ctx, SCMP_SYS(sigaltstack));
allowSyscall(ctx, SCMP_SYS(signal));
allowSyscall(ctx, SCMP_SYS(signalfd));
allowSyscall(ctx, SCMP_SYS(signalfd4));
allowSyscall(ctx, SCMP_SYS(sigpending));
allowSyscall(ctx, SCMP_SYS(sigprocmask));
allowSyscall(ctx, SCMP_SYS(sigreturn));
allowSyscall(ctx, SCMP_SYS(sigsuspend));
allowSyscall(ctx, SCMP_SYS(socket));
allowSyscall(ctx, SCMP_SYS(socketcall));
allowSyscall(ctx, SCMP_SYS(socketpair));
allowSyscall(ctx, SCMP_SYS(splice));
allowSyscall(ctx, SCMP_SYS(spu_create));
allowSyscall(ctx, SCMP_SYS(spu_run));
allowSyscall(ctx, SCMP_SYS(ssetmask));
allowSyscall(ctx, SCMP_SYS(stat));
allowSyscall(ctx, SCMP_SYS(stat64));
allowSyscall(ctx, SCMP_SYS(statfs));
allowSyscall(ctx, SCMP_SYS(statfs64));
allowSyscall(ctx, SCMP_SYS(statx));
allowSyscall(ctx, SCMP_SYS(stime));
allowSyscall(ctx, SCMP_SYS(stty));
allowSyscall(ctx, SCMP_SYS(subpage_prot));
allowSyscall(ctx, SCMP_SYS(swapcontext));
allowSyscall(ctx, SCMP_SYS(swapoff));
allowSyscall(ctx, SCMP_SYS(swapon));
allowSyscall(ctx, SCMP_SYS(switch_endian));
allowSyscall(ctx, SCMP_SYS(symlink));
allowSyscall(ctx, SCMP_SYS(symlinkat));
allowSyscall(ctx, SCMP_SYS(sync));
allowSyscall(ctx, SCMP_SYS(sync_file_range));
allowSyscall(ctx, SCMP_SYS(sync_file_range2));
allowSyscall(ctx, SCMP_SYS(syncfs));
allowSyscall(ctx, SCMP_SYS(syscall));
allowSyscall(ctx, SCMP_SYS(_sysctl));
allowSyscall(ctx, SCMP_SYS(sys_debug_setcontext));
allowSyscall(ctx, SCMP_SYS(sysfs));
allowSyscall(ctx, SCMP_SYS(sysinfo));
allowSyscall(ctx, SCMP_SYS(syslog));
allowSyscall(ctx, SCMP_SYS(sysmips));
allowSyscall(ctx, SCMP_SYS(tee));
allowSyscall(ctx, SCMP_SYS(tgkill));
allowSyscall(ctx, SCMP_SYS(time));
allowSyscall(ctx, SCMP_SYS(timer_create));
allowSyscall(ctx, SCMP_SYS(timer_delete));
allowSyscall(ctx, SCMP_SYS(timerfd));
allowSyscall(ctx, SCMP_SYS(timerfd_create));
allowSyscall(ctx, SCMP_SYS(timerfd_gettime));
allowSyscall(ctx, SCMP_SYS(timerfd_gettime64));
allowSyscall(ctx, SCMP_SYS(timerfd_settime));
allowSyscall(ctx, SCMP_SYS(timerfd_settime64));
allowSyscall(ctx, SCMP_SYS(timer_getoverrun));
allowSyscall(ctx, SCMP_SYS(timer_gettime));
allowSyscall(ctx, SCMP_SYS(timer_gettime64));
allowSyscall(ctx, SCMP_SYS(timer_settime));
allowSyscall(ctx, SCMP_SYS(timer_settime64));
allowSyscall(ctx, SCMP_SYS(times));
allowSyscall(ctx, SCMP_SYS(tkill));
allowSyscall(ctx, SCMP_SYS(truncate));
allowSyscall(ctx, SCMP_SYS(truncate64));
allowSyscall(ctx, SCMP_SYS(tuxcall));
allowSyscall(ctx, SCMP_SYS(ugetrlimit));
allowSyscall(ctx, SCMP_SYS(ulimit));
allowSyscall(ctx, SCMP_SYS(umask));
allowSyscall(ctx, SCMP_SYS(umount));
allowSyscall(ctx, SCMP_SYS(umount2));
allowSyscall(ctx, SCMP_SYS(uname));
allowSyscall(ctx, SCMP_SYS(unlink));
allowSyscall(ctx, SCMP_SYS(unlinkat));
allowSyscall(ctx, SCMP_SYS(unshare));
allowSyscall(ctx, SCMP_SYS(uselib));
allowSyscall(ctx, SCMP_SYS(userfaultfd));
allowSyscall(ctx, SCMP_SYS(usr26));
allowSyscall(ctx, SCMP_SYS(usr32));
allowSyscall(ctx, SCMP_SYS(ustat));
allowSyscall(ctx, SCMP_SYS(utime));
allowSyscall(ctx, SCMP_SYS(utimensat));
allowSyscall(ctx, SCMP_SYS(utimensat_time64));
allowSyscall(ctx, SCMP_SYS(utimes));
allowSyscall(ctx, SCMP_SYS(vfork));
allowSyscall(ctx, SCMP_SYS(vhangup));
allowSyscall(ctx, SCMP_SYS(vm86));
allowSyscall(ctx, SCMP_SYS(vm86old));
allowSyscall(ctx, SCMP_SYS(vmsplice));
allowSyscall(ctx, SCMP_SYS(vserver));
allowSyscall(ctx, SCMP_SYS(wait4));
allowSyscall(ctx, SCMP_SYS(waitid));
allowSyscall(ctx, SCMP_SYS(waitpid));
allowSyscall(ctx, SCMP_SYS(write));
allowSyscall(ctx, SCMP_SYS(writev));
// END extract-syscalls
if (seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(fchmod), 1,
SCMP_A1(SCMP_CMP_MASKED_EQ, (scmp_datum_t) perm, (scmp_datum_t) perm)) != 0)
throw SysError("unable to add seccomp rule");
// chmod family: prevent adding setuid/setgid bits to existing files.
// The Nix store does not support setuid/setgid, and even their temporary creation can weaken the security of the sandbox.
if (seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(chmod), 1, SCMP_A1(SCMP_CMP_MASKED_EQ, S_ISUID | S_ISGID, 0)) != 0 ||
seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(chmod), 1, SCMP_A1(SCMP_CMP_MASKED_EQ, S_ISUID, S_ISUID)) != 0 ||
seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(chmod), 1, SCMP_A1(SCMP_CMP_MASKED_EQ, S_ISGID, S_ISGID)) != 0 ||
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(fchmod), 1, SCMP_A1(SCMP_CMP_MASKED_EQ, S_ISUID | S_ISGID, 0)) != 0 ||
seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(fchmod), 1, SCMP_A1(SCMP_CMP_MASKED_EQ, S_ISUID, S_ISUID)) != 0 ||
seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(fchmod), 1, SCMP_A1(SCMP_CMP_MASKED_EQ, S_ISGID, S_ISGID)) != 0 ||
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(fchmodat), 1, SCMP_A2(SCMP_CMP_MASKED_EQ, S_ISUID | S_ISGID, 0)) != 0 ||
seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(fchmodat), 1, SCMP_A2(SCMP_CMP_MASKED_EQ, S_ISUID, S_ISUID)) != 0 ||
seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(fchmodat), 1, SCMP_A2(SCMP_CMP_MASKED_EQ, S_ISGID, S_ISGID)) != 0)
throw SysError("unable to add seccomp rule");
if (seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(fchmodat), 1,
SCMP_A2(SCMP_CMP_MASKED_EQ, (scmp_datum_t) perm, (scmp_datum_t) perm)) != 0)
throw SysError("unable to add seccomp rule");
if (seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), NIX_SYSCALL_FCHMODAT2, 1,
SCMP_A2(SCMP_CMP_MASKED_EQ, (scmp_datum_t) perm, (scmp_datum_t) perm)) != 0)
throw SysError("unable to add seccomp rule");
}
/* Prevent builders from creating EAs or ACLs. Not all filesystems
support these, and they're not allowed in the Nix store because
they're not representable in the NAR serialisation. */
// setxattr family: prevent creation of extended attributes or ACLs.
// Not all filesystems support them, and they're incompatible with the NAR format.
if (seccomp_rule_add(ctx, SCMP_ACT_ERRNO(ENOTSUP), SCMP_SYS(setxattr), 0) != 0 ||
seccomp_rule_add(ctx, SCMP_ACT_ERRNO(ENOTSUP), SCMP_SYS(lsetxattr), 0) != 0 ||
seccomp_rule_add(ctx, SCMP_ACT_ERRNO(ENOTSUP), SCMP_SYS(fsetxattr), 0) != 0)
@@ -1699,11 +2198,7 @@ void LocalDerivationGoal::runChild()
commonChildInit();
try {
setupSeccomp();
} catch (...) {
if (buildUser) throw;
}
setupSeccomp();
bool setUser = true;
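
The chmod-family rules in the hunk above rely on `SCMP_CMP_MASKED_EQ`, which matches when `(arg & mask) == value`. The following is a minimal sketch of that decision logic in plain C++, with no libseccomp involved. The names `Verdict`, `maskedEq`, and `chmodVerdict` are hypothetical helpers introduced only for illustration and are not part of the commit.

```cpp
#include <sys/stat.h> // S_ISUID, S_ISGID, mode_t

// Outcome the filter would produce for one chmod-style call (illustrative only).
enum class Verdict { Allow, DenyEperm };

// Emulates the SCMP_CMP_MASKED_EQ comparison: true when (arg & mask) == value.
bool maskedEq(unsigned long arg, unsigned long mask, unsigned long value)
{
    return (arg & mask) == value;
}

// Hypothetical helper mirroring the three rules added per syscall:
// one allow rule for "neither bit set" and one EPERM rule per bit.
Verdict chmodVerdict(mode_t mode)
{
    // Allow rule: neither setuid nor setgid is requested.
    if (maskedEq(mode, S_ISUID | S_ISGID, 0))
        return Verdict::Allow;
    // Otherwise at least one of the two bits is set, so one of the
    // two EPERM rules (S_ISUID or S_ISGID) matches.
    return Verdict::DenyEperm;
}
```

Under this sketch, `chmodVerdict(0755)` yields `Verdict::Allow`, while `chmodVerdict(04755)` and `chmodVerdict(02755)` yield `Verdict::DenyEperm`. Other mode bits, such as the sticky bit, do not affect the outcome because they fall outside the mask.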


@@ -1,35 +0,0 @@
/*
* Determine the syscall number for `fchmodat2`.
*
* On most platforms this is 452. Exceptions can be found on
* a glibc git checkout via `rg --pcre2 'define __NR_fchmodat2 (?!452)'`.
*
* The problem is that glibc 2.39 and libseccomp 2.5.5 are needed to
* get the syscall number. However, a Lix built against nixpkgs 23.11
* (glibc 2.38) should still have the issue fixed without depending
* on the build environment.
*
* To achieve that, the macros below try to determine the platform and
* set the syscall number which is platform-specific, but
* in most cases 452.
*
* TODO: remove this when 23.11 is EOL and the entire (supported) ecosystem
* is on glibc 2.39.
*/
#pragma once
///@file
#if defined(__alpha__)
# define NIX_SYSCALL_FCHMODAT2 562
#elif defined(__x86_64__) && SIZE_MAX == 0xFFFFFFFF // x32
# define NIX_SYSCALL_FCHMODAT2 1073742276
#elif defined(__mips__) && defined(__mips64) && defined(_ABIN64) // mips64/n64
# define NIX_SYSCALL_FCHMODAT2 5452
#elif defined(__mips__) && defined(__mips64) && defined(_ABIN32) // mips64/n32
# define NIX_SYSCALL_FCHMODAT2 6452
#elif defined(__mips__) && defined(_ABIO32) // mips32
# define NIX_SYSCALL_FCHMODAT2 4452
#else
# define NIX_SYSCALL_FCHMODAT2 452
#endif


@@ -1,21 +0,0 @@
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <errno.h>
#include <unistd.h>
#include <assert.h>
int main(void) {
char *name = getenv("out");
FILE *fd = fopen(name, "w");
fprintf(fd, "henlo :3");
fclose(fd);
// FIXME use something nicer here that's less
// platform-dependent as soon as we go to 24.05
// and the glibc is new enough to support fchmodat2
long rs = syscall(452, NULL, name, S_ISUID, 0);
assert(rs == -1);
assert(errno == EPERM);
}


@@ -4,17 +4,6 @@
let
pkgs = config.nodes.machine.nixpkgs.pkgs;
fchmodat2-builder = pkgs.runCommandCC "fchmodat2-suid" {
passAsFile = [ "code" ];
code = builtins.readFile ./fchmodat2-suid.c;
# Doesn't work with -O0, shuts up the warning about that.
hardeningDisable = [ "fortify" ];
} ''
mkdir -p $out/bin/
$CC -x c "$codePath" -O0 -g -o $out/bin/fchmodat2-suid
'';
in
{
name = "setuid";
@@ -27,26 +16,13 @@ in
virtualisation.additionalPaths = [
pkgs.stdenvNoCC
pkgs.pkgsi686Linux.stdenvNoCC
fchmodat2-builder
];
# need at least 6.6 to test for fchmodat2
boot.kernelPackages = pkgs.linuxKernel.packages.linux_6_6;
};
testScript = { nodes }: ''
# fmt: off
start_all()
with subtest("fchmodat2 suid regression test"):
machine.succeed("""
nix-build -E '(with import <nixpkgs> {}; runCommand "fchmodat2-suid" {
BUILDER = builtins.storePath ${fchmodat2-builder};
} "
exec \\"$BUILDER\\"/bin/fchmodat2-suid
")'
""")
# Copying to /tmp should succeed.
machine.succeed(r"""
nix-build --no-sandbox -E '(with import <nixpkgs> {}; runCommand "foo" {} "