Unsandboxed builds can get stuck if the build starts children unparented to the sandbox #1018
Reference
lix-project/lix#1018
Describe the bug
Unsandboxed (direct store access) builds, e.g. container builds, can get stuck if they start things that are not parented to the sandbox at the end of their build.
Steps To Reproduce
sudo -E NIX_REMOTE=local nix build --no-sandbox .#checks.x86_64-linux.nextest --builders ''
The build then stays stuck at:
[1/0/1 built, 0.0 MiB DL] building chorus-nextest-0.20.0 (checkPhase): error: test run failed 80/80: 0 running, 77 passed, 3 failed, 0 skipped
Expected behavior
The build should complete.
nix --version output
Tested on nix (Lix, like Nix) 2.94.020251018151901-raito-edition and Lix 2.93.3.
Additional context
Title changed from "Unsandboxed (direct store access) builds can have their child corrupted upon exit" to "Unsandboxed builds can get stuck if the build starts children unparented to the sandbox".
We've managed to reproduce this, and it looks like a failure in the build system being run inside the sandbox. The node process we see hanging around does so because it was previously orphaned and reparented to init. In the sandboxed case this works out fine: when sandbox init (i.e. the stdenv bash script) exits, the kernel kills all remaining processes in the pid namespace. In unsandboxed cases we can only reliably clean this up with cgroups.
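For readers who want to see the orphaning in isolation, here is a minimal sketch in Python, assuming a Linux host with /proc mounted. It is not taken from Lix or from the affected build; `spawn_orphan`, `parent_pid` and `is_alive` are hypothetical helper names. A shell backgrounds a `sleep` and exits immediately, after which the `sleep` is still running but is no longer a child of the shell that started it.

```python
import os
import signal
import subprocess
from typing import Optional, Tuple


def spawn_orphan(seconds: int = 30) -> Tuple[int, int]:
    """Start a shell that backgrounds `sleep` and exits at once.

    Returns (shell_pid, sleep_pid). The sleep outlives the shell.
    """
    shell = subprocess.Popen(
        # The redirect keeps the sleep from inheriting our stdout pipe;
        # without it, communicate() would wait for the sleep to finish.
        ["sh", "-c", f"sleep {seconds} >/dev/null 2>&1 & echo $!"],
        stdout=subprocess.PIPE,
        text=True,
    )
    out, _ = shell.communicate()  # the shell has exited and been reaped here
    return shell.pid, int(out.strip())


def parent_pid(pid: int) -> Optional[int]:
    """Read the parent PID from /proc (Linux only); None if unavailable."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            # Format: "pid (comm) state ppid ..."; comm may contain spaces.
            return int(f.read().rsplit(")", 1)[1].split()[1])
    except (FileNotFoundError, ProcessLookupError):
        return None


def is_alive(pid: int) -> bool:
    """Signal 0 only checks for existence; nothing is delivered."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True
    return True


if __name__ == "__main__":
    shell_pid, orphan = spawn_orphan()
    print("shell", shell_pid, "exited; sleep", orphan, "alive:", is_alive(orphan))
    print("sleep is now parented to PID", parent_pid(orphan))
    os.kill(orphan, signal.SIGKILL)  # clean up the demo process
```

The new parent is PID 1 or the nearest subreaper, depending on the environment. Nothing in this sketch kills the orphan when its original parent exits, which is the job the pid namespace does in the sandboxed case.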
Given how fundamental the issue is, solving it is not a release blocker for 2.94.0. The task is to document the problem and explain which workarounds expression authors can use to fix their builds.
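One workaround of this kind might look like the sketch below. It is an assumption on my part, not something confirmed in this issue: the test command is started in its own session, and whatever is left in that process group is killed once the command returns. `run_and_reap` is a hypothetical helper name. Processes that move themselves into another session or process group escape it, so this is best effort only and does not replace cgroups.

```python
import os
import signal
import subprocess
from typing import List


def run_and_reap(argv: List[str]) -> int:
    """Run argv in a new session, then kill anything left in its group.

    Best effort: a process that calls setsid() or setpgid() itself
    leaves the group and will not be killed.
    """
    # start_new_session=True makes the child a session and group leader,
    # so its PID doubles as the process group ID.
    proc = subprocess.Popen(argv, start_new_session=True)
    try:
        return proc.wait()
    finally:
        try:
            # The group persists while any member is alive, even after
            # the leader has exited and been reaped.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # nothing was left behind


if __name__ == "__main__":
    # The backgrounded sleep would otherwise outlive the shell by 30 s.
    code = run_and_reap(["sh", "-c", "sleep 30 >/dev/null 2>&1 & exit 3"])
    print("exit code:", code)
```

The exit code of the wrapped command is passed through, so a failing test run still fails the phase that invoked it.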
This issue was mentioned on Gerrit on the following CLs: