Unsandboxed builds can get stuck if the build starts children unparented to the sandbox #1018

Closed
opened 2025-10-20 19:56:41 +00:00 by raito · 3 comments
Owner

Describe the bug

Unsandboxed (direct store access) builds, e.g. container builds, can get stuck if they start things that are not parented to the sandbox at the end of their build.

Steps To Reproduce

  1. Clone https://codeberg.org/Henry-Hiles/chorus/src/branch/dev
  2. sudo -E NIX_REMOTE=local nix build --no-sandbox .#checks.x86_64-linux.nextest --builders ''
  3. Watch the build get stuck at: [1/0/1 built, 0.0 MiB DL] building chorus-nextest-0.20.0 (checkPhase): error: test run failed 80/80: 0 running, 77 passed, 3 failed, 0 skipped
  4. Grep the process list and find a defunct bash along with a long-running Node.js server.
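
For step 4, a minimal sketch of one way to look for leftovers using only /proc, assuming Linux with /proc mounted and an `awk` on the PATH. The helper name `list_leftovers` and the pattern `node|bash` are illustrative choices, not something the original report used.

```shell
# Print "pid ppid state name" for every zombie (defunct) process and for
# every process whose name matches the extended regex passed as $1.
# list_leftovers is a hypothetical helper name chosen for this sketch.
list_leftovers() {
  pattern=$1
  for dir in /proc/[0-9]*; do
    # Processes can exit while we iterate, so skip entries that vanished.
    [ -r "$dir/status" ] || continue
    awk -v pat="$pattern" '
      /^Name:/  { name = $2 }
      /^State:/ { state = $2 }
      /^Pid:/   { pid = $2 }
      /^PPid:/  { ppid = $2 }
      END {
        if (pid != "" && (state == "Z" || name ~ pat))
          print pid, ppid, state, name
      }
    ' "$dir/status" 2>/dev/null
  done
}

# Look for the kind of processes described in step 4.
list_leftovers 'node|bash'
```

A state of `Z` marks a defunct process, and the second column shows which process it is currently parented to.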

Expected behavior

The build should complete.

nix --version output

Tested on nix (Lix, like Nix) 2.94.020251018151901-raito-edition and Lix 2.93.3.

Additional context

raito changed title from Unsandboxed (direct store access) builds can have their child corrupted upon exit to Unsandboxed builds can get stuck if the build starts children unparented to the sandbox 2025-10-20 20:00:21 +00:00
Owner

We've managed to reproduce this, and it looks like a failure in the build system being run inside the sandbox. The node process we see hanging around is still there because it was previously orphaned and reparented to init. In the sandboxed case this works out fine: when sandbox init (i.e. the stdenv bash script) exits, the kernel kills all remaining processes in the pid namespace. In unsandboxed cases we can only reliably clean this up with cgroups.
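
The orphaning described above can be reproduced in isolation. This is only a sketch, assuming Linux with /proc mounted; `sleep 30` stands in for the long-running Node.js server, and the variable names are made up for the example.

```shell
#!/bin/sh
# Demonstration only: orphan a background process and look at its new parent.
pidfile=$(mktemp)

# The subshell starts a long-running child in the background and exits at
# once, so the child outlives its parent. Its output is redirected so that it
# does not keep any inherited pipe open.
( sleep 30 >/dev/null 2>&1 & echo $! > "$pidfile" )

orphan_pid=$(cat "$pidfile")
rm -f "$pidfile"

# /proc/<pid>/status reports the current parent of the process.
orphan_ppid=$(sed -n 's/^PPid:[[:space:]]*//p' "/proc/$orphan_pid/status")

echo "this shell:      $$"
echo "orphan pid:      $orphan_pid"
echo "orphan's parent: $orphan_ppid"

# Clean up by hand: outside a pid namespace or cgroup nothing does it for us.
kill "$orphan_pid"
```

The reported parent is no longer this shell. It is typically PID 1, or a subreaper process if one is in use. Without the final `kill`, the `sleep` would keep running after the script returns, which is the situation the unsandboxed build ends up in.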

Author
Owner

Given the fundamental issue here, solving this problem is not a release blocker for 2.94.0. The task is to document the problem and to explain which workarounds expression authors can use to fix their builds.

Member

This issue was mentioned on Gerrit on the following CLs:

  • commit message in cl/4548 ("doc/manual/known-issues: init")
Reference
lix-project/lix#1018