Build directories left-overs in $TMPDIR after build failure/user interruption #678

Closed
opened 2025-02-16 14:42:31 +00:00 by raito · 11 comments
Owner

Describe the bug

Since Lix 2.91.1 (at least; it could also happen on older versions), Lix leaves some build directories of built derivations in $TMPDIR instead of cleaning them up.

Steps To Reproduce

Building a large graph of derivations seems to exacerbate this.

Expected behavior

Build directories should be cleaned up.

nix --version output

Confirmed to happen:

  • nix (Lix, like Nix) 2.93.0-dev-pre20250206-1a13827
  • 2.91.1

Additional context

Example:

drwx------     - nixbld11  6 févr. 01:27 nix-build-etc-fstab.drv-2
drwx------     - nixbld15  6 févr. 01:27 nix-build-etc-hostname.drv-10
drwx------     - nixbld17  6 févr. 01:27 nix-build-etc-modprobe.d-nixos.conf.drv-2
drwx------     - nixbld10  6 févr. 01:27 nix-build-etc-msmtprc.drv-0
drwx------     - nixbld24  6 févr. 01:27 nix-build-etc-os-release.drv-3
drwx------     - nixbld21  6 févr. 01:27 nix-build-etc-pam-environment.drv-0
drwx------     - nixbld23  6 févr. 01:27 nix-build-etc-ssh-ssh_known_hosts.drv-0
drwx------     - nixbld29  6 févr. 01:27 nix-build-issue.drv-1
drwx------     - nixbld30  6 févr. 01:27 nix-build-limits.conf.drv-0
drwx------     - nixbld21 12 févr. 00:55 nix-build-lix-2.92.0-devpre20241120_66f6dbd.drv-0
drwx------     - nixbld16 12 févr. 19:47 nix-build-lix-2.92.0-devpre20241120_66f6dbd.drv-1
drwx------     - nixbld10 12 févr. 19:52 nix-build-lix-2.92.0-devpre20241120_66f6dbd.drv-2
drwx------     - nixbld1  12 févr. 19:56 nix-build-lix-2.92.0-devpre20241120_66f6dbd.drv-3
drwx------     - nixbld22  6 févr. 01:27 nix-build-locale.conf.drv-0
drwx------     - nixbld18  6 févr. 01:27 nix-build-nixos.conf.drv-0
drwx------     - nixbld13  6 févr. 01:27 nix-build-raito-authorized_keys.drv-0
drwx------     - nixbld25  6 févr. 01:27 nix-build-root-authorized_keys.drv-1
drwx------     - nixbld20  6 févr. 01:27 nix-build-unit-10-enp35s0f0np0.network.drv-0
drwx------     - nixbld26  6 févr. 01:27 nix-build-unit-50-netbird-wt0.network.drv-0
drwx------     - nixbld12  6 févr. 01:27 nix-build-unit-99-ethernet-default-dhcp.network.drv-0
drwx------     - nixbld28  6 févr. 01:27 nix-build-unit-99-wireless-client-dhcp.network.drv-0
drwx------     - nixbld27  6 févr. 01:27 nix-build-useradd.drv-0
drwx------     - nixbld1   6 févr. 01:27 nix-build-users-groups.json.drv-4
drwx------     - nixbld19  6 févr. 01:27 nix-build-vconsole.conf.drv-0

cc @mweinelt

Member

I have no crashes logged, so this feels like an issue with some control flow.

Member

One scenario that leaves a build dir behind is aborting a build with Ctrl+C.

nix (Lix, like Nix) 2.93.0-dev-pre20250206-1a13827

Owner

I know there's also the "directory not empty" error message when deleting some build directories sometimes. Unsure of the actual cause of that one either.


Just had this with lix 2.91.0:

error (ignored): error: cannot unlink '/tmp/nix-build-x86_64-unknown-linux-gnu-ghc-native-bignum-9.8.4.drv-1/ghc-9.8.4-source/_build/stage1': Directory not empty

No CTRL+C, regular build failure.

Looking at the left-over files, it seems like all source code etc. has been removed and only some build artifacts remain. I was running this with --cores 32, so the build itself was parallelized.

Could it be that some processes (GHC's, in this case) were still producing files while lix was already trying to shut down and had started removing things?

Edit: It seems to happen frequently... I have collected 91 GB of left-overs from nix (lix) in /tmp right now :D
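
The interleaving suspected above can be reproduced deterministically in a few lines. This is a minimal sketch, assuming C++17 and a Linux-like system; `makeScratchDir`, `touch` and `simulateRace` are hypothetical names made up for illustration, and none of this is Lix's actual deletion code. It only shows that a writer which outlives the directory listing is enough to produce "Directory not empty".

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <random>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: create a fresh scratch directory under the system temp dir.
fs::path makeScratchDir()
{
    std::random_device rd;
    fs::path dir = fs::temp_directory_path() / ("race-demo-" + std::to_string(rd()));
    fs::create_directories(dir);
    return dir;
}

// Hypothetical helper: create a small file, standing in for a build artifact.
void touch(const fs::path & p)
{
    std::ofstream out(p);
    out << "artifact";
}

// Plays out one unlucky interleaving between a cleaner and a builder that is
// still alive. Returns the error reported when removing the directory itself.
std::error_code simulateRace(const fs::path & dir)
{
    // The builder has already produced some output.
    touch(dir / "a.o");

    // 1. The cleaner lists the directory contents.
    std::vector<fs::path> snapshot;
    for (const auto & entry : fs::directory_iterator(dir))
        snapshot.push_back(entry.path());

    // 2. The still-running builder writes another file after the listing.
    touch(dir / "b.o");

    // 3. The cleaner removes only what it saw in step 1 ...
    for (const auto & p : snapshot)
        fs::remove(p);

    // 4. ... and then tries to remove the directory, which is no longer empty.
    std::error_code ec;
    fs::remove(dir, ec);
    return ec;
}
```

Calling `simulateRace(makeScratchDir())` should leave `b.o` behind and return an error code that compares equal to `std::errc::directory_not_empty`. If this is what happens in practice, stopping all builder processes before deleting would close the window.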

Member

The workaround that I'm currently relying on is this:

  systemd.services.prune-stale-nix-builds = {
    description = "Prune stale nix build roots";
    startAt = "hourly";
    unitConfig.Documentation = "https://github.com/NixOS/nix/issues/5207";
    serviceConfig = {
      ExecStart = lib.concatStringsSep " " [
        (lib.getExe pkgs.findutils)
        "/tmp"
        "-maxdepth 1"
        "-type d"
        "-iname \"nix-build-*\""
        "-mtime +1" # days
        "-exec rm -rf {} +"
      ];
    };
  };

Nix issue is https://github.com/NixOS/nix/issues/5207

Owner

reproducer:

{ uuid }:
with import <nixpkgs> {};

let
  do = name: script: derivation {
    name = name + "-${uuid}";
    system = __currentSystem;
    builder = runtimeShell;
    args = [ "-c" "source ${__toFile "script" "PATH=$PATH_:$PATH; ${__unsafeDiscardStringContext script}"}" ];
    PATH_ = "${coreutils}/bin";
  };
in
symlinkJoin {
  name = "sl";
  paths =
    __genList (n: do "foo${toString n}" "touch `seq 1000`; touch $out") 100
    ++ [
      (do "foo00" "false")
    ];
}

likely cause: the first line of this block should be the last line instead

try { deleteTmpDir(false); } catch (...) { ignoreExceptionInDestructor(); }
try { killChild(); } catch (...) { ignoreExceptionInDestructor(); }
try { stopDaemon(); } catch (...) { ignoreExceptionInDestructor(); }

https://git.lix.systems/lix-project/lix/src/commit/82c7e76c9c2d829dc11c22b32173a40056cc44ef/lix/libstore/build/local-derivation-goal.cc#L107-L109
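
Why the order of those three calls matters can be shown with a toy model. This is a sketch under loud assumptions: `FakeGoal` and its members are hypothetical stand-ins rather than the real `LocalDerivationGoal`, and the rule "a live builder writes one more file after deletion" is invented purely to make the effect visible.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a build goal; only the ordering of cleanup steps is modelled.
struct FakeGoal
{
    bool childAlive = true;        // builder process still running
    int filesInTmpDir = 3;         // entries currently in the build directory
    std::vector<std::string> log;  // order in which the steps actually ran

    void killChild()
    {
        childAlive = false;
        log.push_back("killChild");
    }

    void stopDaemon()
    {
        log.push_back("stopDaemon");
    }

    void deleteTmpDir()
    {
        filesInTmpDir = 0;         // remove everything visible right now
        if (childAlive)
            filesInTmpDir += 1;    // assumption: a live builder writes again afterwards
        log.push_back("deleteTmpDir");
    }

    // Order as in the quoted block: delete first, kill afterwards.
    void cleanupDeleteFirst()
    {
        try { deleteTmpDir(); } catch (...) {}
        try { killChild(); } catch (...) {}
        try { stopDaemon(); } catch (...) {}
    }

    // Suggested order: the deletion moves to the end.
    void cleanupDeleteLast()
    {
        try { killChild(); } catch (...) {}
        try { stopDaemon(); } catch (...) {}
        try { deleteTmpDir(); } catch (...) {}
    }
};
```

In this model `cleanupDeleteFirst()` ends with one file left in the directory, while `cleanupDeleteLast()` ends with none, because nothing is alive to write once the deletion runs.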
Member

This issue was mentioned on Gerrit on the following CLs:

  • commit message in cl/2639 (https://gerrit.lix.systems/c/lix/+/2639): "libstore: kill sandboxes before removing paths"
  • commit message in cl/2666 (https://gerrit.lix.systems/c/lix/+/2666): "libstore: always delete tmpdirs during derivation goal destruction"
mweinelt reopened this issue 2025-02-23 17:30:32 +00:00
Member

Still reproduces when interrupting the build with Ctrl+C.

Feb 23 18:17:04 gaia nix-daemon[834310]: accepted connection from pid 859585, user hexa (trusted)
Feb 23 18:17:11 gaia nix-daemon[859601]: unexpected Nix daemon error: error: interrupted by the user
pennae changed title from Build directories left-overs in $TMPDIR to Build directories left-overs in $TMPDIR after build failure/user interruption 2025-02-23 17:31:39 +00:00
Owner

static void _deletePath(int parentfd, const Path & path, uint64_t & bytesFreed)
{
    checkInterrupt();

for fuck's sake.

https://git.lix.systems/lix-project/lix/src/commit/ee49ed56c36d4ad1293c9a3abf3487d94be9aabc/lix/libutil/file-system.cc#L412-L414
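
The quoted lines suggest why Ctrl+C in particular leaves directories behind: if the interrupt flag is already set when cleanup starts, the first check inside the deletion routine presumably throws before anything is removed. Below is a minimal sketch of that failure mode. `interruptRequested`, `Interrupted`, `checkInterruptSketch` and `deleteEntries` are all hypothetical stand-ins, not Lix's real definitions, and the `honourInterrupt` switch only illustrates the idea of deleting regardless of interruption; it is not the merged change.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a process-wide "user pressed Ctrl+C" flag.
std::atomic<bool> interruptRequested{false};

struct Interrupted : std::runtime_error
{
    Interrupted() : std::runtime_error("interrupted by the user") {}
};

// Hypothetical analogue of an interrupt check: throws once the flag is set.
void checkInterruptSketch()
{
    if (interruptRequested.load())
        throw Interrupted();
}

// Removes entries one by one, optionally polling the interrupt flag before each.
// Returns how many entries are left behind.
std::size_t deleteEntries(std::vector<std::string> & entries, bool honourInterrupt)
{
    try {
        while (!entries.empty()) {
            if (honourInterrupt)
                checkInterruptSketch();
            entries.pop_back();
        }
    } catch (const Interrupted &) {
        // Mirrors a destructor that swallows the exception: cleanup simply stops.
    }
    return entries.size();
}
```

With the flag set, `deleteEntries(entries, true)` removes nothing, while `deleteEntries(entries, false)` empties the list. Cleanup that runs as a consequence of an interrupt cannot also be cancelled by that same interrupt if it is supposed to finish.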
Member

Deletion on Ctrl+C works with https://gerrit.lix.systems/c/lix/+/2666. Thank you!

Owner

cl merged. claws crossed we won't need a round 3

Reference: lix-project/lix#678