nix-daemon breaks mysteriously when it runs out of disk space #429

Open
opened 2024-07-03 03:28:27 +00:00 by danderson · 1 comment

Describe the bug

A rebuild was downloading a bunch of derivations, and filled my desktop's ZFS pool (0B free according to zfs list). The daemon's reaction to this was mysterious: the derivation downloads stalled down to 0B/s and stayed there, without reporting any issue or failing the build.

Meanwhile, in other terminals, all nix commands that require talking to nix-daemon failed with a variety of mysterious daemon connection failures:

> nix build
error:
       … while fetching the input 'git+file:///home/dave/hack/homelab'

       error: cannot open connection to remote store 'daemon': error: reading from file: Connection reset by peer
> nix shell nixos-unstable#kicad
error:
       … while fetching the input 'path:/nix/store/j4jzjbr302cw5bl0n3pch5j9bh5qwmaj-source?lastModified=1719848872&narHash=sha256-H3%2BEC5cYuq%2BgQW8y0lSrrDZfH71LB4DAf%2BTDFyvwCNA%3D&rev=00d80d13810dbfea8ab4ed1009b09100cca86ba8'

       error: opening a connection to remote store 'daemon' previously failed

I freed up some space so the pool had free bytes again, and commands went back to working normally.

My first attempt at a repro was to set a quota on /nix only, with zfs set quota=127G data/nix, and fetch a big derivation to push it over the limit, but that failed with an obvious hint:

> nix shell nixos-unstable#kicad
error: writing to file: Disk quota exceeded
error: some substitutes for the outputs of derivation '/nix/store/czkh57kpm2rj6zwsalq94ralz6j6d6gi-kicad-packages3d-3172a1cc09.drv' failed (usually happens due to networking issues); try '--fallback' to build derivation from source
error: 1 dependencies of derivation '/nix/store/yramhzg57253iw446r3ax71sy8ar4ypk-kicad-8.0.3.drv' failed to build

After that I repro'd more violently by just filling the zpool with /dev/urandom, and got the mysterious lockup again. During the two "really out of bytes" episodes, nix-daemon logs say:

Jul 03 02:49:33 vega nix-daemon[2338]: accepted connection from pid 1378082, user dave (trusted)
Jul 03 02:49:33 vega nix-daemon[1378092]: unexpected Nix daemon error: error: could not set permissions on '/nix/var/nix/profiles/per-user' to 755: No space left on device

once for each command I attempted to run. Similar error when I broke /nix via quotas:

Jul 03 03:05:03 vega nix-daemon[1559315]: accepted connection from pid 1559379, user dave (trusted)
Jul 03 03:05:03 vega nix-daemon[1559382]: unexpected Nix daemon error: error: could not set permissions on '/nix/var/nix/profiles/per-user' to 755: Disk quota exceeded
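
For anyone triaging a similar lockup, a rough way to confirm that storage is the culprit is to scan the daemon log for the two error strings quoted above. The sketch below is only an illustration: the regular expression is inferred from the sample lines in this report, not from any documented log format, and `storage_failures` is a hypothetical helper name.

```python
import re

# Pattern inferred from the two log samples in this report only;
# it is an assumption, not a documented nix-daemon log format.
STORAGE_ERROR = re.compile(
    r"nix-daemon\[(?P<pid>\d+)\]: unexpected Nix daemon error: .*"
    r"(?P<reason>No space left on device|Disk quota exceeded)\s*$"
)


def storage_failures(lines):
    """Yield (pid, reason) for each daemon log line that blames storage."""
    for line in lines:
        match = STORAGE_ERROR.search(line)
        if match:
            yield int(match.group("pid")), match.group("reason")


# The two lines from the first "really out of bytes" episode above.
sample = [
    "Jul 03 02:49:33 vega nix-daemon[2338]: accepted connection from pid 1378082, user dave (trusted)",
    "Jul 03 02:49:33 vega nix-daemon[1378092]: unexpected Nix daemon error: error: could not set permissions on '/nix/var/nix/profiles/per-user' to 755: No space left on device",
]

for pid, reason in storage_failures(sample):
    print(f"daemon worker {pid} failed: {reason}")
```

Feeding it the output of the system journal for the daemon unit should work the same way, as long as the lines keep the shape shown in this report.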

Steps To Reproduce

  1. Have a modestly sized SSD, with a NixOS install on ZFS
  2. Have a bit of a hoarding problem and burn a lot of storage on large git clones and the like
  3. Try to nix-build a system config with not enough space left for all the stuff that gets faulted in from cache
  4. Observe cache downloads stall, nix build and nix shell commands in other terminals fail with a mysterious error
  5. Free up some storage, observe recovery

I'm able to trigger this on demand, so if there's any verbose logspam I can enable or any evidence you'd like, let me know, given it's a mildly esoteric setup.

Expected behavior

It'd be nice to get a more explicit error that points me at the full disk, if possible.

nix --version output

nix (Lix, like Nix) 2.91.0-dev-pre20240702-45ac449

danderson added the bug label 2024-07-03 03:28:27 +00:00
Owner

oh yes, that really should be much, much clearer. thanks very much for reporting it.
pennae added the ux label 2024-07-03 11:38:29 +00:00
Reference: lix-project/lix#429