Default build directory change worsens Unix socket path length limit on Darwin #913

Open
opened 2025-07-11 22:27:40 +00:00 by emilazy · 15 comments
Member

Darwin has a small maximum path length for Unix sockets and /nix/var/nix/builds is longer than /tmp. (Specifically, it is 15 bytes longer.)

The limit is not much bigger on Linux I think, but filesystem namespacing hides the difference there.

Empirically, this has resulted in two people independently reporting failing builds as a result in the macOS room, and Jujutsu’s CI starting to fail after they picked up the Nix backport of the CVE fixes. So although there is obviously no general‐purpose solution for an arbitrary limit, /tmp was significantly more reliable at not running into the limit.

My proposal is to regain the previous margin by removing the now‐redundant nix-build- prefix from the directory names (10 bytes) and the I‐think‐redundant .drv (4 bytes) and, uh, maybe we can find another byte hiding under the sofa somewhere. The numbers after the derivation name seem like they could probably be shorter.

Note that the previous status quo was not perfect; Jujutsu had to shorten some paths in tests when making their derivation name longer: https://github.com/jj-vcs/jj/pull/6499. /nix/var/nix/builds/nix-build-jujutsu-0.31.0-unstable-1d98834.drv-4695-2529470193 is a pretty long prefix. I’m open to solutions that take the name out of the equation here, but I didn’t like the idea of renaming after --keep-failed because it could interfere with reproducing failures because of exactly this kind of issue. Maybe it would be fine to just do /nix/var/nix/builds/jujutsu-⟨base32 of derivation hash + stuff for uniqueness⟩ and truncate the name if it’s too long or something. Or just omit it entirely and let people suffer a little if they want to look at failed builds.

cc @raito @jade

Owner

we'd remove the drv name entirely and move build tmpdirs to completely hash-random build dirs with a fixed size. at 120 bits using base64 instead of hex for path length we get 20 characters total for the build tmpdir in use, which will be significantly shorter than existing tmpdir names. if keep-failed is set we can symlink them later for easier discovery without messing with debuggability.
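
A minimal sketch of the fixed-size opaque naming described above, written in Python rather than Lix's C++. The function name `opaque_build_dir_name` and the choice of the URL-safe alphabet are assumptions for illustration (the comment only says "base64", and the standard alphabet contains `/`, which cannot appear in a directory name). This is not Lix's implementation.

```python
import secrets
import string

# URL-safe Base64 alphabet: 64 symbols, none of which is "/".
# Assumption: the thread does not say which Base64 alphabet would be used.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_"


def opaque_build_dir_name(bits: int = 120) -> str:
    """Return a random name carrying `bits` bits of entropy, 6 bits per character."""
    if bits <= 0 or bits % 6 != 0:
        raise ValueError("bits must be a positive multiple of 6")
    return "".join(secrets.choice(ALPHABET) for _ in range(bits // 6))


name = opaque_build_dir_name(120)
print(len(name), name)  # the length is 20; the name itself differs on every run
```

Because every character carries 6 bits, the name length depends only on the entropy budget and never on the derivation name.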

Author
Member

I think symlinks would still cause problems for cd /nix/var/nix/builds/nice-name, right? But it might not matter that much. Honestly as long as --keep-failed prints the relevant directory (which would be a good UX thing in general) it might be fine for the names to be totally opaque.

Author
Member

FWIW I think the shortest path you could get previously was /tmp/nix-build-a.drv-0, so /nix/var/nix/builds/⟨20 characters of Base64⟩ would still be an 18 byte increase over the previous status quo. But it’d be 41 bytes shorter for Jujutsu, at least, and making the length independent of the derivation seems good.
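
The byte counts above can be checked mechanically. This is only a worked-arithmetic sketch: the 20 `A` characters are a stand-in for a Base64 name, and the variable names are made up for illustration.

```python
BUILDS_DIR = "/nix/var/nix/builds/"

old_shortest = "/tmp/nix-build-a.drv-0"
new_opaque = BUILDS_DIR + "A" * 20  # stand-in for 20 characters of Base64
jujutsu = BUILDS_DIR + "nix-build-jujutsu-0.31.0-unstable-1d98834.drv-4695-2529470193"

print(len(new_opaque) - len(old_shortest))  # 18
print(len(jujutsu) - len(new_opaque))       # 41
```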

Owner

we probably won't even need 120 bits. 60 should be absolutely fine, which leaves us at 10 bytes extra (and let's be real, most derivations will exceed 10 bytes name length)
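
To put a rough number on "60 should be absolutely fine", here is a birthday-bound sketch. It assumes names are drawn uniformly and independently, and the figure of one million simultaneously existing build directories is an arbitrary, deliberately large assumption rather than anything measured.

```python
import math


def collision_probability(concurrent_dirs: int, bits: int) -> float:
    """Approximate chance that at least two of `concurrent_dirs` random names collide."""
    pairs = concurrent_dirs * (concurrent_dirs - 1) / 2
    # 1 - exp(-pairs / 2**bits), computed stably for tiny probabilities.
    return -math.expm1(-pairs / 2.0**bits)


for bits in (60, 120):
    print(bits, collision_probability(1_000_000, bits))
```

A real implementation could also retry on an existing directory, which would make a collision harmless rather than merely unlikely.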

Owner

One possible solution to this if you do want pretty names is to have the pretty names as symlinks given to the user, perhaps?

I am totally in favour of just putting randomness in there plus or minus aggressively truncating the derivation names. For the actual builds I think it's smartest to use as short an identifier as possible because third party software is often busted, so putting a truncated name in is probably out.

I should mention: it is somewhat of a solvable problem to connect to arbitrary length sockets. Lix does it itself (it requires a fork() for each one though. it opens the socket in the parent process, chdirs into the parent dir in the child then connects). But making everyone else fix their stuff is absolutely not my goal here, let's fix this problem we caused.
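
A rough Python sketch of the fork-and-chdir technique described above, not Lix's actual code: the socket is created in the parent, a child changes into the socket's parent directory and connects using only the short relative name, and the parent then uses the now-connected socket. The name `connect_long_path` is hypothetical.

```python
import os
import socket


def connect_long_path(path: str) -> socket.socket:
    """Connect a Unix stream socket to `path`, even if `path` exceeds sun_path."""
    directory, name = os.path.split(os.path.abspath(path))
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    pid = os.fork()
    if pid == 0:
        # Child: shares the socket with the parent, so connecting here
        # connects the parent's socket too. Only the short relative name
        # is ever passed to connect().
        status = 1
        try:
            os.chdir(directory)
            sock.connect(name)
            status = 0
        except OSError:
            pass
        finally:
            os._exit(status)
    # Parent: its own working directory is untouched by the child's chdir.
    _, wait_status = os.waitpid(pid, 0)
    if not (os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0):
        sock.close()
        raise OSError(f"could not connect to {path!r}")
    return sock
```

The fork exists so that the `chdir` does not disturb the working directory of the calling process, which matters in a multi-threaded daemon.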

Owner

> One possible solution to this if you do want pretty names is to have the pretty names as symlinks given to the user, perhaps?

yeah, that was our intention. if we print the symlinks to the user it'll look like nothing had changed and the attempts remain identifiable, and if a user then cd's into such an attempt all paths within the attempt will still be valid. problems only occur if something expects eg realpath $PWD == $PWD, but that's busted anyway and not something we need to worry about too much

Author
Member

Won't changing to the symlink directory make things confusing precisely when a socket path relative to $PWD is too long in one case but not the other?

And symlinks in paths make things sufficiently weird that it's banned for the store directory. I feel it is better to have to deal with an opaque directory for --keep-failed than for a UX nicety to get in the way of debugging a failure.
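
The concern can be illustrated with a small sketch. The helper `make_build_dir_with_alias`, the opaque name, and the socket file name are all made up for illustration; the point is only that a socket path built from the symlinked directory is longer than one built from the real directory, while both resolve to the same place.

```python
import os
import tempfile


def make_build_dir_with_alias(root: str, opaque: str, nice: str) -> tuple[str, str]:
    """Create root/opaque and a relative symlink root/nice pointing at it."""
    real = os.path.join(root, opaque)
    os.mkdir(real)
    alias = os.path.join(root, nice)
    os.symlink(opaque, alias)
    return real, alias


root = tempfile.mkdtemp()
real, alias = make_build_dir_with_alias(root, "q7Zp0Lx2Vd", "jujutsu-0.31.0")

# A tool that builds an absolute socket path from its working directory sees
# different lengths depending on which of the two paths it was entered through.
print(len(os.path.join(real, "test.sock")), len(os.path.join(alias, "test.sock")))
print(os.path.realpath(alias) == os.path.realpath(real))  # True
```

So a failure that depends on the socket path length may reproduce under one name and not the other, which is the debugging hazard raised above.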

jade self-assigned this 2025-07-15 04:08:44 +00:00
Member

We also hit this in our CI at Arista NDR for internal packages using the Haskell tmp-postgres package for testing.

Member

NIX_CONFIG='build-dir = /tmp/ab' works for me, but NIX_CONFIG='build-dir = /tmp/abc' fails

Owner

@bacchanalia wrote in #913 (comment):

> NIX_CONFIG='build-dir = /tmp/ab' works for me, but NIX_CONFIG='build-dir = /tmp/abc' fails

Based on the public information, macOS has a 104-char maximum (not 108), so everything after /tmp/ab must be at most 97 chars long. nix-build- is 10 chars long.

I wonder: how long is the derivation name? Do you perhaps know the shape of the Unix domain socket path?

(Ideally, software should be resilient to this and not use absolute paths, but heh.)

Member

> I wonder: how long is the derivation name? Do you perhaps know the shape of the Unix domain socket path?

I fished it out with fswatch:
/private/tmp/ab/nix-build-xxxxxxxxxxx-0.1.0.0.drv-0/tmp-postgres-socket-xxxxxxxxxxxxxxxx/.s.PGSQL.xxxxx
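
A quick length check on that path, as a sketch. It assumes the 104-byte macOS figure quoted earlier in the thread, and it assumes the path plus a terminating NUL must fit in that budget (which is how many libraries treat it). The `x` runs are redactions, so the measured length is only meaningful if they preserve the original lengths.

```python
SUN_PATH_MACOS = 104  # size of sun_path on macOS, per the figure quoted in the thread


def fits_sun_path(path: str, limit: int = SUN_PATH_MACOS) -> bool:
    """True if `path` plus a terminating NUL fits in `limit` bytes (assumption)."""
    return len(path.encode()) + 1 <= limit


observed = "/private/tmp/ab/nix-build-xxxxxxxxxxx-0.1.0.0.drv-0/tmp-postgres-socket-xxxxxxxxxxxxxxxx/.s.PGSQL.xxxxx"
one_longer = observed.replace("/tmp/ab/", "/tmp/abc/", 1)

for p in (observed, one_longer):
    print(len(p), fits_sun_path(p))
```

Note that on macOS /tmp resolves to /private/tmp, so the socket path pays for that longer prefix as well. If the redactions are length-preserving, the observed path appears to sit at or very near the limit, which would explain why a single extra character in build-dir tips it over.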

Owner

🚨 haskell spotted 🚨

@rbt wrote a workaround at work that just shoved it in /tmp:

+                  , PgTmp.socketDirectory =
+                      PgTmp.Permanent path
+                  }

where path comes from:

+-- | Get a short temporary directory, preferring @/tmp@.
+--
+-- @/tmp@ is preferred to @$TMPDIR@ because the latter can be set to a very
+-- long path; macOS sets a unique temporary directory for each shell, e.g.
+-- @/var/folders/z5/fclwwdms3r1gq4k4p3pkvvc00000gn/T/nix-shell.6RkayU@.
+getTmp :: IO FilePath
+getTmp = do
+  -- There are many possible temporary directories, but this is the shortest!
+  let tmp = "/tmp"
+  tmpExists <- doesPathExist "/tmp"
+  if tmpExists
+    then pure tmp
+    else getTemporaryDirectory
Member

Yep! After I diagnosed the issue I went looking at the repo for the Haskell package and it had @rbt's paws all over it, but it's been in maintainership limbo.

Author
Member

We really should close off /tmp in the macOS sandbox in future, though, since it’s a pretty huge hole, causes concurrency problems, and is not what you’re meant to use on macOS anyway. So I would find it unfortunate to encourage people to use /tmp more because of this issue.

Owner

I opened https://github.com/jfischoff/tmp-postgres/issues/290, which I believe is the proper solution for this. Obviously, this doesn't prevent the workaround from being adopted, or Lix from adopting a solution of its own.

@pennae and I will not be able to get to it in a reasonable timeline, I believe, but we can help with reviews and bouncing ideas. I am in agreement with @pennae's solution, and I am curious to see if the symlink idea will cause actual confusion / problems among users when debugging things that may depend on the actual build directory. I think we can throw more pedagogy and documentation in our messages to help regarding this too.

> And symlinks in paths make things sufficiently weird that it's banned for the store directory. I feel it is better to have to deal with an opaque directory for --keep-failed than for a UX nicety to get in the way of debugging a failure.

Also, I'm not sure whether I understand this concern. If the symlink exists as a convenience for a user inspecting the builds, this should not get into the store directory in any way?

For the sake of my own understanding, here is a summary:

| Solution | Build directory | `keep-failed` directory | Convenient symlink with nice name | Preservation of build environment | UX | Notes |
|----------|-----------------|-------------------------|-----------------------------------|-----------------------------------|----|-------|
| (1) | Opaque | Opaque | None | High | Low | Fully consistent. Avoids symlink weirdness entirely. |
| (2) | Opaque | Renamed to non-opaque | None | Medium | Medium | Allow easier manual inspection; high risk of confusion/debug cost due to modified build directory. |
| (3) | Opaque | Opaque | Provided | High | High | Balances safety and UX; symlink introduces some complexity *AND* can still confuse a user. |
| (4) | Non-opaque | Non-opaque | N/A | High | High | Current behavior; very broken. |

My preference is: let's try (3), then fall back to (1) if this doesn't work out. Also, we could put this behind an option and let users choose; after some feedback, we can pick an informed default or even remove the option.

Reference: lix-project/lix#913