Default build directory change worsens Unix socket path length limit on Darwin #913
Reference: lix-project/lix#913
Darwin has a small maximum path length for Unix sockets and `/nix/var/nix/builds` is longer than `/tmp`. (Specifically it is 15 bytes longer.)

The limit is not much bigger on Linux I think, but filesystem namespacing hides the difference there.

Empirically, this has resulted in two people independently reporting failing builds in the macOS room, and Jujutsu’s CI starting to fail after they picked up the Nix backport of the CVE fixes. So although there is obviously no general‐purpose solution for an arbitrary limit, `/tmp` was significantly more reliable at not running into the limit.

My proposal is to regain the previous margin by removing the now‐redundant `nix-build-` prefix from the directory names (10 bytes) and the I‐think‐redundant `.drv` (4 bytes) and, uh, maybe we can find another byte hiding under the sofa somewhere. The numbers after the derivation name seem like they could probably be shorter.

Note that the previous status quo was not perfect; Jujutsu had to shorten some paths in tests when making their derivation name longer: https://github.com/jj-vcs/jj/pull/6499.
`/nix/var/nix/builds/nix-build-jujutsu-0.31.0-unstable-1d98834.drv-4695-2529470193` is a pretty long prefix. I’m open to solutions that take the derivation name out of the equation here, but I didn’t like the idea of renaming after `--keep-failed` because it could interfere with reproducing failures because of exactly this kind of issue. Maybe it would be fine to just do `/nix/var/nix/builds/jujutsu-⟨base32 of derivation hash + stuff for uniqueness⟩` and truncate the name if it’s too long or something. Or just omit it entirely and let people suffer a little if they want to look at failed builds.

cc @raito @jade

we'd remove the drv name entirely and move build tmpdirs to completely hash-random build dirs with a fixed size. at 120 bits, using base64 instead of hex for path length, we get 20 characters total for the build tmpdir in use, which will be significantly shorter than existing tmpdir names. if keep-failed is set we can symlink them later for easier discovery without messing with debuggability.
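
To make the size claim concrete, here is a minimal Python sketch of how many characters a fixed amount of randomness needs in base64 versus hex, plus one way such a name could be generated. The function names are hypothetical, and the URL-safe base64 alphabet is an assumption made here because standard base64 contains `/`, which cannot appear in a directory name; the thread does not say which alphabet would be used.

```python
import base64
import math
import secrets


def encoded_length(bits: int, bits_per_char: int) -> int:
    """Characters needed to encode `bits` of randomness (no padding)."""
    return math.ceil(bits / bits_per_char)


def random_build_dir_name(bits: int = 120) -> str:
    """Hypothetical helper: a random, fixed-length directory name.

    Assumes `bits` is a multiple of 8. Uses the URL-safe alphabet
    (an assumption) so the name never contains '/'.
    """
    raw = secrets.token_bytes(bits // 8)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


if __name__ == "__main__":
    for bits in (120, 60):
        print(bits, "bits:",
              encoded_length(bits, 6), "base64 chars,",
              encoded_length(bits, 4), "hex chars")
    print(random_build_dir_name())
```

Base64 carries 6 bits per character and hex carries 4, which is where the 20-character figure for 120 bits comes from.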

I think symlinks would still cause problems for `cd /nix/var/nix/builds/nice-name`, right? But it might not matter that much. Honestly as long as `--keep-failed` prints the relevant directory (which would be a good UX thing in general) it might be fine for the names to be totally opaque.

FWIW I think the shortest path you could get previously was `/tmp/nix-build-a.drv-0`, so `/nix/var/nix/builds/⟨20 characters of Base64⟩` would still be an 18 byte increase over the previous status quo. But it’d be 41 bytes shorter for Jujutsu, at least, and making the length independent of the derivation seems good.

we probably won't even need 120 bits. 60 should be absolutely fine, which leaves us at 10 bytes extra (and let's be real, most derivations will exceed 10 bytes name length)

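
The byte counts quoted above can be checked with a few lines of Python. This is only arithmetic on the paths already given in the thread; the 20-character placeholder stands in for whatever random name would actually be generated.

```python
# Paths quoted in the thread.
BUILD_DIR = "/nix/var/nix/builds"
OLD_SHORTEST = "/tmp/nix-build-a.drv-0"
JUJUTSU = (
    "/nix/var/nix/builds/"
    "nix-build-jujutsu-0.31.0-unstable-1d98834.drv-4695-2529470193"
)

# Placeholder for a 20-character Base64 directory name.
PROPOSED = BUILD_DIR + "/" + "A" * 20

# How much longer the new default build directory is than /tmp.
dir_increase = len(BUILD_DIR) - len("/tmp")
# Proposed scheme versus the shortest path possible before the change.
increase_over_old = len(PROPOSED) - len(OLD_SHORTEST)
# Proposed scheme versus the current Jujutsu build path.
saving_for_jujutsu = len(JUJUTSU) - len(PROPOSED)

print(dir_increase, increase_over_old, saving_for_jujutsu)
```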
One possible solution to this if you do want pretty names is to have the pretty names as symlinks given to the user, perhaps?
I am totally in favour of just putting randomness in there plus or minus aggressively truncating the derivation names. For the actual builds I think it's smartest to use as short an identifier as possible because third party software is often busted, so putting a truncated name in is probably out.
I should mention: it is somewhat of a solvable problem to connect to sockets with arbitrarily long paths. Lix does it itself (it requires a fork() for each one, though: it opens the socket in the parent process, chdirs into the socket's parent dir in the child, then connects). But making everyone else fix their stuff is absolutely not my goal here, let's fix this problem we caused.
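
For readers unfamiliar with the trick, here is a rough Python sketch of the fork-and-chdir approach as described in the comment above. It is not Lix's actual code (which is C++), and the function name `connect_long_unix_path` is made up for illustration. It relies on the fact that a forked child shares the parent's open socket, so a connection made by the child through a short relative path is also visible through the parent's file descriptor.

```python
import os
import socket


def connect_long_unix_path(path: str) -> socket.socket:
    """Connect to a Unix socket whose absolute path may not fit in sun_path.

    Hypothetical helper illustrating the approach from the thread.
    """
    # Create the socket in the parent so the parent keeps the descriptor.
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    pid = os.fork()
    if pid == 0:
        # Child: change into the socket's directory and connect via the
        # short relative name. Only the child's working directory changes.
        try:
            os.chdir(os.path.dirname(path))
            sock.connect(os.path.basename(path))
            os._exit(0)
        except OSError:
            os._exit(1)
    # Parent: wait for the child and check whether the connect succeeded.
    _, status = os.waitpid(pid, 0)
    if not (os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0):
        sock.close()
        raise OSError(f"could not connect to {path}")
    return sock
```

The fork is what keeps the `chdir` from affecting the rest of the process, which matters in a multithreaded program where the working directory is shared state.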
yeah, that was our intention. if we print the symlinks to the user it'll look like nothing had changed and the attempts remain identifiable, and if a user then cd's into such an attempt all paths within the attempt will still be valid. problems only occur if something expects eg `realpath $PWD == $PWD`, but that's busted anyway and not something we need to worry about too much

Won't changing to the symlink directory make things confusing precisely when a socket path relative to `$PWD` is too long in one case but not the other?

And symlinks in paths make things sufficiently weird that it's banned for the store directory. I feel it is better to have to deal with an opaque directory for `--keep-failed` than for a UX nicety to get in the way of debugging a failure.

We also hit this in our CI at Arista NDR for internal packages using the Haskell `tmp-postgres` package for testing.

`NIX_CONFIG='build-dir = /tmp/ab'` works for me, but `NIX_CONFIG='build-dir = /tmp/abc'` fails

@bacchanalia wrote in #913 (comment):
Based on the public information, macOS has 104 chars maximum, so: everything after `/tmp/ab` must be 97 chars long. `nix-build-` is 10 chars long.

I wonder: how long is the derivation name? Do you perhaps know the shape of the Unix domain socket path?

(Ideally, software should be resilient to this and not use absolute paths, but heh.)

I fished it out with `fswatch`:

`/private/tmp/ab/nix-build-xxxxxxxxxxx-0.1.0.0.drv-0/tmp-postgres-socket-xxxxxxxxxxxxxxxx/.s.PGSQL.xxxxx`

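
As a rough check of why one extra character in `build-dir` tips this over, the following Python sketch compares the observed path against the macOS `sun_path` size of 104 bytes. Several things here are assumptions rather than facts from the thread: that the redacted `x` runs preserve the original lengths, that the path seen by `fswatch` is the one the client passes to `connect`, and that the client needs room for a terminating NUL (leaving 103 usable bytes).

```python
# Size of sockaddr_un.sun_path on macOS, in bytes.
SUN_PATH_SIZE_DARWIN = 104
# Assumption: the client requires a terminating NUL inside sun_path.
USABLE_BYTES = SUN_PATH_SIZE_DARWIN - 1

# Path reported via fswatch in the thread (redacted with x's).
observed = (
    "/private/tmp/ab/nix-build-xxxxxxxxxxx-0.1.0.0.drv-0/"
    "tmp-postgres-socket-xxxxxxxxxxxxxxxx/.s.PGSQL.xxxxx"
)
# The same path with build-dir = /tmp/abc instead of /tmp/ab.
longer = observed.replace("/tmp/ab/", "/tmp/abc/", 1)


def fits(path: str) -> bool:
    """True if the path's byte length fits in the usable part of sun_path."""
    return len(path.encode("utf-8")) <= USABLE_BYTES


for candidate in (observed, longer):
    print(len(candidate.encode("utf-8")), fits(candidate))
```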
🚨 haskell spotted 🚨
@rbt wrote a workaround at work that just shoved it in /tmp:
where path comes from:
Yep! After I diagnosed the issue I went looking at the repo for the Haskell package and it had @rbt's paws all over it, but it's been in maintainership limbo.
We really should close off `/tmp` in the macOS sandbox in future, though, since it’s a pretty huge hole, causes concurrency problems, and is not what you’re meant to use on macOS anyway. So I would find it unfortunate to encourage people to use `/tmp` more because of this issue.

I opened https://github.com/jfischoff/tmp-postgres/issues/290, which I believe is the proper solution for this. Obviously, this doesn't prevent the workaround from being adopted, or Lix from adopting a solution of its own.
@pennae and I will not be able to get to it in a reasonable timeline, I believe, but we can help with reviews and bouncing ideas. I am in agreement with @pennae's solution, and I am curious to see if the symlink idea will cause actual confusion / problems among users when debugging things that may depend on the actual build directory. I think we can throw more pedagogy and documentation in our messages to help regarding this too.
Also, I'm not sure whether I understand this concern. If the symlink exists as a convenience for a user inspecting the builds, this should not get into the store directory in any way?
For the sake of my own understanding and summarization.
`keep-failed` directory

My preference is: let's try (3), then fall back to (1) if this doesn't work out. Also, we could put this behind an option and let users choose; after some feedback, we can pick an informed default or even remove the possibility to choose.