Mysterious guessOrInventPath.sockets unit test failure on macOS after libexec changes #1113
Describe the bug
I can't build Lix as a derivation (with or without sandboxing) due to an odd test failure that seems to occur only on my machine at the moment. The log is attached, but essentially it seems to be looking in the wrong path for `unix-bind-connect` while running specifically the `guessOrInventPath.sockets` unit test.
If I try to build the same revision in a development environment, a seemingly related issue occurs when running the tests, but in the form of a timeout in one of the functional(2?) tests, while the unit test that was failing inside the derivation build passes fine.
The core error is
Steps To Reproduce
1. Build `56988d8605` or later of Lix.
2. ~~Profit~~ Failure.
Expected behavior
Not this.
`nix --version` output
Additional context
I'm going to start bisecting this to confirm which revision introduced it, but probably tomorrow.
oh dear, unix sockets again 🫠 the ENOENT is likely due to the packaging not running the install phase before the check phase, so the helper isn't available yet. do you have a non-default tmpdir setting on your system? iirc the default setting should not lead to socket paths that are long enough to require the connect helper to use
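For readers unfamiliar with why a helper is needed for long socket paths at all: a common workaround for the `sun_path` size limit is to change into the socket's parent directory and bind or connect using only the short relative name. The sketch below illustrates that general technique in Python. It is not a description of how Lix's `unix-bind-connect` helper is implemented, and `working_directory` and `bind_long_path` are hypothetical names made up for this example.

```python
import os
import socket
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily change the process-wide working directory."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)


def bind_long_path(sock, path):
    """Bind an AF_UNIX socket whose absolute path may exceed sun_path.

    Hypothetical helper: the kernel only sees the short relative name,
    so the length of the parent directory no longer matters.
    """
    directory, name = os.path.split(path)
    with working_directory(directory):
        sock.bind(name)
```

Because `chdir` affects the whole process, this is not safe in a multi-threaded program. That is presumably one reason to do the work in a separate helper process, though that motivation is an assumption here rather than something stated in this thread.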
I shouldn't have a non-standard tmpdir
`temp-dir =`
this is just based off the default install from https://lix.systems/install followed by updating the profile to the shown revision
that is super strange then. can you add a print of the generated socket path to the guessOrInventPath sockets test in `lix/tests/unit/libutil/tests.cc`? if it fails on your machine that path must be longer than it is in ci, and figuring out where that difference comes from sounds pretty important right now
worst case we can disable that specific test on macos, or shuffle phases?
The other error I was seeing in `functional-gc-non-blocking` was also due to using `unix-bind-connect`, because `TMPDIR` and `TEMPDIR` were set on entering the `nix develop` environment, which had it connecting to a test path like `/private/var/folders/hn/z5j50y9x46vffv4zsbcq6vw40000gn/T/nix-shell.FLuJAy/nix-test/gc-non-blocking/var/nix/gc-socket`, which is of course too long for a direct connection.
The reason the unit test passes in the dev environment is that it uses a path like `/Users/lunaphied/code/lix/tests/unit/libutil/data/guess-or-invent/socket`
It looks like I have `$TMPDIR` getting set to `/var/folders/hn/z5j50y9x46vffv4zsbcq6vw40000gn/T/nix-shell.CKSaGA` when I enter a shell, which makes the path just slightly too long. I cannot find so far where it's getting set, but `TMPDIR` starts out as `/var/folders/hn/z5j50y9x46vffv4zsbcq6vw40000gn/T/` or similar on my machine.
@qyriad when you get a chance can you give me the equivalent information from one or both of your Darwin systems, because I don't know why this isn't affecting you currently. I suspect that you might not see the error during derivation build because of having a less recent system Lix that doesn't yet use the temp directory fix.
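To make the length argument concrete, here is a minimal sketch that measures the two paths quoted above against the size of `sockaddr_un.sun_path`, which is 104 bytes on macOS and 108 bytes on Linux. The function names are made up for this example, and the check only approximates what the kernel enforces (one byte is reserved for the trailing NUL).

```python
import os
import sys

# Size of sockaddr_un.sun_path: 104 bytes on macOS/BSD, 108 on Linux.
SUN_PATH_SIZE = 104 if sys.platform == "darwin" else 108


def socket_path_length(path):
    """Length of the path in bytes, as the kernel would see it."""
    return len(os.fsencode(path))


def fits_in_sun_path(path, size=SUN_PATH_SIZE):
    """True if the path plus its trailing NUL byte fits in sun_path."""
    return socket_path_length(path) < size


if __name__ == "__main__":
    # Paths copied verbatim from the comment above.
    paths = [
        "/private/var/folders/hn/z5j50y9x46vffv4zsbcq6vw40000gn/T/"
        "nix-shell.FLuJAy/nix-test/gc-non-blocking/var/nix/gc-socket",
        "/Users/lunaphied/code/lix/tests/unit/libutil/data/guess-or-invent/socket",
    ]
    for p in paths:
        print(socket_path_length(p), fits_in_sun_path(p, 104), p)
```

With the macOS size of 104, the `gc-socket` path is well over the limit and the path under `/Users/lunaphied/code` fits comfortably, which matches the behaviour described above.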
perhaps we should just drop this unit test. the test suite quite obviously exercises that functionality already, and it relying on other binaries really makes it not a unit test anyway
I've successfully built lix on darwin-aarch64 on both release-2.95 (yielding `/nix/store/mp64v6xfaw4abyxlqa92q6vv45i97yb1-lix-2.95.2pre20260319-dev_33f713f`) and current main (yielding `/nix/store/y9w07015s7lmchc8mz6s19i7ry6w5ba2-lix-2.96.0-devpre20260416-dev_15c95b9`) with no test failures.
I can't for the life of me figure out how to get the test suite to display the print to `std::cerr` I added, though, so I can't say what the socket path is.
@lunaphied I had issues with the `functional-gc-non-blocking` test yesterday as well, and the fix was - and you're never gonna guess this - to restart my machine 😆
@ifreilicht I believe your restart fixed it because it somehow created a shorter base tempdir. Some of the tests are also non-deterministic, though I don't remember if this is a case of that.
I concur with dropping the test for now. I suspect this consistently works on Linux because `/tmp` doesn't get expanded to a much longer path in general, and it's just a bomb waiting to happen.
`errno` back to caller (gc-socket can fail if `unix-bind-connect` helper gets used) #1184
I can reliably reproduce this in Docker (image: docker.nix-community.org/nixpkgs/nix-flakes) but not outside on the native host.
we should just disable this test. it's not incredibly useful, the function itself is purely advisory, and it failing like this is just a little bit embarrassing
This issue was mentioned on Gerrit on the following CLs: