UAF in nix copy #618
Reference: lix-project/lix#618
Describe the bug
nix copy crashes randomly on large copies (copying a couple of files works just fine).
Running with ASAN yields this trace on 3413ab5629 (today's main); the crash was originally seen on NixOS 24.11's Lix, 2.91.1.
Steps To Reproduce
Expected behavior
Shouldn't crash.
nix --version output
Reproduced on 3413ab5629 (today's main) and NixOS 24.11's Lix, 2.91.1.
Additional context
None at this time.
So the problem seems to be with the addMultipleToStore thread pool in lix/libstore/store-api.cc.
Since it's a data race, I tried building with TSan as well and attached the log, which has many, many warnings, for anyone who'd like to try to decipher it. I found it slightly easier to read, but perhaps that's just because I had looked at the code in between.
The race would be between the Graph destructor (from leaving processGraph?) and graph->rrefs[ref].insert(node); in one of the threads.
The latter is done under graph_.lock(), and the destructors obviously don't take that lock. That shouldn't matter, though: my understanding of the thread pool after a quick read is that no worker should still be running once pool.process() has returned...
OTOH TSan is clear that the main thread is running the destructor at this point, so what we need to look for is why a thread would still be running after pool.process() returns; that would obviously cause problems.
Any idea?
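For anyone following along, here is a minimal sketch of the invariant in question. It is not the actual Lix code: the Graph struct and the buildReverseRefs function are hypothetical stand-ins, and plain std::thread replaces the real thread pool. The shared graph is only safe to destroy once every worker that might still insert into rrefs has been joined; taking the lock inside the workers does nothing for a destructor that runs without it.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <set>
#include <thread>
#include <vector>

// Hypothetical stand-in for the shared graph state; not the real Lix type.
struct Graph {
    std::mutex lock;
    std::map<int, std::set<int>> rrefs; // reverse references: ref -> dependents
};

// Records, for every node i in [1, nodeCount), that node i depends on node i - 1.
// The work is split across workerCount threads (assumed to be >= 1).
// Returns the total number of reverse-reference edges recorded.
std::size_t buildReverseRefs(int nodeCount, int workerCount) {
    Graph graph;
    std::vector<std::thread> workers;

    for (int w = 0; w < workerCount; ++w) {
        workers.emplace_back([&graph, w, nodeCount, workerCount] {
            // Worker w handles nodes w + 1, w + 1 + workerCount, ...
            for (int node = w + 1; node < nodeCount; node += workerCount) {
                std::lock_guard<std::mutex> guard(graph.lock);
                graph.rrefs[node - 1].insert(node);
            }
        });
    }

    // The crucial step: join every worker before graph leaves scope.
    // If any worker were still running past this point, its insert would
    // race with ~Graph(), which is the use-after-free pattern ASAN/TSan report.
    for (auto & t : workers) t.join();

    std::size_t edges = 0;
    for (const auto & entry : graph.rrefs) edges += entry.second.size();
    return edges;
}
```

If the equivalent of the join loop can return early while a worker is still active (for example on an exception or interrupt path, which is only a guess here), the destructor race described above follows directly.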
(FWIW this also seems to affect Nix, so it is not a new Lix bug.)