Figure out what circumstances lead to nix~case~hack~1 getting into a NAR #726
lix-project/lix#726
I was building a NixOS configuration on macOS at work with this command:
It reproduces with the following command from a Mac once you do have that drv file on your machine (and the package exists on cache.nixos.org):
Salient debug output:
I am wondering if `use-case-hack` got sent to the remote which then generated a garbage NAR???! I have no god damn idea.

Remote is 2.93.0-dev-pre20250228-99bc686, local is 2.93.0-dev-pre20250311-a60a362.
Package contents:
Okay I have verified on remote:
does NOT contain any case hacks.
Hypothesis: somehow the case hacked filename ban applies too late?
Shorter reproducer:
Okay, the case hack is being applied locally, but why is it getting into a NAR?
Stack trace at the `debug("case collision between '%1%' and '%2%'", i->first, name);`:

CAUGHT IT IN THE ACT:
Bug does not repro with `--no-use-case-hack` on the `store dump-path` command above. Need to confirm whether that is making it to the remote daemon or what is the deal with that.

oh, i see what's happening. it's copyNAR, because all copies of nars must parse the nar. and if the thing doing the parsing has case hacking turned on it'll hack the stream
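To make the mechanism above concrete, here is a minimal sketch of a parser that renames case-colliding entries as it reads them. It is not the Lix code: `parseNames` and `CaseInsensitiveLess` are hypothetical names, a directory is reduced to a flat list of entry names, and the `~nix~case~hack~N` suffix form is assumed from the issue title. If whatever consumes the parser's output serialises those names again without stripping the suffix, the hacked name ends up in the new NAR.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Suffix appended to colliding names (assumed form, see the issue title).
const std::string caseHackSuffix = "~nix~case~hack~";

// Orders strings while ignoring ASCII case, so "Foo" and "foo" compare equal.
struct CaseInsensitiveLess {
    bool operator()(const std::string & a, const std::string & b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) {
                return std::tolower(x) < std::tolower(y);
            });
    }
};

// Hypothetical stand-in for a NAR parser: takes the entry names of one
// directory and returns the names the parser reports to its visitor.
std::vector<std::string> parseNames(const std::vector<std::string> & entries,
                                    bool useCaseHack) {
    std::map<std::string, int, CaseInsensitiveLess> seen;
    std::vector<std::string> reported;
    for (auto name : entries) {
        if (useCaseHack) {
            auto it = seen.find(name);
            if (it != seen.end()) {
                // Case collision: the later entry gets a numbered suffix.
                name += caseHackSuffix + std::to_string(++it->second);
            } else {
                seen[name] = 0;
            }
        }
        reported.push_back(name);
    }
    return reported;
}
```

With the hack enabled, `parseNames({"Foo", "foo"}, true)` reports the second entry under a suffixed name. With it disabled the names pass through untouched, which matches the observation that the bug does not reproduce with `--no-use-case-hack`.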
Okay. It is not related to settings on remote, the corruption is happening on the local machine, not the remote. I confirmed this by patching out the case-hack adding from the local `nix`. I don't know what lix is doing that leads to the NAR visitor of parsing one NAR being used to construct a NAR, but it is definitely the pattern I put that filename check in to ensure it doesn't happen as defense in depth.

this is purely local to your machine. the nar lister unhacks files, but the dumper does not, so if the parser hacks a nar stream copyNAR will spit it out all fucked-up-like. check out 45be1e8d66

The design of the case hack setting is so fucking sketchy and definitely made it much easier for this bug to happen; we should hoist it (or maybe move the entire case hacking into the FS visitor). I can work on a CL to move the case hacking code to the FS side to hopefully make this much less likely in the future.
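A rough sketch of what "case hacking on the FS side" could look like, under the assumption that NAR parsing and dumping only ever see logical names and that the suffix is applied and stripped at the filesystem boundary. None of this is the actual CL: `toDiskNames`, `fromDiskNames` and `checkNarName` are hypothetical names, and a directory is reduced to a flat list of entry names. `checkNarName` stands in for the defense-in-depth filename check mentioned above.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Suffix appended to colliding names (assumed form, see the issue title).
const std::string caseHackSuffix = "~nix~case~hack~";

// Orders strings while ignoring ASCII case, so "Foo" and "foo" compare equal.
struct CaseInsensitiveLess {
    bool operator()(const std::string & a, const std::string & b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) {
                return std::tolower(x) < std::tolower(y);
            });
    }
};

// Writing to disk: logical NAR names -> on-disk names, hack applied here only.
std::vector<std::string> toDiskNames(const std::vector<std::string> & logical) {
    std::map<std::string, int, CaseInsensitiveLess> seen;
    std::vector<std::string> onDisk;
    for (auto name : logical) {
        auto it = seen.find(name);
        if (it != seen.end())
            name += caseHackSuffix + std::to_string(++it->second);
        else
            seen[name] = 0;
        onDisk.push_back(name);
    }
    return onDisk;
}

// Reading from disk: strip the suffix so the dumper only sees logical names.
std::vector<std::string> fromDiskNames(const std::vector<std::string> & onDisk) {
    std::vector<std::string> logical;
    for (auto name : onDisk) {
        auto pos = name.find(caseHackSuffix);
        if (pos != std::string::npos)
            name.erase(pos);
        logical.push_back(name);
    }
    return logical;
}

// Defense in depth: refuse to put a hacked name into a NAR at all.
void checkNarName(const std::string & name) {
    if (name.find(caseHackSuffix) != std::string::npos)
        throw std::invalid_argument("case-hacked name in NAR: " + name);
}
```

In this arrangement a NAR-to-NAR copy never touches the hack, so there is no setting on the parser that could leak suffixed names into the output stream.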
See also: #633
This issue was mentioned on Gerrit on the following CLs: