Darwin FODs have spurious mismatches on recent Lixes #904
Reference: lix-project/lix#904
Describe the bug
A specific package, wrangler, produced a checksum mismatch only on Lix, not on upstream Nix.
Steps To Reproduce
Expected behavior
same as nix
log
nix --version output
nix (Lix, like Nix) 2.93.2
System type: aarch64-darwin
Additional system types:
Features: gc, signed-caches
System configuration file: /etc/nix/nix.conf
User configuration files: /Users/user/.config/nix/nix.conf:/Users/user/.nix-profile/etc/xdg/nix/nix.conf:/run/current-system/sw/etc/xdg/nix/nix.conf:/nix/var/nix/profiles/default/etc/xdg/nix/nix.conf
Store directory: /nix/store
State directory: /nix/var/nix
Data directory: /nix/store/cfns45lw4nm4b4yfrsy8mskw06wyy33b-lix-2.93.2/share
Additional context
originally reported at https://github.com/NixOS/nixpkgs/issues/423082
does not reproduce on linux, neither with 2.93 nor with current main. this must be a darwin-specific problem of some sort.
also doesn't reproduce on linux with case hacking enabled. since we have neither access to darwin machines to test this any further nor the spoons to actually do it, someone else will have to continue from here
--keep-failed of FODs does not retain "bad" store paths #907
I can repro this problem. However, because of #907 I can't actually debug it.
are the store paths different at all with the "incorrect" hash changed?
I guess I could hack the FOD so it succeeds with the other hash so I could diff it?
There's no eval difference. It's entirely build issues.
checked again because this bugged us. the difference is in file modes set by pnpm:
this makes its way into the nars, which determine the hash:
in cppnix 2.24 the files in $out don't have +x set, in lix they do. regardless of this we've had successful builds of the deps derivation on lix 2.91, so something really funky is going on. going to assume this is a pure darwin problem though and not dig any further
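To illustrate why a stray +x bit alone is enough to change a fixed-output hash, here is a minimal sketch of NAR serialization for a single regular file. It only covers the one-file case, the helper names (`nar_string`, `nar_regular_file`, `nar_sha256`) are made up for this example, and it is not the code Lix or CppNix actually runs. The point is just that the executable flag is written into the archive, so it ends up in the hash.

```python
import hashlib
import struct


def nar_string(data: bytes) -> bytes:
    """Encode one NAR string: 8-byte little-endian length, the bytes, zero padding to a multiple of 8."""
    padding = (8 - len(data) % 8) % 8
    return struct.pack("<Q", len(data)) + data + b"\0" * padding


def nar_regular_file(contents: bytes, executable: bool) -> bytes:
    """Serialize a single regular file as a NAR (hypothetical helper, single-file case only)."""
    parts = [b"nix-archive-1", b"(", b"type", b"regular"]
    if executable:
        # The executable bit is the only piece of the file mode that a NAR records.
        parts += [b"executable", b""]
    parts += [b"contents", contents, b")"]
    return b"".join(nar_string(p) for p in parts)


def nar_sha256(contents: bytes, executable: bool) -> str:
    """SHA-256 over the NAR serialization, as hex."""
    return hashlib.sha256(nar_regular_file(contents, executable)).hexdigest()


if __name__ == "__main__":
    script = b"#!/bin/sh\necho hi\n"
    print("without +x:", nar_sha256(script, executable=False))
    print("with +x:   ", nar_sha256(script, executable=True))
```

Same bytes, different mode, different hash: so if the builder (here pnpm) sets +x differently depending on its environment, the FOD hash moves with it.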
To be honest I'm willing to just say this is some kind of thing pnpm simply shouldn't be doing, but I don't understand how it's doing the thing it shouldn't, and it's almost certainly some ridiculous artifact of some minutiae that it should never be taking account of.
I'm not sure how we would diff the execution environment sufficiently to understand how pnpm arrived at producing the wrong result.
if cppnix 2.24 is not having this problem and lix 2.91 is not having this problem, surely the blame lies in the darwin specific code possibly?
This issue makes me want daemon managed automatic bisect tooling.
I'm going to file a ticket about that. #909
i don't think this is pnpm-exclusive, i noticed the mochi package is also bringing up a different hash on my M1 running Lix main:
If i take out Lix (by removing lix-module):
This results in others being able to nix run nixpkgs#mochi (well, not really, i need to make $out/bin/mochi point to $out/Applications/Mochi.app/Contents/MacOS/mochi ...but if that was correct then they'd be able to run it), while i get a hash mismatch
going for a very crude "bisect" because i don't really know what else to do:
2.93.0.tar.gz gives the correct hash
main.tar.gz and lix to 2.93.0.tar.gz also is correct (this was just to make sure it wasn't lix-module broken and so i could focus on just only recompiling lix)
2.93.0.tar.gz and lix to main.tar.gz doesn't (sanity check for previous, so it's likely from lix)
2.93.1.tar.gz (from now on module is main) doesn't
at this point i got the git repo:
i'm not sure if this is 100% correct (it's 2am) but it kinda tracks with 2.93.0 working and 2.93.1 not?
as for why it happens, i'm not sure...the Nix Store APFS volume on macOS is separate from the regular data volume (and i assume the $TMPDIR on macOS is on the data one...perhaps there's some differences in how they work?)
In order to confirm this theory, can you try to change the build-dir manually back to $TMPDIR or /tmp and rebuild the offending derivation? If so, this is confirmed and this is… oh well.
Yeah, if temp dir location is the real problem then I think we can just say it's a package bug and move on. Thanks so much for diagnosing this by the way!
okay so trying
<bad> normally: wrong hash (sha256-eTdSCkc7SNJmgrzvxmiixnzlwmIeMyIB6W1O25DInzI=)
<bad> with build-dir set to /var/folders/tg/vj6fwkdn20zfwc2nbmc0t00w0000gn/T/ (which i get by echoing $TMPDIR) gives a permissions error:
(i had to then edit /etc/nix/nix.conf and sudo launchctl kickstart -k system/org.nixos.nix-daemon to get things working again)
i made a dir inside it manually:
i set build-dir to that with <bad> and it still gives the permission error.
so chmod a+rwx /var/folders/tg/vj6fwkdn20zfwc2nbmc0t00w0000gn/T/nix, try again...which gives the same bad hash (sha256-eTdSCkc7SNJmgrzvxmiixnzlwmIeMyIB6W1O25DInzI=)
<good> (the commit before <bad>) gives the right hash (sha256-5RM4eqHQoYfO5JiUH9ol+3XxOk4VX4ocE3Yia82sovI=)
<good> with build-dir on our 777 temp dir: permission error
so, i'm not sure what build-dir is actually being set to...the (old) docs say it uses $TMPDIR but i keep getting perms errors...
<bad> is 4f0c59b307
<good> is 46e2bb4ca6
sigh i might be doing something wrong, because as i was trying to diffoscope the hash in <good> changed...i'll go sleep and try again later, i guess take everything i've written with a grain of salt?
@dibenzepin fwiw, you are using a per-user temporary directory on Darwin, you should probably just use /tmp for the tests, this was the previous default value.
Title changed from "checksum different" to "Darwin FODs have spurious mismatches on recent Lixes"
w.r.t. pnpm fetchers:
https://github.com/NixOS/nixpkgs/pull/350063
https://github.com/NixOS/nixpkgs/pull/422975
https://github.com/NixOS/nixpkgs/pull/422975#pullrequestreview-3016025813
This seems to point to a non-Lix issue.
(thanks to @emilazy for the pointers.)
ahh @raito, i thought it was $TMPDIR, my bad
do i open a new issue and continue there? (since the pnpm one is non-lix)
@dibenzepin wrote in #904 (comment):
the .dmg is certainly the most fascinating one at the moment, so feel free to continue analyzing it here, while we deem the pnpm one probably out of scope for Lix.
okay so trying again today...using 4f0c59b307 with build-dir set to /tmp gives the correct hash of sha256-5RM4eqHQoYfO5JiUH9ol+3XxOk4VX4ocE3Yia82sovI=.
diffoscoping the output from good and bad (with build-dir left as default):
hm so 7z is encoding the symlink in the result instead of like, "properly" following(?) it?
(and because the runners on gh have build-dir set to /tmp too, it works fine for them as well?)
Yep sounds like FOD instability again, the symlinks should be rewritten to be relative instead of absolute like this.