fetchGit: lossily truncated git shortRev #814
Labels
No labels
Affects/CppNix
Affects/Nightly
Affects/Only nightly
Affects/Stable
Area/build-packaging
Area/cli
Area/evaluator
Area/fetching
Area/flakes
Area/language
Area/lix ci
Area/nix-eval-jobs
Area/profiles
Area/protocol
Area/releng
Area/remote-builds
Area/repl
Area/repl/debugger
Area/store
bug
Context
contributors
Context
drive-by
Context
maintainers
Context
RFD
crash 💥
Cross Compilation
devx
docs
Downstream Dependents
E/easy
E/hard
E/help wanted
E/reproducible
E/requires rearchitecture
Feature/S3
imported
Language/Bash
Language/C++
Language/NixLang
Language/Python
Language/Rust
Needs Langver
OS/Linux
OS/macOS
performance
regression
release-blocker
stability
Status
blocked
Status
invalid
Status
postponed
Status
wontfix
testing
testing/flakey
Topic/Large Scale Installations
ux
No milestone
No project
No assignees
4 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
lix-project/lix#814
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Describe the bug
Any resources pulled using builtins.fetchGit (and potentially others) will provide a shortRev of seven hex digits.
This is because
Hash::gitShortRev(), the corresponding method (as far as I can tell) uses a fixed length substring of 7.First of all, this mismatches with what
git rev-parseprovides (which is used for dirty repos), which defaults to using "an appropriate value is computed based on the approximate number of packed objects in your repository" (git-config(1):core.abbrev), which is probably the most common method of calculating this value (see for example lix-project/hydra#41).The more problematic property of the hardcoded value of 7 is that the abbreviations are not guaranteed to be unique anymore, which might cause issues further down the line if anyone assumes this shortRev to follow the same properties as the usual rev provided by git (git-rev-parse(1)
--short, emphasis mine):Steps To Reproduce
git clone --bare https://git.lix.systems/lix-project/lix/ lix.gitgit -C lix.git rev-parse --short dad17a54f7cf2bae24274dc2b9a535c1938e6eb0nix eval --raw --impure --expr '(builtins.fetchGit "https://git.lix.systems/lix-project/lix?ref=dad17a54f7cf2bae24274dc2b9a535c1938e6eb0").shortRev'dad17a54fanddad17a5respectively)Expected behavior
The shortRev was expected to be short, but unique.
nix --versionoutputdad17a54f7Additional context
I understand that the code I referenced which I assume to be the one that provides the shortRev has nothing to do with git specifically and is just a generic hash container which happens to be able to provide something that looks like a git shortRev.
With that said, I fully expect that this would be (or is going to) a pain to fix, and I also realize that this will cause a lot of rebuilds all over the place, break 1:1 compatibility with a lot of Nix, etc. etc..
Given that so far I am not aware of any tangible impact on anything (other than the hydra one, but that's less of a Lix issue and more one of "outside assumptions", i.e. nothing actually broke) I think it's fully reasonable to say "yeah, seven hex digits aren't guaranteed to be unique, but it works, anyone who needs uniqueness can use the unshortened rev".
Especially given that the rev-parse approach depends on which objects are present in the repository, which is not guaranteed to be deterministic, this entire issue is more along the lines of "as a user of Lix I would've expected this value to be unique", making this more of a documentation issue; the user-side assumption was the issue and documentation can supplant assumptions.
Really, I'm just a user leeching off the project (thanks for that by the way, much appreciated), you're the ones running it, so it's up to you to decide whether this is a documentation or implementation issue.
IMO this is working as intended: git rev-parse's variable length output is in fact a big problem for reproducibility of
git archivewhere it can be optionally substituted into files (yeah.... look for the bad ideas inman gitattributes). People including the "short rev" into their release tarballs causes a surprising number of fixed-output derivation reproducibility bugs in nixpkgs, from fetching source tarballs off of github/gitlab/forgejo archive services!If you want a short rev that's a different length, take the full SHA and truncate it as you feel is appropriate. I don't think that nix evaluation depending on the amount of git history in the repo is a good idea, based on my existing experience with short rev causing reproducibility issues in other areas today.
Perfectly valid. I'll consider this issue more of a documentation issue then, with shortRev being basically left out of the docs entirely (unless I am really bad at looking for it, which I'm totally considering a possibility).
Is the output of fetchGit considered stable (IOW is there the possibility of fields being potentially removed in the future)? Because if it is considered stable then including example output of an invocation with all the fields explained in its documentation and a disclaimer for the shortRev saying that it's different from git's
rev-parse --short, is not guaranteed to be unique, but is fixed length.With its output potentially causing interoperability issues, maybe a test would be good too which checks for the length specifically (and a comment a la "if this fails, please check with the docs") to make sure that a change to that makes it into release notes or such to be picked up by whatever downstream project relies on it (though the tests may also have implicit coverage of the fixed length, but I haven't looked so far).
Does that sound like a good contribution?
If so I might get around to providing a PR sometimes soon.
Yes, the output is stable in that we can't get away with changing it. It's unfortunately part of the messy situation of libfetchers and flakes. Documenting it would be good and appreciated, but I am wondering what the ideal place is, in a similar manner as fetchTree docs being kinda weird.
The realistic path to flakes stabilization includes a spec for fetchTree but the efforts by roberth and co on such a spec are currently stalled. All of this to say, it would be good, and it's just likely that wherever it goes initially might get moved later.
This issue was mentioned on Gerrit on the following CLs:
As a first time committer (especially with gerrit) I'm not 100% sure I've done everything correctly.
If you (or anyone else) could have a quick glance at lix/3116 and tell me if the form is alright (so I can remove the WIP tag), that'd be great.
In terms of content I'll take any feedback of course, this is just about whether I submitted it properly on gerrit.
Edit: with the length of the builtins page it would of course be good to avoid making it even longer, but for now I've expanded the docs where they are, if a decision is made to document all fetchers properly in a separate place it can always be moved there later (IMHO).
@benaryorg wrote in #814 (comment):
In Gerrit, don't hesitate to put reviewers, e.g. people who interacted with you in the issue. This helps us to notice even WIPs CLs in our dashboard.
Thank YOU for contributing!