fetchGit: lossily truncated git shortRev #814

Closed
opened 2025-04-29 00:16:43 +00:00 by benaryorg · 6 comments

Describe the bug

Any resources pulled using builtins.fetchGit (and potentially others) will provide a shortRev of seven hex digits.
This is because Hash::gitShortRev(), the corresponding method (as far as I can tell) uses a fixed length substring of 7.

First of all, this mismatches with what git rev-parse provides (which is used for dirty repos), which defaults to using "an appropriate value is computed based on the approximate number of packed objects in your repository" (git-config(1): core.abbrev), which is probably the most common method of calculating this value (see for example lix-project/hydra#41).

The more problematic property of the hardcoded value of 7 is that the abbreviations are not guaranteed to be unique anymore, which might cause issues further down the line if anyone assumes this shortRev to follow the same properties as the usual rev provided by git (git-rev-parse(1) --short, emphasis mine):

shortens the object name to a unique prefix with at least length characters

Steps To Reproduce

  1. git clone --bare https://git.lix.systems/lix-project/lix/ lix.git
  2. git -C lix.git rev-parse --short dad17a54f7cf2bae24274dc2b9a535c1938e6eb0
  3. nix eval --raw --impure --expr '(builtins.fetchGit "https://git.lix.systems/lix-project/lix?ref=dad17a54f7cf2bae24274dc2b9a535c1938e6eb0").shortRev'
  4. notice the difference in output (for me dad17a54f and dad17a5 respectively)

Expected behavior

The shortRev was expected to be short, but unique.

nix --version output

dad17a54f7

nix (Lix, like Nix) 2.93.0-dev
System type: x86_64-linux
Additional system types: i686-linux, x86_64-v1-linux, x86_64-v2-linux, x86_64-v3-linux
Features: gc, signed-caches
System configuration file: /etc/nix/nix.conf
User configuration files: /home/benaryorg/.config/nix/nix.conf:/etc/xdg/nix/nix.conf:/home/benaryorg/.nix-profile/etc/xdg/nix/nix.conf:/nix/profile/etc/xdg/nix/nix.conf:/home/benaryorg/.local/state/nix/profile/etc/xdg/nix/nix.conf:/etc/profiles/per-user/benaryorg/etc/xdg/nix/nix.conf:/nix/var/nix/profiles/default/etc/xdg/nix/nix.conf:/run/current-system/sw/etc/xdg/nix/nix.conf
Store directory: /nix/store
State directory: /nix/var/nix
Data directory: /nix/store/vkgmqv2d0z172sr8dfxamasgdxjvxxkx-lix-2.93.0-dev/share

Additional context

I understand that the code I referenced which I assume to be the one that provides the shortRev has nothing to do with git specifically and is just a generic hash container which happens to be able to provide something that looks like a git shortRev.
With that said, I fully expect that this would be (or is going to) a pain to fix, and I also realize that this will cause a lot of rebuilds all over the place, break 1:1 compatibility with a lot of Nix, etc. etc..
Given that so far I am not aware of any tangible impact on anything (other than the hydra one, but that's less of a Lix issue and more one of "outside assumptions", i.e. nothing actually broke) I think it's fully reasonable to say "yeah, seven hex digits aren't guaranteed to be unique, but it works, anyone who needs uniqueness can use the unshortened rev".
Especially given that the rev-parse approach depends on which objects are present in the repository, which is not guaranteed to be deterministic, this entire issue is more along the lines of "as a user of Lix I would've expected this value to be unique", making this more of a documentation issue; the user-side assumption was the issue and documentation can supplant assumptions.
Really, I'm just a user leeching off the project (thanks for that by the way, much appreciated), you're the ones running it, so it's up to you to decide whether this is a documentation or implementation issue.

## Describe the bug Any resources pulled using *builtins.fetchGit* (and potentially others) will provide a *shortRev* of seven hex digits. This is because [`Hash::gitShortRev()`](https://git.lix.systems/lix-project/lix/src/commit/b7474fc4a97978ae2f529c74f64f37eff4d06767/lix/libutil/hash.hh#L137), the corresponding method (**as far as I can tell**) uses a fixed length substring of 7. First of all, this mismatches with what `git rev-parse` provides ([which *is* used for dirty repos](https://git.lix.systems/lix-project/lix/src/commit/801567adf0ccf41114e24a6b4f182c0da1ccf503/lix/libfetchers/git.cc#L259-L260)), which defaults to using "an appropriate value is computed based on the approximate number of packed objects in your repository" (*git-config(1)*: `core.abbrev`), which is probably the most common method of calculating this value (see for example lix-project/hydra#41). The more problematic property of the hardcoded value of 7 is that the abbreviations are not guaranteed to be unique anymore, which might cause issues further down the line if anyone assumes this *shortRev* to follow the same properties as the usual rev provided by git (*git-rev-parse(1)* `--short`, emphasis mine): > shortens the object name to a **unique** prefix with **at least** length characters ## Steps To Reproduce 1. `git clone --bare https://git.lix.systems/lix-project/lix/ lix.git` 2. `git -C lix.git rev-parse --short dad17a54f7cf2bae24274dc2b9a535c1938e6eb0` 3. `nix eval --raw --impure --expr '(builtins.fetchGit "https://git.lix.systems/lix-project/lix?ref=dad17a54f7cf2bae24274dc2b9a535c1938e6eb0").shortRev'` 4. notice the difference in output (for me `dad17a54f` and `dad17a5` respectively) ## Expected behavior The *shortRev* was expected to be short, but unique. ## `nix --version` output dad17a54f7cf2bae24274dc2b9a535c1938e6eb0 ```text nix (Lix, like Nix) 2.93.0-dev System type: x86_64-linux Additional system types: i686-linux, x86_64-v1-linux, x86_64-v2-linux, x86_64-v3-linux Features: gc, signed-caches System configuration file: /etc/nix/nix.conf User configuration files: /home/benaryorg/.config/nix/nix.conf:/etc/xdg/nix/nix.conf:/home/benaryorg/.nix-profile/etc/xdg/nix/nix.conf:/nix/profile/etc/xdg/nix/nix.conf:/home/benaryorg/.local/state/nix/profile/etc/xdg/nix/nix.conf:/etc/profiles/per-user/benaryorg/etc/xdg/nix/nix.conf:/nix/var/nix/profiles/default/etc/xdg/nix/nix.conf:/run/current-system/sw/etc/xdg/nix/nix.conf Store directory: /nix/store State directory: /nix/var/nix Data directory: /nix/store/vkgmqv2d0z172sr8dfxamasgdxjvxxkx-lix-2.93.0-dev/share ``` ## Additional context I understand that the code I referenced which I assume to be the one that provides the *shortRev* has nothing to do with git *specifically* and is just a generic hash container which happens to be able to provide something that looks like a git *shortRev*. With that said, I fully expect that this would be (or is going to) a pain to fix, and I also realize that this will cause a lot of rebuilds all over the place, break 1:1 compatibility with a lot of Nix, etc. etc.. Given that so far I am not aware of any tangible impact on anything (other than the hydra one, but that's less of a Lix issue and more one of "outside assumptions", i.e. nothing actually *broke*) I think it's fully reasonable to say "yeah, seven hex digits aren't guaranteed to be unique, but it works, anyone who needs uniqueness can use the unshortened *rev*". Especially given that the *rev-parse* approach depends on which objects are present in the repository, which is not guaranteed to be deterministic, this entire issue is more along the lines of "as a user of Lix I would've expected this value to be unique", making this more of a documentation issue; the user-side assumption was the issue and documentation can supplant assumptions. Really, I'm just a user leeching off the project (thanks for that by the way, much appreciated), you're the ones running it, so it's up to you to decide whether this is a documentation or implementation issue.
Owner

First of all, this mismatches with what git rev-parse provides (which is used for dirty repos), which defaults to using "an appropriate value is computed based on the approximate number of packed objects in your repository" (git-config(1): core.abbrev), which is probably the most common method of calculating this value (see for example lix-project/hydra#41).

IMO this is working as intended: git rev-parse's variable length output is in fact a big problem for reproducibility of git archive where it can be optionally substituted into files (yeah.... look for the bad ideas in man gitattributes). People including the "short rev" into their release tarballs causes a surprising number of fixed-output derivation reproducibility bugs in nixpkgs, from fetching source tarballs off of github/gitlab/forgejo archive services!

If you want a short rev that's a different length, take the full SHA and truncate it as you feel is appropriate. I don't think that nix evaluation depending on the amount of git history in the repo is a good idea, based on my existing experience with short rev causing reproducibility issues in other areas today.

> First of all, this mismatches with what git rev-parse provides (which is used for dirty repos), which defaults to using "an appropriate value is computed based on the approximate number of packed objects in your repository" (git-config(1): core.abbrev), which is probably the most common method of calculating this value (see for example lix-project/hydra#41). IMO this is working as intended: git rev-parse's variable length output is in fact a big problem for reproducibility of `git archive` where it can be optionally substituted into files (yeah.... look for the bad ideas in `man gitattributes`). People including the "short rev" into their release tarballs causes a surprising number of fixed-output derivation reproducibility bugs in nixpkgs, from fetching source tarballs off of github/gitlab/forgejo archive services! If you want a short rev that's a different length, take the full SHA and truncate it as you feel is appropriate. I don't think that nix evaluation *depending on the amount of git history in the repo* is a good idea, based on my existing experience with short rev causing reproducibility issues in other areas today.
Author

IMO this is working as intended: git rev-parse's variable length output is in fact a big problem for reproducibility

Perfectly valid. I'll consider this issue more of a documentation issue then, with shortRev being basically left out of the docs entirely (unless I am really bad at looking for it, which I'm totally considering a possibility).

Is the output of fetchGit considered stable (IOW is there the possibility of fields being potentially removed in the future)? Because if it is considered stable then including example output of an invocation with all the fields explained in its documentation and a disclaimer for the shortRev saying that it's different from git's rev-parse --short, is not guaranteed to be unique, but is fixed length.
With its output potentially causing interoperability issues, maybe a test would be good too which checks for the length specifically (and a comment a la "if this fails, please check with the docs") to make sure that a change to that makes it into release notes or such to be picked up by whatever downstream project relies on it (though the tests may also have implicit coverage of the fixed length, but I haven't looked so far).

Does that sound like a good contribution?
If so I might get around to providing a PR sometimes soon.

> IMO this is working as intended: git rev-parse's variable length output is in fact a big problem for reproducibility Perfectly valid. I'll consider this issue more of a documentation issue then, with *shortRev* being basically left out of the docs entirely (unless I am really bad at looking for it, which I'm totally considering a possibility). Is the output of *fetchGit* considered stable (IOW is there the possibility of fields being potentially removed in the future)? Because if it is considered stable then including example output of an invocation with all the fields explained in [its documentation](https://docs.lix.systems/manual/lix/nightly/language/builtins.html?highlight=fetchers#builtins-fetchGit) and a disclaimer for the *shortRev* saying that it's different from git's `rev-parse --short`, is not guaranteed to be unique, but is fixed length. With its output *potentially* causing interoperability issues, maybe a test would be good too which checks for the length specifically (and a comment a la "if this fails, please check with the docs") to make sure that a change to that makes it into release notes or such to be picked up by whatever downstream project relies on it (though the tests may also have implicit coverage of the fixed length, but I haven't looked so far). Does that sound like a good contribution? If so I *might* get around to providing a PR sometimes soon.
Owner

Yes, the output is stable in that we can't get away with changing it. It's unfortunately part of the messy situation of libfetchers and flakes. Documenting it would be good and appreciated, but I am wondering what the ideal place is, in a similar manner as fetchTree docs being kinda weird.

The realistic path to flakes stabilization includes a spec for fetchTree but the efforts by roberth and co on such a spec are currently stalled. All of this to say, it would be good, and it's just likely that wherever it goes initially might get moved later.

Yes, the output is stable in that we can't get away with changing it. It's unfortunately part of the messy situation of libfetchers and flakes. Documenting it would be good and appreciated, but I am wondering what the ideal place is, in a similar manner as fetchTree docs being kinda weird. The realistic path to flakes stabilization includes a spec for fetchTree but the efforts by roberth and co on such a spec are currently stalled. All of this to say, it would be good, and it's just likely that wherever it goes initially might get moved later.
Member

This issue was mentioned on Gerrit on the following CLs:

  • commit message in cl/3116 ("libexpr: fetchGit output documentation")
<!-- GERRIT_LINKBOT: {"cls": [{"backlink": "https://gerrit.lix.systems/c/lix/+/3116", "number": 3116, "kind": "commit message"}], "cl_meta": {"3116": {"change_title": "libexpr: fetchGit output documentation"}}} --> This issue was mentioned on Gerrit on the following CLs: * commit message in [cl/3116](https://gerrit.lix.systems/c/lix/+/3116) ("libexpr: fetchGit output documentation")
Author

As a first time committer (especially with gerrit) I'm not 100% sure I've done everything correctly.
If you (or anyone else) could have a quick glance at lix/3116 and tell me if the form is alright (so I can remove the WIP tag), that'd be great.
In terms of content I'll take any feedback of course, this is just about whether I submitted it properly on gerrit.

Edit: with the length of the builtins page it would of course be good to avoid making it even longer, but for now I've expanded the docs where they are, if a decision is made to document all fetchers properly in a separate place it can always be moved there later (IMHO).

As a first time committer (especially with gerrit) I'm not 100% sure I've done everything correctly. If you (or anyone else) could have a quick glance at [lix/3116](https://gerrit.lix.systems/c/lix/+/3116) and tell me if the *form* is alright (so I can remove the WIP tag), that'd be great. In terms of *content* I'll take any feedback of course, this is just about whether I submitted it properly on gerrit. *Edit*: with the length of the builtins page it would of course be good to avoid making it even longer, but for now I've expanded the docs where they are, if a decision is made to document all fetchers properly in a separate place it can always be moved there later (IMHO).
Owner

@benaryorg wrote in #814 (comment):

As a first time committer (especially with gerrit) I'm not 100% sure I've done everything correctly. If you (or anyone else) could have a quick glance at lix/3116 and tell me if the form is alright (so I can remove the WIP tag), that'd be great. In terms of content I'll take any feedback of course, this is just about whether I submitted it properly on gerrit.

Edit: with the length of the builtins page it would of course be good to avoid making it even longer, but for now I've expanded the docs where they are, if a decision is made to document all fetchers properly in a separate place it can always be moved there later (IMHO).

In Gerrit, don't hesitate to put reviewers, e.g. people who interacted with you in the issue. This helps us to notice even WIPs CLs in our dashboard.

Thank YOU for contributing!

@benaryorg wrote in https://git.lix.systems/lix-project/lix/issues/814#issuecomment-12086: > As a first time committer (especially with gerrit) I'm not 100% sure I've done everything correctly. If you (or anyone else) could have a quick glance at [lix/3116](https://gerrit.lix.systems/c/lix/+/3116) and tell me if the _form_ is alright (so I can remove the WIP tag), that'd be great. In terms of _content_ I'll take any feedback of course, this is just about whether I submitted it properly on gerrit. > > _Edit_: with the length of the builtins page it would of course be good to avoid making it even longer, but for now I've expanded the docs where they are, if a decision is made to document all fetchers properly in a separate place it can always be moved there later (IMHO). In Gerrit, don't hesitate to put reviewers, e.g. people who interacted with you in the issue. This helps us to notice even WIPs CLs in our dashboard. Thank *YOU* for contributing!
Sign in to join this conversation.
No milestone
No project
No assignees
4 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
lix-project/lix#814
No description provided.