GitInput: same git shortRev length as Lix #41

Merged
ma27 merged 1 commit from benaryorg/hydra:git-hash-length into main 2025-06-11 20:33:30 +00:00
Contributor

Hello, it's me again, the person with the funny issues nobody else runs into ^^
Today we have a difference in outputs depending on whether the git input was provided by Hydra itself via the GitInput plugin (scroll down to nixos-system-hydra), or some sort of fetchGit (including fetchTree).
(or more generally, compare the version of an output of a build pulling from git directly via hydra and of the output of a build using flake.lock with fetchTree)

As best as I can tell the latter pulls its version from Hash::gitShortRev() but I'm not 100% sure (it adds up though, since most other places I found were using git rev-parse).
Whereas Hydra uses git rev-parse with the --short option which defaults to a git-internal value (which can also be provided by a config), with some additional constraints.

I fully understand that this is more of a problem on the Lix side of things since Lix truncates in a more lossy way, and quoting git-rev-parse(1) (emphasis mine):

shortens the object name to a unique prefix with at least length characters

This means that if git stores two hashes with the same prefix of that length, rev-parse will go beyond that length to ensure the prefix is unique (or so I assume, I haven't actually read the git code).
Lix on the other hand (and Hydra with this patch) will truncate the hash to the ambiguous prefix.

However in this case the GitInput plugin should IMHO mimic what Lix does verbatim to ensure the build remains reproducible to the maximum extent.
I will open an issue with Lix though, but given the structure storing the data it's probably a lot harder to get that to carry over the appropriate shortRev if that is even desired.

Either way I'm just providing this tiny lil diff, if you think it'd be a bad idea to merge it (either as-is or in general) I fully understand.
This is just me hitting all the edge-cases.… as always.

Small note: this is patch is currently in production on my infrastructure and seems to work just fine so far.
Since I had both versions already since my inputs are half pinned, and half provided by hydra, I didn't have to rebuild anything.
However given that the evaluation before the patch and the evaluation after deploying the patched hydra build the same inputs with a different output for hydra (version number losing the 2 at the end of the shortRev) I'm pretty sure this will cause fresh evals for everyone using the GitInput plugin for inputs, and potential rebuilds for expressions using that field to generate output.
(don't be misled by the input information hydra provides for the new build, it was in fact the very same commit being built, the last commit just didn't change the output, hence the evaluation was deemed unchanged and dropped)


Original commit message below this line.


Usually it doesn't matter what length the hashes are as long as they are uniform.
That means as long as Hydra outputs are compared which both use Hydra as the input, this is fine.
The same applies to outputs which use Nix' fetchGit, as flakes do.
However when comparing an eval which has its git input provided by Hydra, a different shortRev will be provided if this number is not in sync with the upstream implementation.

Given that Nix strives to be pure, this difference in input can cause integrity checks or similar to fail, and for evals using the shortRev to determine the output even the outputs will differ.
This is the case for any builds which use the shortRev as part of a version number; Hydra itself does this too!
Leading to a difference in build output depending on the input provider.
When building with input provided by Nix' fetchGit (e.g. as a flake) it will (as of writing) produce 0.1.19700101.95bb3ba
However when Hydra fetches the git repository the very same build will result in: 0.1.19700101.95bb3ba2 (note the added 2 at the end)

This commit both syncs up the numbers as well as provides a reference as to where to find this number.

Hello, it's me again, the person with the funny issues nobody else runs into ^^ Today we have a [difference in outputs depending on whether the git input was provided by Hydra itself via the GitInput plugin](https://hydra.cloud.bsocat.net/build/303061/download/1/nvd.html) (scroll down to *nixos-system-hydra*), or some sort of *fetchGit* (including *fetchTree*). (or more generally, compare the version of an [output of a build pulling from git directly via hydra](https://hydra.cloud.bsocat.net/build/302584) and of the [output of a build using *flake.lock* with *fetchTree*](https://hydra.cloud.bsocat.net/build/302709)) As best as I can tell the latter pulls its version from [`Hash::gitShortRev()`](https://git.lix.systems/lix-project/lix/src/commit/b7474fc4a97978ae2f529c74f64f37eff4d06767/lix/libutil/hash.hh#L137) but I'm not 100% sure (it adds up though, since most other places I found were using `git rev-parse`). Whereas Hydra uses `git rev-parse` with the `--short` option which defaults to a git-internal value (which can also be provided by a config), with some additional constraints. I fully understand that this is more of a problem on the Lix side of things since Lix truncates in a more lossy way, and quoting *git-rev-parse(1)* (emphasis mine): > shortens the object name to a **unique** prefix with **at least** length characters This means that if *git* stores two hashes with the same prefix of that length, *rev-parse* will go beyond that length to ensure the prefix is unique (or so I assume, I haven't actually read the git code). Lix on the other hand (and Hydra *with* this patch) will truncate the hash to the ambiguous prefix. However in this case the GitInput plugin should **IMHO** mimic what Lix does verbatim to ensure the build remains reproducible to the maximum extent. I will open an issue with Lix though, but given the structure storing the data it's probably a lot harder to get that to carry over the appropriate *shortRev* if that is even desired. Either way I'm just providing this tiny lil diff, if you think it'd be a bad idea to merge it (either as-is or in general) I fully understand. This is just me hitting all the edge-cases.… as always. Small note: this is patch is currently in production on my infrastructure and seems to work just fine so far. Since I had both versions already since my inputs are half pinned, and half provided by hydra, I didn't have to rebuild anything. However given that the [evaluation before the patch](https://hydra.cloud.bsocat.net/eval/65657#tabs-still-succeed) and the [evaluation after deploying the patched hydra](https://hydra.cloud.bsocat.net/eval/66268#tabs-still-succeed) build the same inputs with a different output for hydra (version number losing the `2` at the end of the *shortRev*) I'm pretty sure this will cause fresh evals for everyone using the GitInput plugin for inputs, and potential rebuilds for expressions using that field to generate output. (don't be misled by the input information hydra provides for the new build, it was in fact the very same commit being built, the last commit just didn't change the output, hence the evaluation was deemed unchanged and dropped) --- Original commit message below this line. --- Usually it doesn't matter what length the hashes are as long as they are uniform. That means as long as Hydra outputs are compared which both use Hydra as the input, this is fine. The same applies to outputs which use Nix' *fetchGit*, as flakes do. However when comparing an eval which has its git input provided by Hydra, a different *shortRev* will be provided if this number is not in sync with the upstream implementation. Given that Nix strives to be pure, this difference in input can cause integrity checks or similar to fail, and for evals using the *shortRev* to determine the output even the outputs will differ. This is the case for any builds which use the *shortRev* as part of a version number; Hydra itself does this too! Leading to a difference in build output depending on the input provider. When building with input provided by Nix' *fetchGit* (e.g. as a flake) it will (as of writing) produce `0.1.19700101.95bb3ba` However when Hydra fetches the git repository the very same build will result in: `0.1.19700101.95bb3ba2` (note the added 2 at the end) This commit both syncs up the numbers as well as provides a reference as to where to find this number.
Usually it doesn't matter what length the hashes are as long as they are uniform.
That means as long as Hydra outputs are compared which both use Hydra as the input, this is fine.
The same applies to outputs which use Nix' *fetchGit*, as flakes do.
However when comparing an eval which has its git input provided by Hydra, a different *shortRev* will be provided if this number is not in sync with the upstream implementation.

Given that Nix strives to be pure, this difference in input can cause integrity checks or similar to fail, and for evals using the *shortRev* to determine the output even the outputs will differ.
This is the case for any builds which use the *shortRev* as part of a version number; Hydra itself does this too!
Leading to a difference in build output depending on the input provider.
When building with input provided by Nix' *fetchGit* (e.g. as a flake) it will (as of writing) produce `0.1.19700101.95bb3ba`
However when Hydra fetches the git repository the very same build will result in: `0.1.19700101.95bb3ba2` (note the added 2 at the end)

This commit both syncs up the numbers as well as provides a reference as to where to find this number.

Signed-off-by: benaryorg <binary@benary.org>
Member

libdesignmistake^Wlibfetchers at it again...

I must say that I dislike ending up with a revision that may be ambiguous here and that we're stuck with 7 chars because of libfetchers... OTOH I fully understand the reasoning.

I... I don't know. I guess I'll revisit this later because I'm super unsure about whether or not to merge this.

libdesignmistake^Wlibfetchers at it again... I must say that I dislike ending up with a revision that may be ambiguous here and that we're stuck with 7 chars because of libfetchers... OTOH I fully understand the reasoning. I... I don't know. I guess I'll revisit this later because I'm super unsure about whether or not to merge this.
Author
Contributor

I must say that I dislike ending up with a revision that may be ambiguous here

Since the shortrev isn't ever guaranteed to be unique (given that in a decentralized version control system no party is ever ensured to have all commits), the status quo means that the same inputs can cause different outputs due to presence of new commits in the input cache.
Say we have 0123456 in the repo which is unique up to 5, and then I push 0123457 on another branch of the same repo which is pulled by hydra in a different jobset.
This causes the original input on hydra despite no changes to the actual input, to get a new digit, which triggers rebuilds, making hydra builds off of any git repositories non-reproducible as long as they use the shortrev (and even if they don't it can cause churn on hydra due to additional evals which are not necessary).

And most importantly, this PR establishes compatibility with Lix, meaning that hydra behaves as a local nix build.
Having the same input build in a different way locally than in CI is a bug IMHO.

> I must say that I dislike ending up with a revision that may be ambiguous here Since the shortrev isn't ever *guaranteed* to be unique (given that in a decentralized version control system no party is ever ensured to have all commits), the status quo means that the same inputs can cause different outputs due to presence of new commits in the input cache. Say we have `0123456` in the repo which is unique up to `5`, and then I push `0123457` on another branch of the same repo which is pulled by hydra in a different jobset. This causes the original input on hydra despite no changes to the actual input, to get a new digit, which triggers rebuilds, making hydra builds off of any git repositories non-reproducible as long as they use the shortrev (and even if they don't it can cause churn on hydra due to additional evals which are not necessary). And most importantly, this PR establishes compatibility with Lix, meaning that hydra behaves as a local `nix build`. Having the same input build in a different way locally than in CI is a bug IMHO.
Member

Since the shortrev isn't ever guaranteed to be unique (given that in a decentralized version control system no party is ever ensured to have all commits), the status quo means that the same inputs can cause different outputs due to presence of new commits in the input cache.

OK, I have to admit that I haven't thought of this and yes, that convinces me to merge (just built it locally, tests are passing).

I still think that 7 is an unfortunate choice given that Nix/Lix's primary consumer -- nixpkgs -- needed >7 chars pretty much since libfetchers exist, but 🤷

Thanks!

> Since the shortrev isn't ever guaranteed to be unique (given that in a decentralized version control system no party is ever ensured to have all commits), the status quo means that the same inputs can cause different outputs due to presence of new commits in the input cache. OK, I have to admit that I haven't thought of this and yes, that convinces me to merge (just built it locally, tests are passing). I still think that 7 is an unfortunate choice given that Nix/Lix's primary consumer -- nixpkgs -- needed >7 chars pretty much since libfetchers exist, but 🤷 Thanks!
ma27 merged commit 23670fc720 into main 2025-06-11 20:33:30 +00:00
Author
Contributor

I still think that 7 is an unfortunate choice

The idea of using shortRev that I've adopted is just to "compare" the version.
When I deploy my Hydra, I check the footer (Hydra 0.1.19700101.d40e62e (using lix-2.94.0-dev-d8b1bb5)) before and after deployment to ensure that it has changed (and the new rev matches what I wanted to deploy).
It should never be about uniqueness, just a quick visual confirmation which is unique enough.
And seven characters are long enough for most cases, but also short enough to be readable.

Just for completeness sake, the Lix manual section (for main) on fetchGit includes a bit on shortRev (which, full disclosure, I've written so.…), one part being:

For maximum reproducibility and interoperability it is recommended to not rely on this value and to truncate the returned rev to an appropriate value instead.

(btw: I'll respond in the other issues in time, no spoons currently)

> I still think that 7 is an unfortunate choice The idea of using *shortRev* that I've adopted is just to "compare" the version. When I deploy my Hydra, I check the footer (`Hydra 0.1.19700101.d40e62e (using lix-2.94.0-dev-d8b1bb5)`) before and after deployment to ensure that it has changed (and the new rev matches what I wanted to deploy). It should never be about uniqueness, just [a quick visual confirmation](https://hydra.cloud.bsocat.net/build/315654/download/1/nvd.html) which is unique *enough*. And seven characters are long enough for most cases, but also short enough to be readable. Just for completeness sake, the [Lix manual section (for *main*) on *fetchGit*](https://hydra.cloud.bsocat.net/job/infra/main/doc.lix/latest/download-by-type/doc/manual/language/builtins.html#builtins-fetchGit) includes a bit on *shortRev* (which, full disclosure, I've written so.…), one part being: > For maximum reproducibility and interoperability it is recommended to not rely on this value and to truncate the returned `rev` to an appropriate value instead. (btw: I'll respond in the other issues in time, no spoons currently)
Sign in to join this conversation.
No reviewers
No labels
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
lix-project/hydra!41
No description provided.