[nix#4813] nix-build exit codes are not as documented #504

Open
opened 2024-09-09 02:23:48 +00:00 by jade · 3 comments
Owner

This is: https://github.com/NixOS/nix/issues/4813

Consider the following:

builtins.derivation {
  name = "meow";
  system = "x86_64-linux";
  builder = "/bin/sh";
  args = [ "-c" "echo > $out" ];
  outputHash = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
  outputHashAlgo = "sha256";
  outputHashMode = "flat";
}
lix/lix2 » nix-build fail.nix
this derivation will be built:
  /nix/store/2b10240ni5mvfg7pwd86yrvnkfwhqr1j-meow.drv
building '/nix/store/2b10240ni5mvfg7pwd86yrvnkfwhqr1j-meow.drv'...
error: hash mismatch in fixed-output derivation '/nix/store/2b10240ni5mvfg7pwd86yrvnkfwhqr1j-meow.drv':
        likely URL: (unknown)
         specified: sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
            got:    sha256-AbpHGcgLb+kRsJGnwFEktk7uzpZOCcBY74+YBdrKVGs=
lix/lix2 » echo $?
1

That's busted. Quoth the manual:

Special exit codes for build failure
       1xx status codes are used when requested builds failed.  The following codes are in use:

       •  100 Generic build failure

          The builder process returned with a non-zero exit code.

       •  101 Build timeout

          The build was aborted because it did not complete within the specified timeout.

       •  102 Hash mismatch

          The build output was rejected because it does not match the outputHash attribute of the derivation.

       •  104 Not deterministic

          The build succeeded in check mode but the resulting output is not binary reproducible.

I am unsure if this is an unavoidable bug caused by the daemon architecture/protocol being busted or if it is an easily fixable bug, but it is certainly reproducible.

nix (Lix, like Nix) 2.92.0-dev-pre20240820-ac69747
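
For anyone scripting around `nix-build`, here is a minimal sketch of how a caller might interpret the exit status, assuming only the four codes in the manual excerpt above. The helper name `describe_exit_status` is hypothetical and not part of Nix or Lix. With the behaviour shown in this report, a hash mismatch lands in the fallback branch rather than under 102.

```python
# Documented 1xx exit codes, copied from the manual excerpt in this issue.
DOCUMENTED_BUILD_FAILURES = {
    100: "generic build failure",
    101: "build timeout",
    102: "hash mismatch",
    104: "not deterministic",
}


def describe_exit_status(status: int) -> str:
    """Describe a nix-build exit status (hypothetical helper for illustration)."""
    if status == 0:
        return "success"
    if status in DOCUMENTED_BUILD_FAILURES:
        return DOCUMENTED_BUILD_FAILURES[status]
    # Anything else, including the plain 1 observed in this report,
    # carries no information about why the build failed.
    return "unclassified failure"


if __name__ == "__main__":
    for code in (0, 1, 102):
        print(code, describe_exit_status(code))
```

A script that branches on 102 to detect stale hashes would therefore never take that branch when the exit status comes back as 1.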

jade added the
bug
E/reproducible
labels 2024-09-09 02:23:48 +00:00
Author
Owner

Ah, it's permanently busted, and there is no way we can fix this without replacing the protocol. As pointed out by Théophane on the original issue, the cause is that the protocol does not send the exit status.

Since we are under a permanent legacy protocol freeze, this will never be fixed on the legacy Nix protocol. However, this can be fixed when we replace the protocol.

https://git.lix.systems/lix-project/lix/src/ef0de7c79f3b32f66db447220d26eae7e7c07b19/src/libutil/serialise.cc#L249-L271

Action items: update the documentation to say it's broken
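
To illustrate the failure mode described above, here is a toy model, not the actual Lix wire format or serializer. The names `BuildError`, `serialise_error` and `deserialise_error` are hypothetical. The sketch assumes an error type whose exit status defaults to 1 and a wire format that carries only the message, so whatever status the daemon side computed is lost on the way to the client.

```python
class BuildError(Exception):
    """Toy error type: a message plus the exit status the process should use."""

    def __init__(self, message: str, status: int = 1):
        super().__init__(message)
        self.message = message
        self.status = status


def serialise_error(err: BuildError) -> bytes:
    # Toy wire format: only the message is written; the status is dropped.
    return err.message.encode("utf-8")


def deserialise_error(payload: bytes) -> BuildError:
    # The receiver has no status to read, so it falls back to the default of 1.
    return BuildError(payload.decode("utf-8"))


daemon_side = BuildError("hash mismatch in fixed-output derivation", status=102)
client_side = deserialise_error(serialise_error(daemon_side))

if __name__ == "__main__":
    print("daemon status:", daemon_side.status)
    print("client status:", client_side.status)
```

In this model the only ways to recover the status are to change the wire format or to have the client derive the status from other information it already receives.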

jade added the
E/requires rearchitecture
Area/store
Area/protocol
labels 2024-09-09 02:28:31 +00:00
Author
Owner

I think this might not actually require rearchitecture: it may be enough to use `Store::buildPathsWithResults` in `nix-build` and do the exit code stuff entirely client side instead. I have, however, not inspected the code.

jade removed the
E/requires rearchitecture
label 2024-09-09 03:03:48 +00:00
Author
Owner

It appears this would at least need a refactoring: `failingExitStatus` is currently calculated on the daemon side (which is absurd), and it is calculated over an entire worker operation (which I believe implicitly does the recursion into dependent derivations necessary to represent multiple failures). https://git.lix.systems/lix-project/lix/src/b40369942cdb3e713c473515b9760f8a0d2ed3cc/src/libstore/build/worker.cc#L557-L575

The way to refactor this is to calculate the exit status as a pure function of the derivation result and then make the worker code use that function, though obviously the fact that we are calculating the exit status in there to begin with is absurd. Fortunately, I suspect we may be able to fix that because we could delete the old serializer and drop the 2.3 protocol, removing it from our API commitment as we plan to do anyhow. This bug has some positive effects!
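
A rough sketch of what "exit status as a pure function of derivation result" could look like, written in Python for brevity rather than as a patch. The `Status` enum and its member names are hypothetical, and the bit layout is an assumption chosen so that each single failure reproduces one of the documented codes (100, 101, 102, 104); the real logic in worker.cc should be checked before relying on any of this.

```python
from enum import Enum, auto
from typing import Iterable


class Status(Enum):
    """Hypothetical per-derivation outcomes, for illustration only."""
    BUILT = auto()
    PERMANENT_FAILURE = auto()
    TIMED_OUT = auto()
    HASH_MISMATCH = auto()
    NOT_DETERMINISTIC = auto()


def failing_exit_status(results: Iterable[Status]) -> int:
    """Compute an exit status from build results alone, with no worker state."""
    seen = set(results)
    timed_out = Status.TIMED_OUT in seen
    hash_mismatch = Status.HASH_MISMATCH in seen
    build_failure = (
        timed_out or hash_mismatch or Status.PERMANENT_FAILURE in seen
    )

    mask = 0
    if build_failure:
        mask |= 0x04
    if timed_out:
        mask |= 0x01
    if hash_mismatch:
        mask |= 0x02
    if Status.NOT_DETERMINISTIC in seen:
        mask |= 0x08
    if mask:
        mask |= 0x60  # assumed base that puts every failure in the 1xx range
    return mask  # 0 when nothing failed


if __name__ == "__main__":
    print(failing_exit_status([Status.BUILT, Status.HASH_MISMATCH]))
```

Because the function only looks at results, the same code could run in the worker or in the client once the client has per-path results.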

Reference: lix-project/lix#504