lix main changes break hydra [2025-06-11] #50

Closed
opened 2025-06-11 23:51:12 +00:00 by benaryorg · 3 comments
Contributor

One of the recent changes in Lix main has caused build failures in Hydra:

FAILED: src/hydra-queue-runner/hydra-queue-runner.p/build-remote.cc.o 
clang++ -Isrc/hydra-queue-runner/hydra-queue-runner.p -Isrc/hydra-queue-runner -I../src/hydra-queue-runner -Isrc/libhydra -I../src/libhydra -I/nix/store/s715jq1cgp3vmfmchqlqqhqm33bcpld1-lix-2.94.0-dev-ee06552-dev/include -I/nix/store/rf44j4pk7xqpxxhnwhkahj6n1nbizi6g-boehm-gc-8.2.8-dev/include -I/nix/store/7dgkrhwg3g5qiij3r2b82mklm9k931pp-capnproto-1.0.2/include -I/nix/store/3hv9d38w1qyqcxyadnnyb1rzpwhqj4z4-libpqxx-7.7.5/include -I/nix/store/4ckarisv895skwiq3vxpd74qc1zyg7az-prometheus-cpp-1.1.0-dev/include -fdiagnostics-color=always -D_GLIBCXX_ASSERTIONS=1 -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -std=c++20 -O3 -pthread -include lix/config.h -MD -MQ src/hydra-queue-runner/hydra-queue-runner.p/build-remote.cc.o -MF src/hydra-queue-runner/hydra-queue-runner.p/build-remote.cc.o.d -o src/hydra-queue-runner/hydra-queue-runner.p/build-remote.cc.o -c ../src/hydra-queue-runner/build-remote.cc
../src/hydra-queue-runner/build-remote.cc:103:11: error: no member named 'in' in 'nix::SSH::Connection'
  103 |     child.in = AutoCloseFD{to.writeSide.release()};
      |     ~~~~~ ^
../src/hydra-queue-runner/build-remote.cc:104:11: error: no member named 'out' in 'nix::SSH::Connection'
  104 |     child.out = AutoCloseFD{from.readSide.release()};
      |     ~~~~~ ^
../src/hydra-queue-runner/build-remote.cc:108:17: error: no member named 'in' in 'nix::SSH::Connection'
  108 |     fcntl(child.in.get(), F_SETPIPE_SZ, &pipesize);
      |           ~~~~~ ^
../src/hydra-queue-runner/build-remote.cc:109:17: error: no member named 'out' in 'nix::SSH::Connection'
  109 |     fcntl(child.out.get(), F_SETPIPE_SZ, &pipesize);
      |           ~~~~~ ^
../src/hydra-queue-runner/build-remote.cc:547:27: error: no member named 'out' in 'nix::SSH::Connection'
  547 |             .from = child.out.get(),
      |                     ~~~~~ ^
../src/hydra-queue-runner/build-remote.cc:548:25: error: no member named 'in' in 'nix::SSH::Connection'
  548 |             .to = child.in.get(),
      |                   ~~~~~ ^
../src/hydra-queue-runner/build-remote.cc:554:27: error: no member named 'out' in 'nix::SSH::Connection'
  554 |             .from = child.out.get(),
      |                     ~~~~~ ^
../src/hydra-queue-runner/build-remote.cc:555:25: error: no member named 'in' in 'nix::SSH::Connection'
  555 |             .to = child.in.get(),
      |                   ~~~~~ ^
../src/hydra-queue-runner/build-remote.cc:679:15: error: no member named 'in' in 'nix::SSH::Connection'
  679 |         child.in.reset();
      |         ~~~~~ ^
9 errors generated.

I suspect https://git.lix.systems/lix-project/lix/commit/cc560704deb5077923b7cf9694148ef027927009, which would mean changing all `in` and `out` references to `socket` should be fine?
The `reset()` call there worries me a bit, since Hydra may assume the two pipes are independent, but then again, that would probably have messed with the SSH connection anyway.
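
To make the `reset()` concern concrete: with two independent pipes, closing the write end signals end-of-input to the peer while the read end stays usable. With a single socket, closing the descriptor drops both directions at once; the POSIX call for closing only the write direction is `shutdown(fd, SHUT_WR)`. Below is a minimal sketch of that half-close, assuming a `socketpair()` as a stand-in for the SSH connection. The names `readAll` and `halfCloseRoundTrip` are made up for illustration, and none of this is the actual Lix or Hydra code.

```cpp
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Read from fd until the peer signals end-of-stream (read() returns 0).
static std::string readAll(int fd) {
    std::string data;
    char buf[256];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        data.append(buf, static_cast<size_t>(n));
    return data;
}

// Hypothetical helper: send a request over one end of a socket pair,
// half-close the write direction, and still receive the peer's reply.
// Both ends live in one thread, so this only works for messages small
// enough to fit in the socket buffer.
std::string halfCloseRoundTrip(const std::string & request) {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return "";
    int ours = fds[0], theirs = fds[1];

    if (write(ours, request.data(), request.size()) < 0) {
        close(ours);
        close(theirs);
        return "";
    }
    // Signal end-of-input to the peer; the read direction stays open.
    // A plain close(ours) here would also throw away the reply.
    shutdown(ours, SHUT_WR);

    // Peer side: sees the request followed by EOF, then answers.
    std::string seen = readAll(theirs);
    std::string reply = "got:" + seen;
    if (write(theirs, reply.data(), reply.size()) < 0)
        reply.clear();
    close(theirs);

    // Our side: the reply is still readable after the half-close.
    std::string result = readAll(ours);
    close(ours);
    return result;
}
```

Calling `halfCloseRoundTrip("ping")` sends the request, half-closes, and returns whatever the peer wrote back. Whether the Lix `socket` member exposes anything equivalent is something I have not checked.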

Hydra has been in sync with Lix main for just about 2 hours ^^
(this is not a complaint, I just find it funny. I'm actually very happy that Lix people are changing a lot of things aggressively to improve code quality)

Author
Contributor

I've had a look and it's more than just s/(in|out)/socket/g since Hydra seems to have what looks almost like a copy of the code prior to the change that introduced the incompatibility.
To be honest, it looks like a large chunk of src/hydra-queue-runner/build-remote.cc could be replaced by keeping a reference to a Lix Machine::Machine around, and then using Machine::openStore() instead of manually calling SSH, but that looks like a rather large change (however one that would get rid of a lot of boilerplate I guess?).
Alternatively the code can probably be adjusted in much the same way as the Lix side.

I'll leave this to you people who can actually deal with C++, unlike me ^^

Member

> I'll leave this to you people who can actually deal with C++, unlike me ^^

Wait, I'm the impostor here ;)

Anyways, I may take a look at the build failure this weekend.

> To be honest, it looks like a large chunk of src/hydra-queue-runner/build-remote.cc could be replaced by keeping a reference to a Lix Machine::Machine around, and then using Machine::openStore() instead of manually calling SSH, but that looks like a rather large change (however one that would get rid of a lot of boilerplate I guess?).

It is correct that a lot of Hydra code could be replaced by Lix internals; the sole reason for the current state is that the implementation in the queue-runner is much older.

That being said, I'm a little hesitant to do this now: IIRC upstream Hydra attempted this quite recently, and it led to a bunch of regressions.

Additionally, there's ongoing work on replacing the queue runner altogether: https://discourse.nixos.org/t/transforming-global-software-distribution-with-nixpkgs/64989

My current plan is to wait until a prototype is published and then evaluate how well we could integrate it into our codebase (I hope that most of the communication happens via some IPC; this would also mean fewer issues with C++ API changes).

If we come to the conclusion that this is a bad idea, I'd follow through with the plan I made with @raito at Ocean Sprint, i.e.

  • asyncify the entire queue runner (right now we just block on each async call). Maybe we manage to reach a point where we don't need to do threading by hand, but can run kj with multiple threads only.
  • Replace the IPC mechanism with ssh-ng because the serve-protocol is even worse. This would give us things like #27 for free and without hacks like that.
Author
Contributor

Sounds like a solid plan.

Turns out the reason I switched back to `ssh://` from `ssh-ng://` (https://github.com/NixOS/nix/issues/7359) is fixed upstream now (seems I lost track of that).
I haven't been able to nail down whether the issue was fixed in Lix yet, so if moving to ssh-ng:// is planned then we should probably check whether this has been resolved on the Lix side too.

> Wait, I'm the impostor here ;)

object oriented languages and template meta programming scary :neocat_scream_scared:

ma27 closed this issue 2025-06-19 09:20:31 +00:00
Reference: lix-project/hydra#50