RFD: remove build-hook setting #911

Open
opened 2025-07-10 18:14:41 +00:00 by pennae · 7 comments
Owner

`build-hook` and its associated `nix __build-remote` are extremely cursed. each remote build uses its own build-hook process, which holds on to a single local store connection and another single remote store connection (which itself usually holds on to a single ssh process). scheduling of remote builders is complicated immensely by this because it all happens through file locks, the additional local store connection causes overhead, and the behavior differences between ssh and ssh-ng remotes are a major headache.

we should probably just remove `build-hook` altogether. an rpc-based system will not have any use for build hooks as they exist today, and what little can be gained from build hooks can also be had from plugins that add custom store url schemes. we haven't been able to find *any* use of `build-hook` out there either.

this is *not* about removing `pre-build-hook` or `post-build-hook`, just `build-hook`. we already deprecated `build-hook` a while ago.
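
to illustrate why scheduling through file locks gets awkward, here is a minimal sketch of the general technique: every hook process independently tries to grab one of a fixed number of lock files per builder. this is not the actual `nix __build-remote` code; the function names, the lock directory layout and the `machine-slot` file naming are assumptions made up for the example.

```python
import fcntl
import os


def try_claim_slot(lock_dir, machine, max_jobs):
    """try to claim one build slot on `machine` (hypothetical layout).

    returns an open file descriptor holding the lock, or None if all
    slots are currently taken by other processes.
    """
    for slot in range(max_jobs):
        path = os.path.join(lock_dir, f"{machine}-{slot}")
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        try:
            # non-blocking exclusive lock: fails immediately if someone
            # else already holds this slot
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            os.close(fd)
            continue
        return fd
    return None


def release_slot(fd):
    """closing the descriptor drops the lock and frees the slot."""
    os.close(fd)
```

nothing here knows how many builds are queued or which builder would be the best fit. each process only learns whether a slot happened to be free at the moment it asked, which is the kind of limitation that makes scheduling on top of locks hard.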

Owner

should we get this done in 2.95.0? if so, please add release blocker to it

Author
Owner

not entirely sure yet. we'll add it as a blocker just so we don't forget about it; if 2.95 ends up taking too long we can also postpone this a bit

pennae added this to the 2.97 milestone 2025-12-01 14:51:11 +00:00
pennae modified the milestone from 2.97 to 2.95 2025-12-01 14:52:00 +00:00
pennae removed this from the 2.95 milestone 2026-02-09 14:21:43 +00:00
Member

This issue was mentioned on Gerrit on the following CLs:

  • comment in [cl/5380](https://gerrit.lix.systems/c/lix/+/5380) ("Kill build hook with SIGTERM instead of SIGKILL")

I do want to note that custom store url schemes are not really sufficient to replace build hooks, because that doesn't end up copying the result back to the local store, resulting in a state tracking problem where the custom store needs to somehow keep track of which store paths are on which remote machine in between your `nix build --store custom://` and `nix copy --from custom://` invocations. Acting like a `NIX_REMOTE` is better for this, which will get easier to do after the Nix GSoC project to make the daemon connection handler pluggable by libnix-store users (but it's still not a great replacement).

Author
Owner

when we remove the hook it will be the daemon's job to copy outputs from builder stores to the local store once a build has completed as part of finalization of a remote build (which currently just amounts to "wait for the build hook to exit"). our goal is to have the daemon open rpc connections to other daemons *directly*; in that model if you want something that behaves like a custom build hook does today you'd set up a local proxy that does all your custom processing and forwards to other stores as needed (including the final copy process). rather than make the connection handler pluggable we'll make the daemon itself trivial to proxy.
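
as a rough sketch of the proxy shape described above (not a design: the rpc protocol does not exist yet, so this only forwards opaque bytes), a local proxy could sit between a client connection and an upstream daemon connection and run its own step once the session ends. `pump`, `proxy` and the `on_finished` callback are hypothetical names; `on_finished` merely stands in for custom processing such as the final copy.

```python
import socket
import threading


def pump(src, dst):
    """copy bytes from src to dst until src reaches EOF, then half-close dst."""
    while True:
        chunk = src.recv(65536)
        if not chunk:
            break
        dst.sendall(chunk)
    try:
        dst.shutdown(socket.SHUT_WR)
    except OSError:
        pass  # peer already went away


def proxy(client, upstream, on_finished=None):
    """forward traffic in both directions between two connected sockets.

    on_finished (hypothetical hook) runs after both directions hit EOF,
    which is where custom post-processing would go.
    """
    back = threading.Thread(target=pump, args=(upstream, client))
    back.start()
    pump(client, upstream)
    back.join()
    if on_finished is not None:
        on_finished()
```

a real proxy would have to understand enough of the protocol to decide which store to forward to; the sketch assumes a single fixed upstream.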


Is there an issue (aside from this one) where I can track the daemon proxying design / implementation?

Author
Owner

not yet, we're still preparing to actually start the work on this. initial cls should start showing up in the coming weeks; from there it'll likely be a lot of experimentation under new xp features until we find something that works well and isn't likely to bite us in the tail any time soon

Reference
lix-project/lix#911