RFD: Remove ca-derivations #815

Open
opened 2025-04-29 16:17:47 +00:00 by raito · 26 comments
Owner

We plan to remove the ca-derivations experimental feature and all associated code from Lix. Content-addressed derivations compute the output path based on the hash of the output itself, rather than a hash of the inputs. The goal was to enable early cut-off in build systems and reduce the impact of mass rebuilds due to trivial changes (e.g., modifying a comment in a C file that doesn't affect the result).
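
For readers unfamiliar with the feature, here is a minimal sketch of what a content-addressed derivation can look like. It assumes the `ca-derivations` experimental feature is enabled and relies on the `__contentAddressed` attribute documented in the manual; the derivation name and the file it writes are purely illustrative.

```nix
# Sketch only: evaluating this requires the experimental
# `ca-derivations` feature to be enabled.
{ pkgs ? import <nixpkgs> { } }:

pkgs.runCommand "ca-example" {
  # Mark the output as content-addressed: its store path is derived from
  # the hash of what the build produces rather than from the inputs.
  __contentAddressed = true;
  outputHashMode = "recursive";
  outputHashAlgo = "sha256";
} ''
  mkdir -p $out
  echo "hello" > $out/greeting
''
```

If a change to the inputs (say, a comment in a source file) leaves the output byte-for-byte identical, the output path stays the same and dependents need not be rebuilt, which is the early cut-off described above.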

While this feature aimed to improve binary cache efficiency and promote bit-for-bit reproducibility, it has seen no significant adoption. No known production workflows depend on it. Although there was a push in May 2021 to explore its future (Discourse thread: https://discourse.nixos.org/t/content-addressed-nix-call-for-testers/12881), the effort stalled. I participated in that beta test but eventually removed the feature from all my systems due to maintenance issues and frequent breakages.

We’re removing ca-derivations for reasons similar to the removal of recursive-nix:

  • It tightly couples derivation hash logic with assumptions about output paths, hindering cleanup and evolution of the derivation format.

  • Its design includes inconsistencies and edge cases, making it fragile in the face of minor system changes.

  • There is no current plan in Lix to stabilize it in its existing form. That said, we’re open to revisiting the concept in a different shape—for example, by drawing inspiration from Snix’s work on the storage layer (snix.dev).

  • Its presence blocks simplifications in the store, evaluator, and derivation codebase.

We are aware that the original C++ Nix implementation is working to bring ca-derivations to a trial-ready state on production Hydra. If that effort results in thorough and successful real-world testing, we would be happy to re-evaluate the readiness of ca-derivations and consider reintroducing it in Lix in some form.

If you’re using ca-derivations today, please let us know how and why. Understanding your use case could help inform a future design—but as it stands, the current implementation is not sustainable.

If everything is OK with this plan, the removal is slated for 2.94.0.

cc @jade @qyriad @lunaphied @pennae @alois31 @k900 @piegames

Owner

let's do it

Member

Having both used ca-derivations and then given up on them because of brokenness, and having looked at some of their libstore parts, I am certainly not against removing them. However I do think that they see widespread enough usage that they should be visibly deprecated (either via a warning or deprecated-features) for at least one release before the actual removal happens (if the deprecation is included in 2.93, the 2.94 timeline for removal can still work out).

Member

Do we have any good way of quantifying in-the-wild usage?

Author
Owner

@piegames wrote in #815 (comment):

> Do we have any good way of quantifying in the wild usage?

This is the closest way we have: https://sourcegraph.com/search?q=context:global+%22ca-derivations%22&patternType=keyword&sm=0.

Member

I've used ca-derivations for years, saw no actual benefit from them, and saw that the feature stalled hard upstream. I think the current ca-derivations are mostly dead anyway, so I support the removal.

Owner

Ack on removing it. I've looked at the feature somewhat and my perspective is roughly similar to my view on dynamic-derivations: not baked enough to exist, no driving force or energy behind it yet either.

Owner

we do have some ideas on how to make ca derivations work without being as hugely and unmanageably intrusive as they are now, but a lot of other things must happen first for that to be a viable thing to even start. as it stands we see no way to fix ca drvs without changing them in extremely incompatible ways, and removing them soon will definitely make preparations a lot easier.

2.93 should at the very least contain a deprecation notice though

Member

One more thing to keep in mind is that both ca-derivations themselves and the dynamic-derivations that depend on them leave a footprint in the store. For this reason enough support code should be kept around for a bit longer to not break users' systems, allow them to GC away old ca-derivations and dynamic-derivations, and potentially undo the schema migration when all ca-derivations are gone.

Member

> (either via a warning or deprecated-features)

The deprecated-features code was not designed to deprecate experimental features, on the assumption that removing experimental features should not need any formal deprecation, by design and by contract.

Member

While working on the manual today, I noticed a problem related to this deprecation. The glossary has a definition for CA-drv (https://docs.lix.systems/manual/lix/stable/glossary.html#gloss-content-addressed-derivation), which also links to the __contentAddressed attribute (https://docs.lix.systems/manual/lix/stable/language/advanced-attributes.html#adv-attr-__contentAddressed). We need to consider how we'll migrate the manual.

I think we want to keep documenting the __contentAddressed attribute for a few releases, for as long as we still expect it to be used (for migration of existing CA stores), but I'm not so sure about the glossary item. Do we remove it together with the __contentAddressed attribute when the time comes? Do we leave the glossary item forever and just change the definition to indicate that it's no longer used? Do we set up a link redirect to an older manual (like #809) that has this item?

Not a blocker, just something to consider.

Member

I'd keep the high-level descriptions of CA-derivations in the manual, and replace the implementation details with "we don't have an implementation currently, there is an implementation in Nix, maybe one day we'll have one again too"

Owner

while looking into how to best remove ca derivations we naturally have to remove dynamic derivations first; this was a known constraint, but since dyn-drvs are functionally useless in their current state there's no loss here. we also have to remove impure derivations though since the impure derivation rewriting machinery also does not work without ca derivations. since they build on the ca infrastructure and inherit all its problems we will not get around removing them too.

Member

This issue was mentioned on Gerrit on the following CLs:

  • commit message in cl/3087 ("deprecate CA, dynamic, and impure derivations")
  • commit message in cl/3118 ("deprecate CA, dynamic, and impure derivations")
Owner

@jade do you have any unspoken opinions? otherwise we'd merge the deprecation cl Soon™

Owner

lol sorry, due to a miscommunication, 2.93 is now out (since this was not marked as actually blocking). let's make 2.94 a less slow cycle than 2.93 so that this doesn't take too long.

Member

Could also backport it to 2.93.1 if there is a need for it.


Is there a plan for a follow-up on impure derivations? I think there's a lot of potential for them, namely:

  • as a way of having side effects (think: a replacement of GitHub actions)
  • building Sphinx documentation with intersphinx, and external link checking
Owner

> Is there a plan for a follow-up on impure derivations?

not at this point, though in the future we may get back to this. we currently have many things that need to be done before this is feasible.

if the side effects are what you're after and the output doesn't matter you can make your current impure derivation a fixed-output derivation that writes a known but impure value (eg unix time the eval was started) instead. for example this derivation will run from scratch every time it is built with a different value of d, eg provided via nix-build --argstr:

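{ d }:

# Fixed-output derivation: the expected hash is computed from `d`, so a new
# value of `d` yields a new output path and forces a fresh build.
(import <nixpkgs> {}).runCommand "foo" {
  outputHashMode = "flat";
  outputHashAlgo = "sha256";
  outputHash = builtins.hashString "sha256" d;
  # Pass `d` through the environment so the shell writes it out verbatim.
  inherit d;
} ''
  printf '%s' "$d" > $out
''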
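
As a usage sketch for the derivation above: one way to hand it a fresh value on every evaluation is `builtins.currentTime`, which matches the "unix time the eval was started" idea. This assumes the expression is saved as `./side-effect.nix` (a hypothetical file name) and that evaluation is impure, since `builtins.currentTime` is not available in pure evaluation mode.

```nix
# Assumption: the fixed-output expression above lives in ./side-effect.nix
# (hypothetical file name chosen for this sketch).
import ./side-effect.nix {
  # builtins.currentTime is an integer (Unix time at evaluation start);
  # convert it to a string because `d` is hashed with builtins.hashString.
  d = toString builtins.currentTime;
}
```

Because the value changes between evaluations, the output hash and therefore the output path change too, so the build runs again each time.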
Author
Owner

@minijackson wrote in #815 (comment):

> Is there a plan for a follow-up on impure derivations? I think there's a lot of potential for them, namely:
>
>   • as a way of having side effects (think: a replacement of GitHub actions)
>   • building Sphinx documentation with intersphinx, and external link checking

Adding on top of @pennae's answer, I'd argue that Nix is not made for running side effects directly: almost all of its assumptions are that you build things with a replicable shape, even if the input is something you pass impurely.

So, you can always consider a side-effect harness (à la https://github.com/hercules-ci/hercules-ci-effects) completely driven by Nix expressions, but you still have to handle your runtime at the end, and if you actually enjoy the Nix runtime, it's really not that hard to throw in bubblewrap or similar and obtain what you wanted.

As for the external link checking example, pennae's example is excellent here because d could be a snapshot of all the external documentation you are looking at, and you could run that derivation build at a regular interval. Most of the time, when the external docs and the source do not change, it is a cache hit. When one of the inputs changes, it recomputes. That's much better than keeping the external world implicit by impurely accessing it, IMHO.

If we had more resources, we could imagine writing more documentation and guidance on how to make all of these things that impure derivations made convenient even more powerful and as convenient (maybe more?) as before, using Lix external commands and some machinery like Hercules-CI effects or some examples.


> As for the external link checking example, pennae's example is excellent here because d could be a snapshot of all the external documentation you are looking at and you could be running that derivation build on a regular interval.

That sounds very complex to do, especially for documentation where you can have external links to any website.

I guess the feature that would be most useful to me for these particular use cases is specifying dependencies and build instructions without the restriction of reproducibility. But thinking about it, that is achievable using nix develop, and since it doesn't put anything in the Nix store, the output not being reproducible is not an issue.

Owner

We use nix run for this use case at work :)

Author
Owner

Removal is up in https://gerrit.lix.systems/c/lix/+/3187/2.

This is the final stretch, and before submitting this chain I'd like to be confident in it, so any extra +2 would be very welcome.

In the meantime, I decided to do more manual testing, also by consulting existing ca-derivations users in the wild.


I use ca-derivations to speed up generating the man pages, as in the option documentation: https://github.com/NixOS/nixpkgs/blob/ecfe591593fa74fe89febadbe331c4be1712378f/nixos/modules/misc/man-db.nix#L45

Author
Owner

@accelbread wrote in #815 (comment):

> I use ca-derivations to speed up generating the man pages as in the option documentation: https://github.com/NixOS/nixpkgs/blob/ecfe591593fa74fe89febadbe331c4be1712378f/nixos/modules/misc/man-db.nix#L45

Could you provide some benchmarks w/o ca-derivations and w/ ca-derivations?

If this is a tremendous time saver, a solution could be to rewrite this logic to use IFD to emulate ca-derivations, which would ultimately benefit everyone.


> While this feature aimed to improve binary cache efficiency and promote bit-for-bit reproducibility, it has seen no significant adoption.

Well, as one of the feature's adopters, I want to mention that it's good that you added the warning about its removal in Lix 2.93.1.

It's questionable, however, whether the warning is timely; to some extent, it depends on how many versions are planned before 2.94.

In my usage scenario I often sit at the edge of nixpkgs. I also have quite huge custom DMGs in the cache, which I never want to fetch from there, except when something is broken.

Due to frequent nixpkgs updates, DMG-based derivations often get rebuilt, and here the ca-derivations feature comes to the rescue: it doesn't care whether the builder logic was changed in some way.

Owner

FYI I'm planning on using output-addressed store paths with references and builtins.fetchClosure, assuming that the code is still in Lix. I don't care at all about the Nix builder supporting them as I'm not going to use it; what I'm up to is compiling software with buck2 and distributing it with Lix.

Reference: lix-project/lix#815