Non-locking inputs #958

Open
opened 2025-08-12 22:59:28 +00:00 by irenes · 11 comments
Member

Is your feature request related to a problem? Please describe.

For some time, I've been contemplating how my goals for secrets management are at odds with the nix stateless model, and what I might do about that. Secrets are an inherently stateful thing, and my personal ideal security model would have them live only on the machine where they're used. In particular, I dislike the approach of checking secrets into source control.

This problem existed before flakes, but flakes make it more urgent because every input must be explicitly declared and direct filesystem access is blocked, which means ad-hoc mechanisms such as pointing to a file in a special location no longer work.

Describe the solution you'd like

I recently hit on an approach that I'm excited about due to its minimalism: Allow certain flake inputs to be designated as non-locking. That is, I'd create a new per-input flag, and inputs with that flag set would never appear in the lock file. Thus you could define a "secrets" input which points to a designated directory and the contents of that directory could be different on every machine. The evaluation semantics would still be pure, it just wouldn't save the hash - it'd be treated the same as a new input that hadn't yet been locked.
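As a rough illustration of what such a declaration might look like: the `lock` attribute below is hypothetical (as is its name, which is still open), and no current implementation recognizes it. The directory `/var/lib/secrets`, the file name `example-token`, and the host name `example` are placeholders as well.

```nix
# flake.nix: sketch only. The `lock` attribute is HYPOTHETICAL; it stands in
# for the proposed per-input flag and is not recognized by current implementations.
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

    secrets = {
      # Placeholder directory; its contents may differ on every machine.
      url = "path:/var/lib/secrets";
      flake = false; # existing attribute: plain source tree, no flake.nix inside
      lock = false;  # HYPOTHETICAL: never record this input in flake.lock
    };
  };

  outputs = { self, nixpkgs, secrets }: {
    nixosConfigurations.example = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [
        ./configuration.nix
        {
          # The input is still copied to the store at evaluation time,
          # the same as an input that has not been locked yet.
          environment.etc."example-token".source = "${secrets}/example-token";
        }
      ];
    };
  };
}
```

Everything except `lock` is ordinary flake syntax, so the only behavioral difference in this sketch would be the missing lock file entry.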

One noteworthy drawback of this approach is that, because it leverages state which exists only on the target machine, derivations relying on non-locking inputs wouldn't be able to be built remotely. This would interfere with tools such as Colmena, in their present form. Therefore, the feature should be designed such that tools doing remote builds can identify which derivations these are, and make sure they're built on the machine itself, even if everything else is built remotely. This might have implications for the name of the flag.

Another drawback is that an attacker with root access to the machine could modify the secrets in-place and nix wouldn't detect it. However, an attacker with that access could also change the nix configuration, so I don't see it as a practical concern.

Describe alternatives you've considered

I've explored several other possible directions, most notably impure derivations. Unfortunately, impure derivations in their present form can't be used for this because they're only for fixed outputs. The nix evaluation model also doesn't have a way to enforce constraints on only some parts of an evaluation and not others. While I still think a carve-out in the semantics of pure evaluation could be worthwhile, there's too much theory work required to actually implement it, and I suspect nobody would be willing to change the language that much just for this use-case.

There's a related problem of code signing, which involves credentials that exist outside nix but must be provided at build-time. It would be nice to have a single language tweak that addresses both problems, but I haven't been able to find one that's sufficiently minimal.

Additional context

I'm offering to do this work, if it sounds like something the project would be willing to accept. I've created this issue thread to assess that, because I don't want to waste everyone's time if it's not a feature that's aligned with the project's goals or if there are no resources to review it.

My hope is to start a discussion; thanks in advance!

Owner

to be honest we do not understand what you are asking for. non-locking inputs would still copy the entire input to the store, which completely undermines secrecy (besides already existing in a minor form as --override-input). going further and requiring that certain flake inputs never end up in the store sends you down the lazy-trees rabbit hole.

besides that we only see two classes of secret use to consider: use during derivation build, and use after derivation build. the latter is mostly a variation on e.g. agenix that doesn't keep secrets in source control but just holds a registry of them and checks whether they're in place on system activation, and the former is something we want to do anyway but currently can't because the wire protocols are very very bad. if you want to be absolutely certain that certain state never leaves a machine there's extra-sandbox-paths already as well.

apart from the use case of getting secrets into a build sandbox without needing them persisted in the file system anywhere we do not see anything actionable here.

Author
Member

the use-case is to have a flake which doesn't need to pin the secrets, only know how to find them at build-time. this is important if, for example, you have a config repo that has several machine configurations in it but each machine only has its own secrets stored locally. to do that in flakes as they are currently implemented, each machine would need to have a separate secrets input listed, and code reuse would be greatly hindered.

for clarity's sake: this is to include both secrets used during build, and secrets used after build.

agenix is a way to check secrets into source control while sharing them across machines, which is not what I want, even though I understand that it has an encryption model which some people are happy with. different situations call for different security models and agenix does not satisfy my needs or the needs of any group whose nix infrastructure I'm involved with running. it's nice that people like it, I won't tell them they shouldn't, it just is a very specific model based on a very specific set of ideas that aren't for everybody.

avoiding copying things to the store is a non-requirement for the scenario I care about, though, again, I understand why other people are concerned about it. it was not my intention to suggest anything about copying or the lack thereof. copying secrets to the store is primarily relevant to local privilege escalation, which to me is a low-priority concern because I assume that it's the easy part for most attackers anyway. avoiding copying would be nice but my assessment is that there are other substantial changes that would need to be made to get there, and it's not my goal right now.

similarly, the challenge with remote builds is not that the naive approach would lead to unexpected copying; it wouldn't. the naive approach would simply fail because the files wouldn't be found. automated copying or the lack thereof is not relevant to the use-case I'm interested in.

Owner

so you're really just asking for an impure input type that has nothing to do with secrets as such, you're just intending to use it for secret management in your own config? that's fair. flakes are incredibly broken without that already though, and adding it will not only undermine the entire premise on which flakes are built, it'll also make them that much more awful to deal with for it. ioo flakes are already a solution in search of a problem, and your case seems to be hitting that harder than usual? or asking another way, why must it be flakes instead of a different pinning solution like npins?

> agenix is a way to check secrets into source control while sharing them across machines [...]

not how we meant that; the important part of agenix is its registry of secret names, not the actual secrets themselves. providing the secrets backing this registry can be done in any number of ways, from checking that the paths exist in an activation script to decrypting them via TPM shenanigans or pulling them from a remote server with machine attestation or somehow else.
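A minimal sketch of the "registry plus activation check" idea, as a NixOS module. This is not how agenix itself is implemented; the attribute set `expectedSecrets`, the script name `checkSecrets`, and the paths are all made up for illustration. Only the names and locations of secrets are part of the configuration, never their contents.

```nix
{ lib, ... }:
let
  # Hypothetical registry: secret names mapped to machine-local paths.
  expectedSecrets = {
    example-token = "/var/lib/secrets/example-token";
    example-env = "/var/lib/secrets/example.env";
  };
in
{
  # Runs on every system activation and reports each registered secret
  # that is not present on this machine.
  system.activationScripts.checkSecrets.text = ''
    for f in ${lib.escapeShellArgs (lib.attrValues expectedSecrets)}; do
      if [ ! -e "$f" ]; then
        echo "secret missing on this machine: $f" >&2
      fi
    done
  '';
}
```

Because the paths are plain strings, nothing is read at evaluation time, so this works the same with or without flakes.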

Author
Member

well, it's only that flakes are well-documented and easy to teach people about, and have several CLI features that are convenient and that I enjoy using. and, like, they are still lix's official-ish recommendation, unless I've missed something? it's true that if I didn't want to use flakes, the problem would go away.

it took me a long time to warm up to flakes but I do think there is something to be said for having a standard-ish format to declare imports and exports of various kinds, at the repo level. I will be the first to start talking about the whole legibility-to-capital thing and how I don't want flakes to become a tool to npm-ify the nix ecosystem, and there are lots of small improvements I'd love to make to their features, but I do think that the external interface to them is a pretty good first pass.

I do think this change is minimally invasive, since it would be done as a flag in the input declaration, and all it does is turn off a piece of existing behavior.

in regard to agenix, again, I'm not seeking to convince you to stop using it. I just don't want to use it. I'm suggesting a feature that would help me as a user of lix, and offering to do the work to implement it. if the answer is that lix is only interested in features that the lix core team want to use, that's fine, I'm happy to accept that, just say so.

I'm not actually understanding how agenix does the stuff you're saying, it doesn't seem like those are documented features. my belief from previous attempts to poke at it is that the repository that stores the agenix secrets needs to be listed as a flake input (if you're using flakes), and that it's evaluated in the sandbox (like all flake evaluations) and can therefore only see things that are part of its own contents.

if I'm wrong about that, great, I'll be happy to learn, but agenix also does a bunch of other stuff that I see as more of a liability than an asset, and I'm not interested in using agenix. I don't personally think the bar for new features should be that there is no conceivable alternative, but if you do really want me to give a full analysis of agenix, I can try to do that... I just think it would be a large topic in its own right and really beside the point.

Author
Member

sorry, I was unclear: it is further my belief that the requirement for agenix to only access data that's part of the flake or its inputs, means that the TPM and activation-script options you suggest are impossible. in fact, I don't see how agenix could possibly do that, given the flake sandboxing semantics.

my apologies for not saying that explicitly up-thread.

Owner

we're not trying to sell you on agenix or any other of these tools. but we also don't see how your specific problem is not already solved by --impure --restrict-eval -I secrets=$machine_local_path (to get only that specific machine-local path into eval) or, for those machines that don't need secrets in build or eval, a registry that specifies which secret should be there so that an activation script can check that they are there (or report an error if some are missing).
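For illustration, the evaluation side of the search-path approach might look roughly like the module below. It assumes evaluation is started with the flags quoted above and that `$machine_local_path` is a directory containing a file named `example-token` (a placeholder). How these flags are passed through a particular rebuild tool, and how `--restrict-eval` interacts with a given flake, is not verified here.

```nix
# secrets-module.nix: sketch only. Assumes evaluation was started with
# something like `--impure --restrict-eval -I secrets=$machine_local_path`.
{ ... }:
{
  # <secrets/example-token> is resolved through the `secrets` search-path
  # entry at evaluation time. Because it is a path value, the file is copied
  # into the store when it is used here.
  environment.etc."example-token".source = <secrets/example-token>;
}
```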

we personally do not want to add this feature. we'd like to replace flakes with a more generic mechanism that can be used to implement flakes as they are today, but also other things like using npins as a backing, or inputs that don't go into lockfiles. you can probably add unlocked inputs with a plugin, but we personally don't want to add more input types to flakes before we have a way out of the libfetchers mess. other core team members may disagree.

Author
Member

I agree that passing --impure and related flags every time will solve the problem, it's just arduous operationally. I'd prefer to be able to encode what parts of the system get pinned and what parts don't into the config repo itself, so that I don't have to maintain a wrapper script or ask people to memorize a lengthy command line. I'd prefer for the simple, unadorned rebuild commands that all the documentation describes to be all that's needed.

I would also very much like to see flakes replaced with a more generic mechanism. I'd love to help out with that however I can. it does seem like a long-term undertaking. meanwhile, I am responsible for quite a bit of stuff that relies on flakes, and I would like to not need to apply horrible hacks on every single redeploy.

you raise a good point about fetchers. I wasn't thinking that this would affect the fetcher, or count as a new input type. I was thinking of finding the code that writes out the lockfile and overriding it to not do that, based on an option that all input types would share. I haven't dug into the internals yet, and perhaps it's more entangled with fetchers than I realize?

my hope is that it will be a nice easy win which will give me a chance to work in the flake internals a bit, and be a learning path towards helping out in larger ways.

Owner

Put that stuff in a script to run nixos-rebuild, then you have your problem solved.

In terms of actually implementing it, I also don't especially want to be touching libfetchers, particularly if it is deliberately introducing incompatibilities with cppnix which then become our responsibility to standardize properly, for what is a feature which contradicts the design intent of flakes and for which I can't think of any other use cases than deliberately poking holes in the purity of flakes (in order to copy stuff into the store which probably shouldn't even be in the store!).

If you want to use an impure path without throwing it in a script, you can do this at runtime instead of build time by simply putting the path to the thing into the flake as a string rather than a path object. This is the underlying mechanism through which agenix is implemented and is completely unaffected by flakes or not. It's still Linux, you can still just read files from paths inside some daemon or whatever.
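A small sketch of the string-versus-path distinction, assuming a hypothetical service named `example` and a placeholder file `/var/lib/secrets/example.env`:

```nix
{ pkgs, ... }:
{
  systemd.services.example = {
    description = "Example service reading a machine-local secret at runtime";
    wantedBy = [ "multi-user.target" ];
    serviceConfig = {
      Type = "oneshot";

      # A string: only the text "/var/lib/secrets/example.env" ends up in the
      # generated unit file. systemd reads the file itself when the unit starts.
      EnvironmentFile = "/var/lib/secrets/example.env";

      # By contrast, writing the same location as an unquoted path literal
      # would make Nix read the file during evaluation and copy it into the
      # store, which pure flake evaluation refuses for paths outside the flake.

      # Placeholder command that just prints its environment.
      ExecStart = "${pkgs.coreutils}/bin/env";
    };
  };
}
```

The secret never passes through evaluation or the store in this form, so no `--impure` flag and no extra flake input is involved.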

Owner

Basically the central part of this feature request if I understand it correctly is "I don't want to pass --impure to read random paths from the filesystem in flakes, assuming they're declared", but you can just .. pass --impure. It won't make kittens cry or something.

Or you can move reading those paths into runtime and not use them at eval time at all and not need any workaround. Or you can get the exact thing you want for unlocked inputs with --override-input foo /bar --no-write-lock-file, iirc. Any of those work today.

Author
Member

That is correct - I want a way to do it that's part of the declarative configuration, rather than being something I have to memorize (and, more importantly, teach and document). I appreciate that this is not a pain point for everyone.

I've looked at the code a bit and I think this might be doable with only changes to libexpr/flake, no touching libfetchers. Are you open to such a solution, if I can make it work?

Author
Member

Um, I'm going to investigate the string loophole first though, thanks. I had tried that in the past and moved away from it, for reasons I don't immediately remember.

Reference: lix-project/lix#958