Stabilize cgroups experimental feature #1107

Closed
opened 2026-01-26 22:42:42 +00:00 by raito · 1 comment
Owner

This is a tracking issue to stabilize the cgroups feature. We collected some feedback from users living on HEAD with respect to our architectural changes.

Known issues

Escape hatch when cgroup delegation structure is incorrect (single user mode Nix)

cgroups are a UX trap for the time being if you set use-cgroups = true and enable the cgroups experimental feature: single-user mode Nix may be dysfunctional because you are not running inside the delegation structure that systemd advises us to follow.

For this situation, you can use systemd-run -p Delegate=yes ....

Considered ideas to mitigate this:

  • Produce the cgroup tree ourselves instead of systemd: NACK because that's not the job we want to have.
  • Abandon cgroups when we fail to use them: NACK because that's a misconfiguration problem. We want to report clear errors that cgroups are misconfigured if users are planning to rely on them.
  • Modify use-cgroups to support more nuance, e.g. use-cgroups=if-possible: that still doesn't make sense for the server use case, where we want the Lix run by systemd to have cgroups set up properly.
  • ... ?
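As a diagnostic for the delegation problem described above, here is a minimal sketch of how one could check whether the current process sits in a cgroup it is allowed to manage. It is not what Lix does internally. It assumes cgroup v2 mounted at /sys/fs/cgroup, and the function names (parse_cgroup_v2_path, delegation_problems) are made up for illustration. Write access is only a necessary condition for proper delegation, not a sufficient one, and the check is not meaningful when run as root.

```python
import os
from pathlib import Path
from typing import Optional

# Conventional cgroup v2 mount point; an assumption, adjust if yours differs.
CGROUP_ROOT = Path("/sys/fs/cgroup")


def parse_cgroup_v2_path(proc_cgroup_text: str) -> Optional[str]:
    """Return the cgroup v2 path from the contents of /proc/<pid>/cgroup.

    The unified (v2) hierarchy is the line of the form "0::<path>".
    """
    for line in proc_cgroup_text.splitlines():
        hierarchy_id, _, rest = line.partition(":")
        controllers, _, path = rest.partition(":")
        if hierarchy_id == "0" and controllers == "":
            return path
    return None


def delegation_problems(cgroup_dir: Path) -> list[str]:
    """List reasons why sub-cgroups could not be managed inside cgroup_dir."""
    if not cgroup_dir.is_dir():
        return [f"{cgroup_dir} does not exist (is cgroup v2 mounted?)"]
    problems = []
    # Creating child cgroups needs write access to the directory itself.
    if not os.access(cgroup_dir, os.W_OK):
        problems.append(f"cannot create child cgroups in {cgroup_dir}")
    # Moving processes and enabling controllers needs these control files.
    for name in ("cgroup.procs", "cgroup.subtree_control"):
        if not os.access(cgroup_dir / name, os.W_OK):
            problems.append(f"{cgroup_dir / name} is not writable")
    return problems


def main() -> None:
    try:
        text = Path("/proc/self/cgroup").read_text()
    except OSError as err:
        print(f"cannot read /proc/self/cgroup: {err}")
        return
    rel = parse_cgroup_v2_path(text)
    if rel is None:
        print("no cgroup v2 entry found for this process")
        return
    for problem in delegation_problems(CGROUP_ROOT / rel.lstrip("/")):
        print("cgroup delegation problem:", problem)


if __name__ == "__main__":
    main()
```

Running this once in a plain login shell and once under systemd-run -p Delegate=yes should show the difference between a non-delegated and a delegated cgroup.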

nix copy almost always bypasses the daemon

nix copy, often coupled with --store, does not connect to the daemon to perform its operations and may end up doing builds in the user's context, causing cgroups issues.

Important remarks

Due to the way cgroups work, we need to ensure that the supervisor cgroup is emptied properly for systemd to be able to restart the Lix daemon across upgrades. Historically, Lix ran as a monolithic daemon spawning subdaemons. This meant that if any of the foreign programs we use (e.g. ssh) had to keep a resource open after all subdaemons stopped, systemd could not easily restart or reload the service while keeping the other subdaemons running, because we had no way to terminate the other processes that had been spawned on the side in the meantime. This surfaced in #1030.

The "solution" (*) to this problem was to enable socket activation for Lix subdaemons and remove the concept of a monolithic daemon (in effect delegating that role to systemd).

(*): it only fixed systemd restarts. The actual resource now gets destroyed every time a subdaemon finishes while another one is still using it. This is worse in some ways.
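For readers unfamiliar with socket activation, here is a minimal sketch of the generic systemd protocol a socket-activated process follows to find the sockets handed to it. This is not Lix's implementation, and the issue does not say how the Lix units are configured. The LISTEN_PID and LISTEN_FDS variables and the starting descriptor 3 come from sd_listen_fds(3); the function name inherited_fds is made up for illustration.

```python
import os
from typing import Mapping

# First inherited file descriptor, as documented in sd_listen_fds(3).
SD_LISTEN_FDS_START = 3


def inherited_fds(environ: Mapping[str, str], pid: int) -> list[int]:
    """Return the file descriptors systemd passed to process `pid`, if any."""
    # LISTEN_PID names the intended recipient, so a child process that merely
    # inherited the environment does not mistake the descriptors for its own.
    if environ.get("LISTEN_PID") != str(pid):
        return []
    try:
        count = int(environ.get("LISTEN_FDS", "0"))
    except ValueError:
        return []
    if count < 0:
        return []
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


if __name__ == "__main__":
    fds = inherited_fds(os.environ, os.getpid())
    if fds:
        print(f"socket-activated with {len(fds)} descriptor(s): {fds}")
    else:
        print("not socket-activated; no descriptors were passed")
```

With this model, systemd owns the listening socket and the lifetime of each spawned process, which is what lets it restart the service without a long-lived supervisor process of our own.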

raito self-assigned this 2026-01-26 23:34:26 +00:00
raito added this to the 2.95 milestone 2026-01-26 23:34:30 +00:00
Author
Owner

dupe of #537

raito closed this issue 2026-01-27 02:11:23 +00:00
Reference
lix-project/lix#1107