Stabilize cgroups experimental feature #1107
lix-project/lix#1107
This is a tracking issue to stabilize the `cgroups` experimental feature. We collected feedback from users living on HEAD with respect to our architectural changes.

Known issues

Escape hatch when cgroup delegation structure is incorrect (single user mode Nix)

`cgroups` are a UX trap for the time being if you set `use-cgroups = true` and enable the `cgroups` experimental feature: single user mode Nix may be dysfunctional because you don't live in the delegation structure that systemd advises us to follow.

For this situation, you can use `systemd-run -p Delegate=yes ...`.

Considered ideas to mitigate this:

- Make `use-cgroups` support more nuance, e.g. `use-cgroups=if-possible`. That still doesn't make sense for the server use case, where we want the Lix run by systemd to have cgroups set up properly.

`nix copy` almost always bypasses the daemon

`nix copy`, often coupled with `--store`, does not connect to the daemon to perform its operations and may end up doing builds in the user's context, causing cgroups issues.

Important remarks
Due to the way cgroups work, we need to ensure that the supervisor cgroup is emptied properly for systemd to be able to restart the Lix daemon across upgrades. Historically, Lix ran as a monolithic daemon spawning subdaemons. This meant that if any of the foreign programs we use (e.g. ssh) had to keep a resource open after all subdaemons stopped, systemd could not easily restart or reload the service while keeping the other subdaemons running, because we have no way to terminate the other processes that were spawned on the side in the meantime. This surfaced in #1030.
The "solution" (*) to this problem was to enable socket activation for Lix subdaemons and remove the concept of a monolithic daemon (effectively delegating that role to systemd).
(*): it only fixed systemd restarts. The actual resource now gets destroyed every time a subdaemon finishes, even if another subdaemon is still using it. This is worse in some ways.
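
For readers unfamiliar with socket activation, here is a minimal Python sketch of the generic systemd protocol that a socket-activated process follows to find the descriptors it inherited. It is not Lix's implementation (Lix is written in C++ and its code is not shown in this issue); the function name `listen_fds` is hypothetical, and only the `LISTEN_PID` / `LISTEN_FDS` environment variables and the starting descriptor 3 come from systemd's documented `sd_listen_fds` convention.

```python
import os

# First inherited descriptor under systemd's socket activation convention.
SD_LISTEN_FDS_START = 3


def listen_fds(environ=None, pid=None):
    """Return the file descriptors systemd passed to this process, if any.

    `environ` and `pid` are injectable so the logic can be exercised
    without actually being started by systemd.
    """
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid

    # systemd sets LISTEN_PID to the PID the descriptors are meant for;
    # anything else means the variables were inherited by accident.
    if environ.get("LISTEN_PID") != str(pid):
        return []

    try:
        count = int(environ.get("LISTEN_FDS", "0"))
    except ValueError:
        return []

    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + max(count, 0)))


if __name__ == "__main__":
    fds = listen_fds()
    print(f"inherited descriptors: {fds}" if fds else "not socket activated")
```

Because systemd owns the listening socket in this model, each subdaemon can exit and empty its cgroup without a long-lived parent having to stay around, which is the property the restart fix relies on.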
dupe of #537
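
Regarding the single user mode escape hatch under Known issues, the following Python sketch shows one rough way to tell whether the current process sits in a cgroup it could plausibly manage. This is a heuristic written for illustration, not the check Lix performs: the helper names are hypothetical, it assumes a unified cgroup v2 hierarchy mounted at `/sys/fs/cgroup`, and a privileged user will pass the write checks regardless of delegation.

```python
import os
from pathlib import Path

# Assumption: cgroup v2 (unified hierarchy) mounted at the usual location.
CGROUP_ROOT = Path("/sys/fs/cgroup")


def parse_cgroup_v2_path(proc_cgroup_text):
    """Extract the cgroup v2 path from the contents of /proc/<pid>/cgroup.

    The v2 entry has hierarchy ID 0 and an empty controller list,
    e.g. "0::/user.slice/user-1000.slice/session-2.scope".
    """
    for line in proc_cgroup_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        hierarchy_id, controllers, path = parts
        if hierarchy_id == "0" and controllers == "":
            return path
    return None


def looks_delegated(cgroup_path, root=CGROUP_ROOT):
    """Heuristic: can we create child cgroups here and move processes in?"""
    directory = Path(root) / cgroup_path.lstrip("/")
    return os.access(directory, os.W_OK) and os.access(
        directory / "cgroup.procs", os.W_OK
    )


if __name__ == "__main__":
    path = parse_cgroup_v2_path(Path("/proc/self/cgroup").read_text())
    if path is None:
        print("no cgroup v2 entry found")
    else:
        print(f"{path}: delegated={looks_delegated(path)}")
```

If the check reports no delegation, running the command through `systemd-run -p Delegate=yes ...` as described above is the workaround this issue suggests.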