Upgrading an existing install deleted nix-daemon.service and broke everything #1189
Reference
lix-project/lix#1189
Describe the bug
TL;DR: I tried to upgrade a 2.95.1 Lix install to 2.95.1 (yes, i'm stupid), and it failed in the middle of `nix-env`ing the new install, after having uninstalled the old `nix` and `nix-daemon.service`, completely breaking my Lix install. I had to completely uninstall nix using `/nix/nix-installer uninstall` and then reinstall everything.
Steps To Reproduce
maybe I'm stupid and did something wrong, but running
on an existing 2.95.1 (i forgot i'd already upgraded) config uninstalled (among other things) `nix-daemon.service` and then was unable to start it again, completely breaking everything. this also means i don't have nix in my path anymore, so i can't take the "upgrade" path anymore, and instead have to reinstall everything. (i tried to just `rm /nix/receipt.json && /nix/nix-installer install`, but for some reason it fails, complaining about either `error: you don't have sufficient rights to use this command` or `error: could not connect to any lix socket (tried /nix/var/nix/daemon-socket/socket)` when trying to `--load-db`.)
Here is the original installer error, after which nix was unusable:
For completeness, I've also attached the full terminal session as `nix-install.log` (yes, there's a lot of repetition in there, sorry ;-;)
Expected behavior
Either the upgrade is idempotent or it just refuses to upgrade because it detects that it is already the same version. In any case, it doesn't... explode.
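A minimal sketch of the "refuse if it's already the same version" option, assuming the installed and target versions are both available as plain strings. `should_upgrade` is a hypothetical helper, not something the installer or `nix upgrade-nix` actually provides:

```shell
# Hypothetical guard: succeed only when an upgrade would change the version.
# Both arguments are plain version strings supplied by the caller.
should_upgrade() {
    current=$1
    target=$2
    if [ "$current" = "$target" ]; then
        # Same version: report it and signal "do nothing" with a non-zero status.
        echo "already at version $target, nothing to do" >&2
        return 1
    fi
    return 0
}
```

With a check like this placed before anything is removed, `should_upgrade 2.95.1 2.95.1 || exit 0` would bail out while the old profile and unit files are still in place.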
`nix --version` output
N/A, but i'm on an x86_64 Ubuntu 24.04.4
Additional context
Add any other context about the problem here.
(Opened from lix-project/lix-installer#79)
In my test container, I was able to recover with:
Where
profile-1-linkwas the previous generation:Not exactly an obvious recovery path, but at least the previous profile is intact.
iirc, `/nix/var/nix/profiles` was basically empty for me except for a `manifest.nix` somewhere (probably in `default/`, though i could be misremembering). i don't remember if `per-user/root/profile` existed or if i checked it, but if you're seeing it in your tests i probably just missed it. honestly i was a little panicked given the installer's failures and the fact the daemon service was gone, so i just thought the easiest path would be uninstalling (and i didn't even know about `/nix/nix-installer repair` ;-;)
Ah, I see what's happening.
`nix upgrade-nix` calls `nix-env` to remove the old profile, which removes `/nix/var/nix/profiles/default/lib/systemd/system/nix-daemon{@.service,.socket}`. Then it calls `nix-env` again to install the new profile, which tries to connect to the running `nix-daemon.socket` and create a new instance of `nix-daemon@.service`, except now `/etc/systemd/nix-daemon{@.service,.socket}` are symlinks with invalid targets, so the daemon connection fails.
In other words, now that we are fully socket activated, `nix upgrade-nix` removes the ability to use the daemon partway through.
cc @pennae
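For anyone trying to confirm they hit this state, here is a rough sketch that lists `nix-daemon*` symlinks whose targets no longer exist. `find_dangling_links` is a hypothetical helper, and the directory to inspect is passed in by the caller rather than assumed:

```shell
# Hypothetical helper: print every nix-daemon* symlink in a directory
# whose target does not resolve anymore.
find_dangling_links() {
    dir=$1
    for link in "$dir"/nix-daemon*; do
        # -L is true for a symlink; -e follows the link, so it is false
        # when the target is missing. An unmatched glob fails the -L test.
        if [ -L "$link" ] && [ ! -e "$link" ]; then
            printf '%s\n' "$link"
        fi
    done
}
```

Pointing it at the directory holding the unit symlinks should print the unit files that were left dangling when the old profile was removed, and nothing on a healthy install.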
I know that we want to remove `upgrade-nix` anyway, because, well, stuff like this. Should we make it execute `nix-env` with `--store local` or something, though, to prevent this failure mode?
absolutely! upgrading while the daemon is still running and might be in use by something else is a recipe for disaster, especially if that upgrade changes the sqlite db schema (which luckily hasn't happened in a long time now). upgrades should take down the daemon until they're done just to be safe
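A rough sketch of the "take down the daemon until the upgrade is done" idea, under two assumptions: that stopping `nix-daemon.socket` is enough to keep new clients out, and that the wrapped command can do its work without the daemon (for example via the `--store local` suggestion above, which is untested here). `with_daemon_stopped` and the `SYSTEMCTL` override are hypothetical names used only for illustration:

```shell
# Hypothetical wrapper: stop the daemon socket, run a command, then
# bring the socket back regardless of whether the command succeeded.
# Assumption: already-running nix-daemon@.service instances are not handled.
with_daemon_stopped() {
    # SYSTEMCTL can be overridden, e.g. with a stub for dry runs.
    systemctl_cmd=${SYSTEMCTL:-systemctl}
    "$systemctl_cmd" stop nix-daemon.socket || return 1
    "$@"
    status=$?
    "$systemctl_cmd" start nix-daemon.socket
    # Propagate the wrapped command's exit status to the caller.
    return "$status"
}
```

The wrapper only covers the sequencing. Restarting the socket still depends on the unit files resolving again once the new profile is installed.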