Lix broken during nixos-rebuild #883
Reference: lix-project/lix#883
Describe the bug
I have two machines (one x64 and one aarch64) that appear to have borked their Lix installations during nixos-rebuild. The journals of their most recent attempts to run nixos-upgrade are attached. In both cases, Lix appears to have entered a corrupt state during the upgrade, as at the end of each upgrade, Lix complains that it is unable to find libeditline. There appear to have been no errors prior to the upgrade attempt.
After the upgrade attempt, Lix on both hosts fails with the error
nix: error while loading shared libraries: libeditline.so.1: cannot open shared object file: No such file or directory
Taking one of the hosts as an example, we find that the nix binary has in its rpath the lib folder of the same derivation and that liblixcmd.so has a libeditline output in its rpath. However, the libeditline output no longer exists:
The relevant output does not appear to have been invalidly garbage collected, leading me to believe the error is, in fact, occurring during nixos-rebuild:
(This editline output first became referenced by the Lix derivation after this garbage-collection run, later on June 23.)
The result is that the store is corrupt, with an extant output referencing an output that has been somehow deleted.
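For anyone wanting to run the same check on their own host, here is a minimal Python sketch of the rpath inspection described above. It assumes you have already obtained the rpath string yourself (for example from patchelf --print-rpath or readelf -d); the function name missing_rpath_dirs and the example paths are hypothetical and not part of Lix.

```python
import os


def missing_rpath_dirs(rpath: str) -> list[str]:
    """Return the rpath entries that do not exist as directories.

    `rpath` is a colon-separated string such as the one printed by
    `patchelf --print-rpath <binary>`.
    """
    entries = [entry for entry in rpath.split(":") if entry]
    # $ORIGIN-relative entries depend on the binary's own location,
    # which this sketch does not know, so they are skipped.
    return [
        entry
        for entry in entries
        if not entry.startswith("$ORIGIN") and not os.path.isdir(entry)
    ]


if __name__ == "__main__":
    # Hypothetical store paths, for illustration only.
    example = "/nix/store/aaaa-lix/lib:/nix/store/bbbb-editline/lib"
    for directory in missing_rpath_dirs(example):
        print("missing rpath directory:", directory)
```

Any directory this reports is one the dynamic loader cannot search, which matches the symptom of libeditline.so.1 not being found.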
My nix configurations are available at https://github.com/randomnetcat/nix-configs if this helps with reproduction.
I apologize that I don't have anything more specific to add, but I do not have time to attempt to reproduce this issue in, e.g., a VM. However, I am happy to answer any questions.
Steps To Reproduce
I... have no idea.
Expected behavior
Lix does not break during nixos-rebuild.
nix --version output
Output from a different (working) machine with the same version of Lix:
Can you share more information about which filesystem the store is on for the machines that exhibit the bug?
yep, this is definitely a bug in lix. it happens when a derivation that already has some valid output on the system is rebuilt for its other outputs, in this case the other outputs are added to the store and the pre-existing outputs are deleted due to a logic error. quite bad.
since the problematic code was backported to all stable releases as part of the CVE fix chain it's all but guaranteed that stable is affected as well
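To make the resulting store state concrete: the symptom is a valid path that still references a path which is no longer valid. The sketch below is a toy model in Python, not Lix's code; the references mapping, the valid set, and the short path names are all hypothetical stand-ins for what the store database records.

```python
def find_dangling_references(
    references: dict[str, set[str]], valid: set[str]
) -> dict[str, set[str]]:
    """Map each valid path to those of its references that are not valid.

    An empty result means the set of valid paths is closed under references.
    """
    dangling: dict[str, set[str]] = {}
    for path in sorted(valid):
        missing = references.get(path, set()) - valid
        if missing:
            dangling[path] = missing
    return dangling


if __name__ == "__main__":
    # Toy store: "lix-lib" references "editline", but "editline" was deleted.
    refs = {"lix": {"lix-lib"}, "lix-lib": {"editline"}}
    print(find_dangling_references(refs, valid={"lix", "lix-lib"}))
```

In this model, deleting a pre-existing output while other paths still reference it is exactly what produces a non-empty result.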
We will perform an emergency release (probably with other papercut fixes) to fix that.
Additionally, we will offer guidance and tools to repair systems that might have been broken by this.
I just ran into this. I was able to fix it by setting LD_PRELOAD to some existing libeditline.so.1 in my Nix store, and running Lix as root to bypass the daemon. All in all:
(The --store local was unnecessary of course, but I'm putting it here for history since #18)
This issue was mentioned on Gerrit on the following CLs:
@qyriad wrote in #883 (comment):
A previous Lix would probably work as well if you still have an older one around. If you could add a section to the CVE blog post describing this known issue and these recommendations, that'd be awesome!
We will be releasing the fixes ASAP for this as part of:
Thank you for bearing with us, and sorry for the inconvenience.
Thank you all for the very quick fix!
Hey, I've run into the same issue. I have one machine that does all the builds, and the other machines just pull their built configs. Apparently only the build host had problems.
This is the commit that caused the breakage (all configs and pipeline logs should be publicly visible, let me know if not):
e621f23afd
Not sure if it's just a case of "Lix version with the bug built version without the bug, so the build was borked, and there's nothing here to see", or something worth looking into.
The LD_PRELOAD hack with --repair-path that @qyriad shared worked to recover, so now my system is back and running.
Please let me know if there's any more information that you need. Thanks!
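For others attempting the same recovery, the first step of the LD_PRELOAD workaround is finding some libeditline.so.1 that still exists in the store. The Python sketch below only does that search. It assumes the store lives at /nix/store and that the library sits under a lib directory of a store path; the function name find_library_candidates is hypothetical.

```python
import glob
import os


def find_library_candidates(
    store_dir: str = "/nix/store", soname: str = "libeditline.so.1"
) -> list[str]:
    """Return sorted paths matching <store_dir>/*/lib/<soname> that resolve to files."""
    pattern = os.path.join(glob.escape(store_dir), "*", "lib", glob.escape(soname))
    # os.path.isfile follows symlinks, so a dangling symlink is not reported.
    return sorted(path for path in glob.glob(pattern) if os.path.isfile(path))


if __name__ == "__main__":
    for candidate in find_library_candidates():
        # Print a value that could be used for LD_PRELOAD; nothing is executed.
        print(f"LD_PRELOAD={candidate}")
```

Any one of the printed values can then be used as described in this thread: set LD_PRELOAD to it and run Lix as root with --repair-path on the affected path. The exact command is not reproduced here, so treat this as a starting point rather than a complete procedure.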
@raito wrote in #883 (comment):
The fixes are now released! Thank you for your patience.