Protocol mismatch when copying to remote host running Nix 2.24 #644
Reference: lix-project/lix#644
Describe the bug
I'm running Lix on my laptop, and deploy to other hosts using Colmena.
Today, I got the following error when trying to apply:
The remote side uses Nix 2.24.11, the laptop Lix 2.91.1. It looks like nix-copy-closure starts a nix-store --serve on the remote side, but stumbles hard on the protocol being spoken there.
Running Colmena in an environment with Nix 2.24.11 in $PATH gets the copy to succeed.
Expected behavior
I'd expect nix-copy-closure etc. to work, even when talking to NixOS systems running Nix 2.24.11.
nix --version output
On the laptop:
nix (Lix, like Nix) 2.91.1
On the remote side:
nix (Nix) 2.24.11
cc @raito as requested.
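When comparing the two ends of a copy, it can help to record which implementation and version each side reports. Below is a minimal Python sketch that parses the two nix --version strings quoted above. The helper name `parse_nix_version` is hypothetical, and the regex only assumes the `nix (<implementation>) <version>` shape seen in this report.

```python
import re
from typing import NamedTuple


class NixVersion(NamedTuple):
    implementation: str  # text inside the parentheses, e.g. "Nix" or "Lix, like Nix"
    version: tuple       # numeric components, e.g. (2, 24, 11)


# Assumed shape: "nix (<implementation>) <dotted version>", as in the outputs above.
_VERSION_RE = re.compile(r"^nix \((?P<impl>[^)]+)\) (?P<ver>\d+(?:\.\d+)*)")


def parse_nix_version(line: str) -> NixVersion:
    """Split one line of `nix --version` output into implementation and version."""
    match = _VERSION_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised version line: {line!r}")
    numbers = tuple(int(part) for part in match.group("ver").split("."))
    return NixVersion(match.group("impl"), numbers)


local = parse_nix_version("nix (Lix, like Nix) 2.91.1")
remote = parse_nix_version("nix (Nix) 2.24.11")
# True only when both ends report the same implementation string.
same_implementation = local.implementation == remote.implementation
print(local, remote, same_implementation)
```

This only describes what each side reports; it says nothing about which wire protocol version the two ends would negotiate.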
this is almost certainly a CppNix bug, because I really doubt we touched that code; can you try reproducing it on 2.18 on the client side? cc @roberth
I switched the machine to run Lix by temporarily shelling in Nix 2.24 client-side, and even after switching back and forth I wasn't able to trigger it anymore. Not sure why. In case no one else is able to reproduce this or has an idea what's going on, feel free to close this.
I saw this again today, with both client and server using nix (Lix, like Nix) 2.91.1. This time, it was a colmena apply to another host (with deployment.buildOnTarget = true; set):
try turning off ssh connection multiplexing, that's known to be busted in weird and wonderful ways
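One way to try this for a single command without editing ~/.ssh/config is to pass the options through NIX_SSHOPTS, which nix-copy-closure reads. Below is a minimal Python sketch. It assumes that `ControlPath=none` is what actually stops ssh from attaching to an existing master socket, since `ControlMaster=no` alone only stops it from becoming one. The helper name `no_mux_env` is hypothetical, and whether Colmena forwards NIX_SSHOPTS unchanged is not verified here.

```python
import os
import shlex

# ssh_config options that keep this ssh invocation off any shared control socket.
NO_MUX_OPTS = ["-o", "ControlMaster=no", "-o", "ControlPath=none"]


def no_mux_env(base_env=None):
    """Return a copy of the environment with mux-disabling flags appended to NIX_SSHOPTS."""
    env = dict(os.environ if base_env is None else base_env)
    # Keep whatever options the caller already had, then add ours at the end.
    existing = shlex.split(env.get("NIX_SSHOPTS", ""))
    env["NIX_SSHOPTS"] = shlex.join(existing + NO_MUX_OPTS)
    return env


env = no_mux_env({"NIX_SSHOPTS": "-p 2222"})
print(env["NIX_SSHOPTS"])
```

The returned dict can be handed to `subprocess.run(..., env=env)` when launching the copy, so the setting applies to that one invocation only.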
Wouldn't it be a good idea for Lix to automatically set ControlMaster=no whenever we specify a connection over ssh?
we could do that, but the muxing code is sufficiently broken that we should remove lix-directed muxing entirely instead. the error you're seeing here is not caused by lix itself using muxing, but by another process on the same system using muxing. the only reasonable way forward seems to be ripping out our mux handling entirely and becoming mux-agnostic
This issue was mentioned on Gerrit on the following CLs: