SSH connection dropping when copying outputs should not abort the entire build tree #922
Reference: lix-project/lix#922
When copying outputs from a remote builder on a bad connection, and the connection drops midway, the entire build tree is aborted with "truncated NAR encountered", followed by "some outputs are unexpectedly invalid".
The actual error is fine: the connection did drop, and we can't recover from that. However, killing all the concurrently running builds is bad.
duplicate of #878?
Nope, this kills everything even with --keep-going.
well thank fuck this codebase has historically been a perfect fount of consistent behavior
Re: E/reproducible, I don't think any of us has managed to reproduce this or has a clear reproducer. Until we have one, this is going to be hard to act on. This may just be #928.
@k900 tried to reproduce this today and failed. I don't know whether they were using a revision that includes the #928 fix.
Seen on nix (Lix, like Nix) 2.94.0-devpre20250723_020751c. The output on the remote builder wasn't written, but the coordinator was convinced it could pull it.
Had to --repair store paths on a number of machines; now waiting to see if it reappears.