Connect attempts should progress in an exponential backoff manner #932
Reference: lix-project/lix#932
Since 7359c39076, our connect timeouts are 5s. Lix has exponential backoff for retrying requests, but not for the timeout values that we pass on to curl.

This is the root cause behind #920.

To solve this, with @pennae, we suggest:

(1) Introduce `initialConnectTimeout` as a setting, set to a low value such as 1s or 5s.

(2) Deprecate `connectTimeout` and rename it to `maxConnectTimeout` as a setting, and bump it to the previous value or a reasonably high value.

(3) Introduce backoff logic flowing from `TransferStream` to `TransferItem`, i.e. `TransferStream` computes the actual connect timeout value using `attempts`, `tries`, `initialConnectTimeout` and `maxConnectTimeout`, plus some jitter parameters which should be fixed for now, and passes it on to `TransferItem`, which sets it via `curl_easy_setopt`.

Extra caution should be paid to the NixOS tests: either ensure they pass `--offline` as much as possible so that they do not become super slow, or use very low timeout values in our non-networked NixOS tests.

This issue was mentioned on Gerrit on the following CLs:
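
For illustration only, the timeout computation proposed in step (3) could look roughly like the sketch below. The function name `computeConnectTimeout`, the zero-based `attempt` parameter and the `jitterFraction` parameter are hypothetical and not taken from the Lix code base. How `attempts` and `tries` would actually feed into the computation is left open here. The sketch only shows a timeout that doubles per attempt, is clamped to `maxConnectTimeout`, and gets some random jitter.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <random>

using std::chrono::milliseconds;

// Hypothetical helper, not part of Lix. Returns the connect timeout to use
// for the given zero-based attempt. The base value starts at
// initialConnectTimeout, doubles on every attempt and is clamped to
// maxConnectTimeout. A random jitter of at most jitterFraction of the base
// is then added; the result still never exceeds maxConnectTimeout.
template<typename Rng>
milliseconds computeConnectTimeout(
    unsigned attempt,
    milliseconds initialConnectTimeout,
    milliseconds maxConnectTimeout,
    double jitterFraction,
    Rng & rng)
{
    std::int64_t base = initialConnectTimeout.count();
    const std::int64_t cap = maxConnectTimeout.count();

    // Double once per previous attempt; stop early once the cap is reached
    // so that the value cannot grow without bound.
    for (unsigned i = 0; i < attempt && base < cap; ++i)
        base *= 2;
    base = std::min(base, cap);

    // Optional jitter, so that many clients do not retry in lockstep.
    std::int64_t jitter = 0;
    if (jitterFraction > 0.0) {
        std::uniform_real_distribution<double> dist(0.0, jitterFraction);
        jitter = static_cast<std::int64_t>(static_cast<double>(base) * dist(rng));
    }

    // The caller (TransferItem in the proposal) would hand this value to
    // curl_easy_setopt; libcurl's CURLOPT_CONNECTTIMEOUT_MS takes a long
    // in milliseconds. Which option Lix would use is an assumption here.
    return milliseconds(std::min(base + jitter, cap));
}
```

With an initial timeout of 1s, a maximum of 30s and no jitter, successive attempts would get 1s, 2s, 4s, 8s, 16s, and 30s from then on.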