Lix is hitting fetcher-cache-v1.sqlite too hard under mass concurrency #1122
Reference
lix-project/lix#1122
Describe the bug
When Lix fetches hundreds of copies of the same Flake input, the SQLite cache (fetcher-cache-v1.sqlite) comes under massive pressure and can throw errors in various places in the fetching code. This leads to a fatal failure in situations where the operation could instead be retried or paused gracefully.
Steps To Reproduce
nix-shell -p lixPackageSets.latest.nix-eval-jobs
nix-eval-jobs --gc-roots-dir /tmp/somewhere/gcroots --force-recurse --max-memory-size 4096 --workers 96 --flake "git+file://$(pwd)?rev=56988d860593a5fd8153d02a0ca5469508378626#hydraJobs"
The exact number of workers is not rocket science: you need enough concurrency, but just below the number that causes the daemon to reject your connections. 96 workers on my AMD Ryzen 9 7900X 12-Core Processor causes it to occur.
Expected behavior
Retries or self-pacing.
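To make "retries or self-pacing" concrete, here is a minimal sketch of the general technique: retry a SQLite operation with capped exponential backoff and jitter when the database reports contention. It is written in Python against the standard `sqlite3` module purely for illustration and is not Lix's code. The names `with_busy_retry` and `is_busy_error` and all parameter defaults are hypothetical.

```python
import random
import sqlite3
import time


def is_busy_error(exc: sqlite3.OperationalError) -> bool:
    # Assumption: contention surfaces as "database is locked" or
    # "database is busy" style messages from the sqlite3 module.
    msg = str(exc).lower()
    return "locked" in msg or "busy" in msg


def with_busy_retry(operation, max_attempts=8, base_delay=0.01,
                    max_delay=1.0, sleep=time.sleep):
    """Run operation(), retrying on contention errors (hypothetical helper).

    The delay ceiling doubles on every attempt up to max_delay, and the
    actual sleep is drawn uniformly below it so that many workers do not
    retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except sqlite3.OperationalError as exc:
            # Errors unrelated to contention, and the final failed attempt,
            # are re-raised unchanged.
            if not is_busy_error(exc) or attempt == max_attempts - 1:
                raise
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
```

A caller would wrap each cache lookup or insert, for example `with_busy_retry(lambda: conn.execute(query).fetchall())`. The jitter is the self-pacing part: with around 96 workers contending for one file, randomized delays spread the retries out instead of having every worker hit the database again at the same moment.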
nix --version output
Reported to occur on 2.94.0 by @lheckemann.
Reproduced using nix-eval-jobs from 2.94.0. The code that runs the Flake fetching is independent from the daemon (I believe?), so the affected version is 2.94.0.
Additional context
I believe that the error occurs exactly here:
This issue was mentioned on Gerrit on the following CLs: