[Nix#5118] Substituter query parallelism is far too low #139
Reference: lix-project/lix#139
Upstream-Issue: NixOS/nix#5118
TL;DR: a single-line change can make binary cache querying 50x as fast
Continuing from #5109, I decided to investigate why substitution (querying) is/seems to be fairly slow. Okay, maybe my 800 derivations are simply a bit much, I thought, but then I found @edolstra's commit message stating that HTTP/2 should handle this in under a second. So what's going on? The async API described in that commit never really got put to use as far as I can see, so it turns out that throughput is (severely) limited by the size of the thread pool used to fetch path infos in parallel via the synchronous API. The standard size of that pool is the number of hardware threads (8 on this machine), but we can improve throughput by >50x by massively scaling up that pool, at least in my use case!
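To illustrate the effect, here is a minimal Python sketch, not Nix's actual C++ code. It assumes each path info lookup is a blocking call dominated by network latency, which is modelled with a 10 ms `time.sleep`; the function names, the path names, and the latency figure are all made up for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def query_path_info(path: str, latency: float = 0.01) -> str:
    """Hypothetical stand-in for one blocking path info lookup.

    The sleep models the network round trip; no real request is made.
    """
    time.sleep(latency)
    return path


def query_all(paths, pool_size):
    """Run every lookup on a pool of `pool_size` threads and time it."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        results = list(pool.map(query_path_info, paths))
    return results, time.perf_counter() - start


# Placeholder names, not real store paths.
paths = [f"hypothetical-path-{i}" for i in range(200)]

_, small = query_all(paths, pool_size=8)    # pool sized like 8 hardware threads
_, large = query_all(paths, pool_size=200)  # one thread per pending lookup

print(f"pool of 8:   {small:.3f}s")
print(f"pool of 200: {large:.3f}s")
```

With 8 workers the 200 lookups run in roughly 25 sequential batches, each waiting out a full round trip, while the larger pool overlaps nearly all of them. The exact ratio depends on the machine and, in the real case, on the cache server.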
Now, I can't speak for any substituters other than HTTP/2 binary caches, and I don't know exactly how this should be fixed. Should the pool size be a new option (with a high or a low default)? Should the thread pool be replaced with true asynchrony? Should doing the same for actual substitution be investigated as well? Should substitution querying and actual substitution be merged after all? Either way, I would be very happy about any solution that removes this bottleneck, and I would be willing to implement it once there is consensus on the design.
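As a rough picture of the "true asynchrony" option, the following Python sketch keeps many lookups in flight from a single thread and caps them with a semaphore instead of a thread count. It is only an illustration under assumptions: `fetch_path_info`, the path names, the 10 ms latency, and the limit of 100 are all hypothetical, and none of this reflects how Nix's C++ code is structured.

```python
import asyncio


async def fetch_path_info(path: str, latency: float = 0.01) -> str:
    """Hypothetical non-blocking lookup; the sleep models the round trip."""
    await asyncio.sleep(latency)
    return path


async def query_all(paths, max_in_flight: int):
    """Issue all lookups from one thread, with a cap on concurrent requests."""
    limit = asyncio.Semaphore(max_in_flight)

    async def bounded(path: str) -> str:
        # Wait for a free slot, then run the lookup while holding it.
        async with limit:
            return await fetch_path_info(path)

    # gather returns the results in the same order as the inputs.
    return await asyncio.gather(*(bounded(p) for p in paths))


# Placeholder names, not real store paths.
paths = [f"hypothetical-path-{i}" for i in range(800)]
results = asyncio.run(query_all(paths, max_in_flight=100))
print(f"fetched {len(results)} path infos")
```

The cap plays the role that the pool size plays today, but raising it costs no extra threads, so it could be set according to what the server tolerates rather than to the number of hardware threads.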
...This feels like it shouldn't be solved by just running 1000 threads, but rather by a complete redesign of the infrastructure that does this. We should easily be able to hit this many requests without this kind of hack.