[Nix#5118] Substituter query parallelism is far too low #139

Open
opened 2024-03-16 06:45:07 +00:00 by lix-bot · 1 comment
Member

Upstream-Issue: NixOS/nix#5118

TL;DR: a single-line change can make binary cache querying 50x as fast

Continuing from #5109, I decided to investigate *why* substitution (querying) is, or at least seems to be, fairly slow. Okay, maybe my 800 derivations are simply a bit much, I thought, but then I found @edolstra's [commit message](https://github.com/kha/nix/commit/90ad02bf626b885a5dd8967894e2eafc953bdf92) stating that HTTP/2 should handle this in under a second. So what's going on? As far as I can see, the async API described in that commit never really got put to use, so throughput is (*severely*) limited by the size of the [thread pool](https://github.com/kha/nix/blob/a6ba313a0aac3b6e2fef434cb42d190a0849238e/src/libstore/misc.cc#L103) used to fetch path infos in parallel via the synchronous API. The default size of that pool is the number of hardware threads (8 on this machine), but massively scaling up the pool improves throughput by more than 50x, at least in my use case:

#threads | "querying missing paths"
--- | ---
8 | 38s
32 | 10s
100 | 1.89s
300 | 0.89s
1000 | 0.73s
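To illustrate why the pool size dominates here, the following is a minimal Python sketch, not Nix code. It assumes each path info lookup is a blocking call that spends almost all of its time waiting on the network; `fake_query_path_info`, `query_all`, the path names, and the latency value are all hypothetical stand-ins. Under that assumption a pool of size P needs roughly ceil(N / P) round trips for N paths, which is the same trend as in the table.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_query_path_info(path: str, latency: float) -> str:
    """Hypothetical stand-in for one blocking path info lookup.

    It only sleeps for one simulated network round trip.
    """
    time.sleep(latency)
    return path


def query_all(paths, pool_size, latency):
    """Run every lookup through a fixed-size thread pool.

    Returns the elapsed wall-clock time in seconds.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        results = list(
            pool.map(lambda p: fake_query_path_info(p, latency), paths)
        )
    assert len(results) == len(paths)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Assumed workload: 200 paths, 20 ms of simulated latency per lookup.
    paths = [f"path-{i}" for i in range(200)]
    for pool_size in (8, 32, 100):
        elapsed = query_all(paths, pool_size, latency=0.02)
        print(f"{pool_size:4d} threads: {elapsed:.2f}s")
```

Because the threads do nothing but wait, adding more of them keeps helping until the pool is about as large as the number of paths, which is why a pool sized to the number of CPU cores is a poor fit for this kind of work.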

I can't speak for any substituters other than HTTP/2 binary caches, and I don't know exactly how this should be fixed. Open questions include:

- Should the pool size be a new option, and if so, with a high or a low default?
- Should the thread pool be replaced with true asynchrony?
- Should doing the same for actual substitution be investigated as well?
- Should substitution querying and actual substitution be merged after all?

Obviously I would be very happy about *any* solution that removes this bottleneck, and I would be willing to implement it once there is consensus on the design.
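For comparison, here is a hedged sketch of what the "true asynchrony" option could look like in principle: a single thread keeps many lookups in flight and a semaphore caps the concurrency. It is written in Python with `asyncio` purely for illustration and says nothing about how Nix's async API works; `fake_query_path_info_async`, `query_all_async`, `timed_run`, and the `max_in_flight` parameter are hypothetical names.

```python
import asyncio
import time


async def fake_query_path_info_async(path: str, latency: float) -> str:
    """Hypothetical non-blocking lookup; awaits one simulated round trip."""
    await asyncio.sleep(latency)
    return path


async def query_all_async(paths, max_in_flight, latency):
    """Issue all lookups from one thread, with a cap on concurrent requests."""
    limit = asyncio.Semaphore(max_in_flight)

    async def bounded(path):
        # At most max_in_flight lookups are awaited at the same time.
        async with limit:
            return await fake_query_path_info_async(path, latency)

    # gather returns the results in the same order as the inputs.
    return await asyncio.gather(*(bounded(p) for p in paths))


def timed_run(paths, max_in_flight, latency):
    """Run the async query to completion; return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(query_all_async(paths, max_in_flight, latency))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    paths = [f"path-{i}" for i in range(800)]
    _, elapsed = timed_run(paths, max_in_flight=300, latency=0.02)
    print(f"800 simulated lookups, 300 in flight: {elapsed:.2f}s")
```

In this shape the concurrency limit is just a number, not a count of operating system threads, so raising it does not cost a stack per request.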

lix-bot added the imported label 2024-03-16 06:45:07 +00:00
Owner

...This feels like it shouldn't be solved by just running 1000 threads, but rather by a _complete_ redesign of the infrastructure that does this. We should easily be able to hit this many requests without this kind of hack.

jade added the E/requires rearchitecture label 2024-03-18 16:26:25 +00:00
Reference: lix-project/lix#139