[Nix#5118] Substituter query parallelism is far too low #139

Open

opened 2024-03-16 06:45:07 +00:00 by lix-bot · 1 comment

lix-bot commented

2024-03-16 06:45:07 +00:00

Member

Upstream-Issue: NixOS/nix#5118

TL;DR: a single-line change can make binary cache querying 50x as fast

Continuing from #5109, I decided to investigate why substitution (querying) is, or at least seems to be, fairly slow. Okay, maybe my 800 derivations are simply a bit much, I thought, but then I found @edolstra's [commit message](https://github.com/kha/nix/commit/90ad02bf626b885a5dd8967894e2eafc953bdf92) stating that HTTP/2 should handle this in under a second. So what's going on? As far as I can see, the async API described in that commit never really got put to use. As a result, throughput is (severely) limited by the size of the [thread pool](https://github.com/kha/nix/blob/a6ba313a0aac3b6e2fef434cb42d190a0849238e/src/libstore/misc.cc#L103) used to fetch path infos in parallel via the synchronous API. The default size of that pool is the number of hardware threads (8 on this machine), but massively scaling up the pool improves throughput by >50x, at least in my use case:

| #threads | "querying missing paths" |
| --- | --- |
| 8 | 38s |
| 32 | 10s |
| 100 | 1.89s |
| 300 | 0.89s |
| 1000 | 0.73s |
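
To illustrate why the pool size dominates here, the following is a minimal Python model, not Nix's actual C++ code. It assumes every lookup is a blocking call with a fixed round-trip latency, so each worker can have only one request in flight. The names `fake_query_path_info`, `estimated_query_time` and `measure`, as well as the latency value, are hypothetical and chosen only for illustration.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def estimated_query_time(num_paths, pool_size, latency_s):
    """Idealized wall time when each worker blocks on one request at a time."""
    return math.ceil(num_paths / pool_size) * latency_s


def fake_query_path_info(path, latency_s):
    """Hypothetical stand-in for a synchronous path-info lookup.

    It sleeps instead of talking to a binary cache.
    """
    time.sleep(latency_s)
    return path


def measure(num_paths, pool_size, latency_s):
    """Run the fake lookups through a thread pool and return elapsed seconds."""
    paths = [f"path-{i}" for i in range(num_paths)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        results = list(pool.map(lambda p: fake_query_path_info(p, latency_s), paths))
    elapsed = time.perf_counter() - start
    assert len(results) == num_paths
    return elapsed


if __name__ == "__main__":
    # Assumed values: 800 lookups, 10 ms simulated round trip.
    for pool_size in (8, 32, 100):
        print(
            pool_size,
            estimated_query_time(800, pool_size, 0.01),
            round(measure(800, pool_size, 0.01), 3),
        )
```

In this model the wall time shrinks roughly in proportion to the pool size until other costs take over. That matches the shape of the measurements above, although the real numbers depend on the cache, the network and HTTP/2 multiplexing.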

I can only speak for HTTP/2 binary caches, not for any other substituters, and I don't know how exactly this should be fixed. Open questions include:

- Should the pool size become a new option, and if so with a high or a low default?
- Should the thread pool be replaced with true asynchrony?
- Should the same be investigated for actual substitution?
- Should substitution querying and actual substitution be merged after all?

Either way, I would be very happy about any solution that removes this bottleneck, and I am willing to implement it once there is consensus on the design.
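
For comparison, here is a rough sketch of what the "true asynchrony" option could look like, again as a Python model rather than a proposal for Nix's C++ code. A single thread keeps many requests in flight, and a semaphore caps the number of concurrent lookups. The names `fake_query_path_info` and `query_all`, and the `max_in_flight` parameter, are hypothetical; the lookup is simulated with a sleep rather than real HTTP/2 traffic.

```python
import asyncio


async def fake_query_path_info(path, latency_s):
    """Hypothetical non-blocking lookup; awaits a sleep instead of doing HTTP."""
    await asyncio.sleep(latency_s)
    return path, True  # (path, "substitutable") in this toy model


async def query_all(paths, max_in_flight, latency_s=0.01):
    """Query all paths from one thread, with at most max_in_flight pending."""
    limit = asyncio.Semaphore(max_in_flight)

    async def bounded(path):
        # The semaphore bounds concurrency without needing one thread per request.
        async with limit:
            return await fake_query_path_info(path, latency_s)

    results = await asyncio.gather(*(bounded(p) for p in paths))
    return dict(results)


if __name__ == "__main__":
    paths = [f"path-{i}" for i in range(800)]
    found = asyncio.run(query_all(paths, max_in_flight=300))
    print(len(found))
```

The point of the sketch is that the concurrency limit becomes a cheap counter rather than a number of OS threads, so a high limit costs very little.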

lix-bot added the imported label 2024-03-16 06:45:07 +00:00

puck commented

2024-03-16 14:07:49 +00:00

Owner

...This feels like it shouldn't be solved by just running 1000 threads, but rather by a complete redesign of the infrastructure that does this. We should easily be able to hit this many requests without this kind of hack.
