Binary cache closure querying #605
This is an RFC/invitation for discussion; I don't have a strongly preferred solution at this point.
Is your feature request related to a problem? Please describe.
Querying a large closure on a binary cache can take a while, since one request is performed for each path that's not already present locally. The lower limit on the time taken for this with the current binary cache protocol and querying behaviour is `request latency * graph depth`.

Describe the solution you'd like
We can extend the binary cache protocol with an optional way to get the narinfos for a path's whole closure in one go.
Describe alternatives you've considered
Since we're aware of all of a path's potential dependencies in advance, we could query the narinfos for all the output paths in its closure before we know if the output we want actually depends on them. This results in far more requests than my proposed approach, but could provide similar performance improvements.
A new, incompatible binary cache protocol that's generally smarter (e.g. allowing querying multiple paths in a single request, or even providing paths that are already available locally so we don't get any redundant info). This might be worth doing at some point anyway, but for now I think a "quick-fix" to what we already have can provide great value.
Additional context
I've implemented a hacky prototype version of what I suggest. This extends binary caches with optional `<hash>.closure.narinfo` files, which contain arbitrary narinfos separated by blank lines. Lix will fetch these and store all the obtained narinfos in the narinfo cache. This allows short-circuiting requests for any dependencies that were provided eagerly by the server.

https://gerrit.lix.systems/c/lix/+/2324 -- the client side
https://git.lix.systems/lheckemann/closure-cache/src/branch/main/closure-cache.ts -- the server side (everything hardcoded to work for my own deployment). When `<hash>.closure.narinfo` is requested, it will read the narinfos for the whole dependency closure and serve them. While I've implemented it as a dynamic server, these closure files could just as well be precomputed and served as static files.

This improves querying performance on a chunky closure significantly:

It also preserves both backwards and forwards compatibility: the client can still talk to binary caches that don't support this feature, and binary caches that do support it can still be used by clients that don't.
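For concreteness, here is a minimal sketch of what a client could do with such a `<hash>.closure.narinfo` file. This is an illustrative TypeScript version with hypothetical names, not the actual C++ change linked above, and it skips signature verification and error handling:

```typescript
// Illustrative sketch only; the real client-side change is in Lix (C++),
// see https://gerrit.lix.systems/c/lix/+/2324. All names here are hypothetical.

type NarInfo = Record<string, string>;

// Parse a single narinfo in its usual "Key: value" line format.
function parseNarInfo(text: string): NarInfo {
  const info: NarInfo = {};
  for (const line of text.split("\n")) {
    const idx = line.indexOf(": ");
    if (idx > 0) info[line.slice(0, idx)] = line.slice(idx + 2);
  }
  return info;
}

// Fetch <hash>.closure.narinfo and split it into individual narinfos, which
// are separated by blank lines. Returns [] on a miss so callers can fall back
// to the regular one-narinfo-per-request behaviour.
async function fetchClosureNarInfos(cacheUrl: string, hash: string): Promise<NarInfo[]> {
  const res = await fetch(`${cacheUrl}/${hash}.closure.narinfo`);
  if (!res.ok) return [];
  return (await res.text())
    .split(/\n\s*\n/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
    .map(parseNarInfo);
}

// Warm a local narinfo cache (a plain Map here) in a single request, so later
// per-path lookups can be short-circuited. Signatures would still be verified
// as usual before any of these paths are actually used.
async function warmNarInfoCache(
  cacheUrl: string,
  hash: string,
  narInfoCache: Map<string, NarInfo>,
): Promise<void> {
  for (const info of await fetchClosureNarInfos(cacheUrl, hash)) {
    const storePath = info["StorePath"];
    if (storePath && !narInfoCache.has(storePath)) narInfoCache.set(storePath, info);
  }
}
```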
Thoughts:
- Should support for this protocol be indicated as part of `nix-cache-info`? We can halve the number of requests made to caches that don't support it if we do so. (A hypothetical example of what this could look like follows after this list.)
- Does accepting narinfos for any requested path have security implications I haven't thought of? All I can think of is that this makes it easier for a malicious cache to serve a different version of a dependency depending on which dependent a client asked for, but signatures are still checked as usual.
- Does it make sense not to specify which paths should actually be included? I called it "closure", but the server could really serve whatever narinfos it wants, e.g. putting a limit on the number of narinfos served at once or doing fancy heuristics based on access patterns to "guess" which paths clients are likely to already have.
- Does this introduce significant new consistency concerns? I could definitely see things getting weird if paths within a fully static-file-based binary cache are ever modified, but I'm not sure this is much worse than the status quo.
- Would we want to take this opportunity to replace the custom narinfo format (for this use case) with a better-specified one where broad agreement on how to parse it already exists, like JSON or one of the more efficient binary encodings of JSON's shape?
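For illustration only: if support were advertised in `nix-cache-info`, it might look something like the following. The `StoreDir`, `WantMassQuery` and `Priority` fields already exist; the `ClosureNarInfo` field name is purely hypothetical and not something the prototype defines.

```
StoreDir: /nix/store
WantMassQuery: 1
Priority: 40
ClosureNarInfo: 1
```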
that's what we're currently working towards. we already have http2 support with all the pipelining goodness it supports, the limiting factor for using that properly is only that the legacy code we're working with is absolute spaghetti. once we've disentangled this we can make much better use of the WantMassQuery bit to blast stores with queries for all path infos in a closure at once, without introducing more hacks that merely work around fundamental design deficiencies instead of actually fixing them.
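To illustrate the direction described here (this is not the Lix implementation, which is C++; the names and the concurrency limit are made up): with `WantMassQuery` set, a client can issue the narinfo requests for every path in a closure at once, bounded only by how many requests it allows in flight.

```typescript
// Rough sketch of mass-querying narinfos with a cap on in-flight requests.
// A miss is recorded as null; a hit stores the raw narinfo text.
async function massQueryNarInfos(
  cacheUrl: string,
  hashes: string[],
  maxInFlight = 128,
): Promise<Map<string, string | null>> {
  const results = new Map<string, string | null>();
  let next = 0;

  // Each worker repeatedly pulls the next hash off the shared queue.
  async function worker(): Promise<void> {
    while (next < hashes.length) {
      const hash = hashes[next++];
      const res = await fetch(`${cacheUrl}/${hash}.narinfo`);
      results.set(hash, res.ok ? await res.text() : null);
    }
  }

  // Start up to maxInFlight workers; all requests for the closure are in
  // flight together instead of being serialised one path at a time.
  await Promise.all(
    Array.from({ length: Math.min(maxInFlight, hashes.length) }, () => worker()),
  );
  return results;
}
```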
That sounds good!
Does that completely obsolete the idea of having the server provide data more eagerly though? Making fewer requests seems desirable even if we're rid of the graph-depth-roundtrips problem (though that's purely vibes-based and I might be completely off there), and the server is in a better position to know what's necessary (because it has the dependency metadata at hand) than the client.
it does not, but the current api (as you've noticed) does not give us any good methods to provide more data. we may want to do this at some point, but it needs a more thorough redesign of the binary cache protocols which right now are an unholy mix of half store, half http wrapper. providing extra data is only really useful in the non-http case, and we can improve those so much more with a proper extensible rpc protocol that doesn't have to pretend it's http under the hood somewhere
How so?
I find the fact that the current binary cache protocol can be implemented by serving static files pretty valuable. Is that something we don't care about preserving / improving on?
extra data like closure info is potentially very expensive to save as static files in the http case, or impossible to compute statically at all. dynamically generated data on http stores is mostly not realistic since our primary http store remotes are s3 buckets.
we absolutely do! but it's hard to add extensions to only one set of binary caches without making things much harder for ourselves in the future. the cache protocol/api is already very strange, attempting to mimic being a store (to the extent where it's possible to build directly into an http cache!). we'd be much better off if we dropped this masquerade and started treating caches as distinct entities, with distinct capabilities. once we have that it'd be much easier to add additional things on top without complicating maintenance of the existing infrastructure.
I'll be playing around with this a bit to get a proper feel for the costs, but I think the closure path being optional leaves a lot of space for optimising around expected access patterns (a simple approach being to provide them for top-level outputs of CI builds, but not for anything deeper in the graph) that wouldn't bring the explosion of space usage that having them for everything would.
How?
That seemed fairly sensible to me...
...wait wtf
Sounds reasonable. Any idea when that might happen? Because I'd kind of like to be able to make use of this "soon" and am wondering if it would be reasonable to get this (but with reasonable parsing code, not the abomination I linked) into Lix with the knowledge that it will be thrown out again later. I don't want to make any demands here, though; I'm fine with looking into other approaches like implementing it as a plugin or as a proxy that exposes the "one-narinfo-at-a-time" API based on the closure-serving one. :)
Another thing that is possibly of interest here is that currently the majority of traffic to binary caches is misses. So some ideas like racing caches against each other rather than querying sequentially would help a lot but they might also bankrupt cachix and cache.nixos.org in s3 api fees. So we should probably be careful about more aggressively querying, especially if CppNix is to take our improvements wholesale as they have a habit of doing.
The cache backend for these huge caches really needs to be something better than just s3. edef has ideas and maybe even drafts but is the burned out creature she is.
Also re your strategy: have you considered http2 push? This actually sounds like it could possibly be http2 push, to be standards compliant.
Edit: okay so to elaborate: we have a narinfo cache. we could accept http2 server pushes of paths into our narinfo response cache for the current host if we haven't seen the narinfo yet. in effect this would allow the server to drive the entire closure of a path into lix all at once, with zero protocol or semantic changes, just a smarter http server.
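To make that concrete: the following is purely an illustrative Node.js/TypeScript sketch of a push-capable cache front-end, not anything Lix or an existing cache implements, and `closureNarInfos` is a hypothetical stub.

```typescript
import http2 from "node:http2";

// Hypothetical: a real cache would look up the requested path's closure in
// its own metadata (or a database). The stub returns nothing.
function closureNarInfos(hash: string): { path: string; body: string }[] {
  return [];
}

const server = http2.createServer();

server.on("stream", (stream, headers) => {
  // Simplified match for <hash>.narinfo requests.
  const match = /^\/([0-9a-z]{32})\.narinfo$/.exec(headers[":path"] ?? "");
  const infos = match ? closureNarInfos(match[1]) : [];
  if (infos.length === 0) {
    stream.respond({ ":status": 404 });
    stream.end();
    return;
  }

  const [requested, ...rest] = infos;

  // Push the narinfos of the rest of the closure before answering, so a
  // push-aware client can fill its narinfo cache without asking for them.
  for (const info of rest) {
    if (!stream.pushAllowed) break;
    stream.pushStream({ ":path": info.path }, (err, push) => {
      if (err) return;
      push.respond({ ":status": 200, "content-type": "text/x-nix-narinfo" });
      push.end(info.body);
    });
  }

  stream.respond({ ":status": 200, "content-type": "text/x-nix-narinfo" });
  stream.end(requested.body);
});

server.listen(8080);
```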
Would it really result in more queries for c.n.o? I thought it would be the same number of queries (edit: since c.n.o is the highest-priority cache for most users), just over a shorter timespan, and I don't think that's relevant for costs? As for cachix, is this something you've heard from Domen or speculation? Because if it's the latter I think it would make a lot of sense to ask him, especially because he would probably also be interested in making caches noticeably faster and (potentially) needing fewer requests.
That does sound elegant, but as you say does require a smarter server, unlike the static closure narinfo approach, making it much more complex to deploy.
I am 100% certain that adding racing of binary caches would cause twice as many requests to be sent in instances where you have both caches configured and where, today, c.n.o. (e.g.) returns a positive answer before the second cache is queried at all. More aggressive optimistic querying of dependencies of derivations before said derivations themselves would definitely lead to a larger number of requests in many instances as well, since there are requests in flight which might get their results discarded after other requests in flight complete.
As for whether sending more requests is a problem, that is a question to be asked of people's architectures and AWS bills. But this is the type of thing for which we do need to ask the people running the caches.
Sending requests faster is a different question and imo is okay especially if there's want-mass-query set.
Of course everyone has a vested interest in making this stuff faster, it sucks. I am relatively of the opinion that any binary cache operating at scale really should be smarter than an s3 bucket. Also, #578 is something that makes me reticent to do more complexity in the binary cache creation code of Lix: it's already full of things that are definitely not our responsibility yet doesn't do enough things as it is. This isn't to say we shouldn't just ship experiments, but s3 binary caches really suck for numerous reasons like the extreme impracticality of gc without a database and imo we shouldn't consider these the preferred type of cache anymore; serious implementations of caches should have some server side intelligence and we could consider fully-static caches to be a less preferred option.
can we consider this done now that closure queries are not rtt-limited, only by how many concurrent requests the cache will accept?
It doesn't look like that's the case to me. Full details follow...
I'm running `nix path-info --store https://cache.sphalerite.tech -r $systemClosure` after removing `~/.cache/nix/binary-cache-v6.sqlite*`, and it's not any quicker than when I wrote this issue.

Version info:

I also tried adding `-vv --log-format internal-json`, and if I'm interpreting the log correctly, it's never even fetching more than one narinfo concurrently, even though the cache has `WantMassQuery: 1`. Maybe I have the wrong expectations regarding how it's logged, or I'm analysing the output incorrectly, but here's what I'm doing.

`activities.jq`:

Invoked with:
The idea of this script is to produce a list of lists of concurrently running downloads; each of the lists only contains a single entry though. A few random manual samples of the log also suggest the same thing: each narinfo download doesn't start until the previous one has finished.
hmm, then more work still needs to be done. `queryMissing` as done during build setup should be much better now, but of course there would be other things that are busted 🫠
Can confirm that `nix build --dry-run` instead of `nix path-info` queries with much more concurrency, consistently fetching up to ~500 narinfos at once in a few tests and taking 1.5s rather than 2min. Great work, and I imagine this will affect real-world use much more than the performance of `nix path-info` :)

If I'm reading the current `queryMissing` and related code correctly, it doesn't yet query build dependencies before knowing whether they're needed, so there's still rtt limitation there, right? (e.g. query `hello`, and only once we have a negative result there will we query for `stdenv` and the other paths needed to build `hello`). I don't think that's a big issue since the round trips are on the depth of the graph rather than the total number of paths, but it is something we discussed earlier on in this issue :)

yeah, there are still some bottlenecks in the process, but they're mostly related to search depth instead of being applied indiscriminately across the entire tree. we probably got most of the query performance out of it that we can without querying things we may not need to query (which comes with its own set of local overhead and cost to the cache being queried)
I also did a more apples-to-apples comparison (`nix build --dry-run` before and after 877b0d7121) and got less spectacular but still very satisfying results: