Remote store leads to lots of "copying 0 paths" messages #749
Reference: lix-project/lix#749
Describe the bug
I am trying to use a remote store with ssh-ng. When my colleagues, who use the latest version of CppNix, start a build with
--store ssh-ng://builder --eval-store auto
the build starts successfully. Meanwhile, when I try the same with Lix, I get a bunch of
copying 0 paths
messages and the build never seems to start. It might have something to do with IFD, since we use haskell.nix.
Steps To Reproduce
I'm struggling to extract a reproducer.
I am experiencing this when doing
nix-build --store ssh-ng://builder --eval-store auto
using my work CI. I think this might be related to the behavior for pushing inputs?
I do not get the bad behaviour if I run:
Expected behavior
The remote store build should trigger relatively quickly.
nix --version output
Additional context
Here is a snippet of the verbose log:
Log
This seems to be related to "daemon worker op 44" which is
AddMultipleToStore
can you retry with lix main? 2.92 has some locking problems (see eg #745) that have been fixed in main, it's very possible that your configuration makes them more likely to appear
I tried it out with the latest
main
@pennae and it seems to be a bit quicker, but it still spends ages doing this
copying 0 paths
stuff.
I thought I would compare the verbose output without the remote store stuff, and the corresponding lines are:
Log without remote store
hmm. can you share test cases that reproduce this?
Here you go: https://git.lix.systems/teofilc/T749-repro
I took the default haskell.nix template and added a dependency on
Agda
which has a bunch of dependencies. I couldn't come up with anything smaller, sorry, since I think this is closely related to haskell.nix!
If you use the substituters from the flake it shouldn't be horrible to test.
If you try to run a remote store build, you should see tons of
copying 0 paths
in the log. I've run this against
ssh-ng://localhost
and I can see that, but they get resolved really quickly because we don't have to pay the connection overhead. If you use it against an actual remote store then it gets really slow.
This has also pointed out a bit of the haskell.nix code which I think is causing the bad behaviour:
4cb78ba55c/lib/load-cabal-plan.nix (L98-L102)
Maybe this code just can't be efficient with remote stores...
Ok, with this change (to haskell.nix) this is basically resolved:
So the issue was that
haskell.nix
produced a lot of these strings whose context included the .drv of the IFD, not the outpath of the IFD. So we would rebuild for each string, and we have hundreds of those. This is mostly fine with a local store, but the network latency really slowed it down with a remote store.
Thanks for your help @pennae!
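To make the latency argument above concrete, here is a rough cost model in Python. It is only a sketch of the reasoning in this comment, not Lix or haskell.nix code: `ContextEntry`, `store_requests`, `waiting_seconds`, `compare`, the path names, and the figures (300 strings, 0.2 ms local and 50 ms remote round trips) are all assumptions made up for illustration. It assumes one store request per string whose context names a .drv, and none for strings whose context names the output path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextEntry:
    """One string-context entry (hypothetical, simplified representation)."""
    kind: str  # "drv" = the derivation file, "out" = an already-built output path
    path: str


def store_requests(contexts):
    """Count store requests under the simplified model:
    one request per string whose context names a .drv, none for output paths."""
    return sum(1 for entry in contexts if entry.kind == "drv")


def waiting_seconds(contexts, round_trip_seconds):
    """Total time spent waiting on the store if each request costs one round trip."""
    return store_requests(contexts) * round_trip_seconds


def compare(num_strings, round_trip_seconds):
    """Return (before_fix, after_fix) waiting times for num_strings strings."""
    before = [ContextEntry("drv", "example-plan.drv")] * num_strings
    after = [ContextEntry("out", "example-plan")] * num_strings
    return (waiting_seconds(before, round_trip_seconds),
            waiting_seconds(after, round_trip_seconds))


if __name__ == "__main__":
    # Assumed figures: 300 strings, 0.2 ms local vs 50 ms remote round trip.
    for label, rtt in [("local", 0.0002), ("remote", 0.05)]:
        before, after = compare(300, rtt)
        print(f"{label}: before fix {before:.2f}s, after fix {after:.2f}s")
```

Under these assumed numbers the per-string requests are negligible against a local store and add up to many seconds against a remote one, which matches the behaviour described in this thread.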
i love it when bugs are this easy :D
your analysis is correct, certain interactions with remote stores simply aren't very efficient today. in this case we could stop copying early if we detect that nothing needs to be copied, but at that point we've already eaten the bulk of the cost (which will be querying the remote which paths it doesn't have yet).
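The early-exit idea mentioned above can be sketched with a toy model in Python. This is not how Lix implements copying: `FakeRemoteStore`, `query_missing`, `add_multiple`, and `copy_paths` are hypothetical stand-ins, and the model assumes every request costs exactly one round trip. It only illustrates why skipping the copy step still leaves the query round trip in place.

```python
class FakeRemoteStore:
    """Toy remote store that counts round trips (hypothetical, for illustration)."""

    def __init__(self, valid_paths):
        self.valid = set(valid_paths)
        self.round_trips = 0

    def query_missing(self, paths):
        """Ask the remote which of `paths` it does not have yet (one round trip)."""
        self.round_trips += 1
        return [p for p in paths if p not in self.valid]

    def add_multiple(self, paths):
        """Send `paths` to the remote (one round trip, even for an empty list)."""
        self.round_trips += 1
        self.valid.update(paths)


def copy_paths(remote, paths, early_exit):
    """Copy `paths` to `remote`; return how many paths were actually sent."""
    missing = remote.query_missing(paths)
    if early_exit and not missing:
        # Nothing to copy: skip the transfer, but the query has already been paid for.
        return 0
    remote.add_multiple(missing)
    return len(missing)


if __name__ == "__main__":
    for early_exit in (False, True):
        remote = FakeRemoteStore({"a", "b"})
        for _ in range(100):  # 100 "copying 0 paths" operations
            copy_paths(remote, ["a", "b"], early_exit)
        print(f"early_exit={early_exit}: {remote.round_trips} round trips")
```

In this model the early exit removes only the transfer step of each no-op copy; the query against the remote still happens once per copy, so reducing the number of copy requests (as the haskell.nix change did) is what removes most of the waiting.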