Remote store leads to lots of "copying 0 paths" messages #749

Closed
opened 2025-03-19 10:53:35 +00:00 by teofilc · 6 comments

Describe the bug

I am trying to use a remote store with ssh-ng. When my colleagues who use the latest version NixCpp initiate a build with --store ssh-ng://builder --eval-store auto they have the build start successfully.

Meanwhile when I try to do so with Lix, I get a bunch of copying 0 paths messages and the build never seems to start.

It might have something to do with IFD since we use haskell.nix.

Steps To Reproduce

I'm struggling to extract a reproducer.

I am experiencing this when doing nix-build --store ssh-ng://builder --eval-store auto using my work CI. I think this might be related to the behavior for pushing inputs?

I do not get the bad behaviour if I run:

❯ nix-build -E '(import <nixpkgs> {}).hello.overrideAttrs (_: {name = "foo";}) ' --store ssh-ng://builder --eval-store auto -vvv

Expected behavior

The remote store build should trigger relatively quickly.

nix --version output

nix (Lix, like Nix) 2.92.0
System type: x86_64-linux
Additional system types: i686-linux, x86_64-v1-linux, x86_64-v2-linux, x86_64-v3-linux, x86_64-v4-linux
Features: gc, signed-caches
System configuration file: /etc/nix/nix.conf
User configuration files: /home/teo/.config/nix/nix.conf:/etc/xdg/nix/nix.conf:/home/teo/.nix-profile/etc/xdg/nix/nix.conf:/nix/profile/etc/xdg/nix/nix.conf:/home/teo/.local/state/nix/profile/etc/xdg/nix/nix.conf:/etc/profiles/per-user/teo/etc/xdg/nix/nix.conf:/nix/var/nix/profiles/default/etc/xdg/nix/nix.conf:/run/current-system/sw/etc/xdg/nix/nix.conf:/nix/store/ys1zqxjyiczkny9bfi1f7i6clm1z39cd-gnome-settings-daemon-47.2/etc/xdg/nix/nix.conf
Store directory: /nix/store
State directory: /nix/var/nix
Data directory: /nix/store/vyfqgaip2hbylhhm1lci1zfvxv6a5g28-lix-2.92.0/share

Additional context

Here is a snippet of the verbose log:

Log
33778   │ copying 0 paths...
33779   │ starting pool of 15 threads
33780   │ performing daemon worker op: 31
33781   │ copying 0 paths...
33782   │ performing daemon worker op: 44
33783   │ copying 0 paths...
33784   │ starting pool of 15 threads
33785   │ performing daemon worker op: 31
33786   │ copying 0 paths...
33787   │ performing daemon worker op: 44
33788   │ evaluating file '/nix/store/qvvp19qxs6vfjkxra1g6899k5aszp5kg-haskell-project-plan-to-nix-pkgs/cabal-files/aeson.nix'
33789   │ evaluating file '/nix/store/v19kvj9ny8hm98dj8hw1rkih74y8yhgr-source/modules/package.nix'
33790   │ evaluating file '/nix/store/v19kvj9ny8hm98dj8hw1rkih74y8yhgr-source/modules/component-options.nix'
33791   │ evaluating file '/nix/store/v19kvj9ny8hm98dj8hw1rkih74y8yhgr-source/modules/package-options.nix'
33792   │ copying 0 paths...
33793   │ starting pool of 15 threads
33794   │ performing daemon worker op: 31
33795   │ copying 0 paths...
33796   │ performing daemon worker op: 44
33797   │ copying 0 paths...
33798   │ starting pool of 15 threads
33799   │ performing daemon worker op: 31
33800   │ copying 0 paths...
33801   │ performing daemon worker op: 44
33802   │ evaluating file '/nix/store/qvvp19qxs6vfjkxra1g6899k5aszp5kg-haskell-project-plan-to-nix-pkgs/cabal-files/attoparsec.nix'
33803   │ evaluating file '/nix/store/v19kvj9ny8hm98dj8hw1rkih74y8yhgr-source/modules/package.nix'
33804   │ evaluating file '/nix/store/v19kvj9ny8hm98dj8hw1rkih74y8yhgr-source/modules/component-options.nix'
33805   │ evaluating file '/nix/store/v19kvj9ny8hm98dj8hw1rkih74y8yhgr-source/modules/package-options.nix'

This seems to be related to "daemon worker op 44" which is AddMultipleToStore

## Describe the bug I am trying to use a remote store with ssh-ng. When my colleagues who use the latest version NixCpp initiate a build with `--store ssh-ng://builder --eval-store auto` they have the build start successfully. Meanwhile when I try to do so with Lix, I get a bunch of `copying 0 paths` messages and the build never seems to start. It might have something to do with IFD since we use haskell.nix. ## Steps To Reproduce I'm struggling to extract a reproducer. I am experiencing this when doing `nix-build --store ssh-ng://builder --eval-store auto` using my work CI. I think this might be related to the behavior for pushing inputs? I do not get the bad behaviour if I run: ``` ❯ nix-build -E '(import <nixpkgs> {}).hello.overrideAttrs (_: {name = "foo";}) ' --store ssh-ng://builder --eval-store auto -vvv ``` ## Expected behavior The remote store build should trigger relatively quickly. ## `nix --version` output ``` nix (Lix, like Nix) 2.92.0 System type: x86_64-linux Additional system types: i686-linux, x86_64-v1-linux, x86_64-v2-linux, x86_64-v3-linux, x86_64-v4-linux Features: gc, signed-caches System configuration file: /etc/nix/nix.conf User configuration files: /home/teo/.config/nix/nix.conf:/etc/xdg/nix/nix.conf:/home/teo/.nix-profile/etc/xdg/nix/nix.conf:/nix/profile/etc/xdg/nix/nix.conf:/home/teo/.local/state/nix/profile/etc/xdg/nix/nix.conf:/etc/profiles/per-user/teo/etc/xdg/nix/nix.conf:/nix/var/nix/profiles/default/etc/xdg/nix/nix.conf:/run/current-system/sw/etc/xdg/nix/nix.conf:/nix/store/ys1zqxjyiczkny9bfi1f7i6clm1z39cd-gnome-settings-daemon-47.2/etc/xdg/nix/nix.conf Store directory: /nix/store State directory: /nix/var/nix Data directory: /nix/store/vyfqgaip2hbylhhm1lci1zfvxv6a5g28-lix-2.92.0/share ``` ## Additional context Here is a snippet of the verbose log: <details> <summary>Log</summary> ``` 33778 │ copying 0 paths... 33779 │ starting pool of 15 threads 33780 │ performing daemon worker op: 31 33781 │ copying 0 paths... 33782 │ performing daemon worker op: 44 33783 │ copying 0 paths... 33784 │ starting pool of 15 threads 33785 │ performing daemon worker op: 31 33786 │ copying 0 paths... 33787 │ performing daemon worker op: 44 33788 │ evaluating file '/nix/store/qvvp19qxs6vfjkxra1g6899k5aszp5kg-haskell-project-plan-to-nix-pkgs/cabal-files/aeson.nix' 33789 │ evaluating file '/nix/store/v19kvj9ny8hm98dj8hw1rkih74y8yhgr-source/modules/package.nix' 33790 │ evaluating file '/nix/store/v19kvj9ny8hm98dj8hw1rkih74y8yhgr-source/modules/component-options.nix' 33791 │ evaluating file '/nix/store/v19kvj9ny8hm98dj8hw1rkih74y8yhgr-source/modules/package-options.nix' 33792 │ copying 0 paths... 33793 │ starting pool of 15 threads 33794 │ performing daemon worker op: 31 33795 │ copying 0 paths... 33796 │ performing daemon worker op: 44 33797 │ copying 0 paths... 33798 │ starting pool of 15 threads 33799 │ performing daemon worker op: 31 33800 │ copying 0 paths... 33801 │ performing daemon worker op: 44 33802 │ evaluating file '/nix/store/qvvp19qxs6vfjkxra1g6899k5aszp5kg-haskell-project-plan-to-nix-pkgs/cabal-files/attoparsec.nix' 33803 │ evaluating file '/nix/store/v19kvj9ny8hm98dj8hw1rkih74y8yhgr-source/modules/package.nix' 33804 │ evaluating file '/nix/store/v19kvj9ny8hm98dj8hw1rkih74y8yhgr-source/modules/component-options.nix' 33805 │ evaluating file '/nix/store/v19kvj9ny8hm98dj8hw1rkih74y8yhgr-source/modules/package-options.nix' ``` </details> This seems to be related to "daemon worker op 44" which is `AddMultipleToStore`
Owner

can you retry with lix main? 2.92 has some locking problems (see eg #745) that have been fixed in main, it's very possible that your configuration makes them more likely to appear

can you retry with lix main? 2.92 has some locking problems (see eg #745) that have been fixed in main, it's very possible that your configuration makes them more likely to appear
Author

I tried it out with the latest main @pennae and it seems to be a bit quicker but it still spends ages doing this copying 0 paths stuff.

I thought I would compare the verbose output without the remote store stuff and the corresponding lines are:

Log without remote store
32247   │ performing daemon worker op: 1
32248   │ performing daemon worker op: 9
32249   │ building of '/nix/store/2apbm94pla0ls938lv6jvkzfpilsy23s-haskell-project-plan-to-nix-pkgs.drv^out' from .drv file: created
32250   │ entered goal loop
32251   │ building of '/nix/store/2apbm94pla0ls938lv6jvkzfpilsy23s-haskell-project-plan-to-nix-pkgs.drv^out' from .drv file: init
32252   │ building of '/nix/store/2apbm94pla0ls938lv6jvkzfpilsy23s-haskell-project-plan-to-nix-pkgs.drv^out' from .drv file: loading de
        │ rivation
32253   │ building of '/nix/store/2apbm94pla0ls938lv6jvkzfpilsy23s-haskell-project-plan-to-nix-pkgs.drv^out' from .drv file: have deriv
        │ ation
32254   │ building of '/nix/store/2apbm94pla0ls938lv6jvkzfpilsy23s-haskell-project-plan-to-nix-pkgs.drv^out' from .drv file: done
32255   │ building of '/nix/store/2apbm94pla0ls938lv6jvkzfpilsy23s-haskell-project-plan-to-nix-pkgs.drv^out' from .drv file: goal destr
        │ oyed
32256   │ performing daemon worker op: 41
32257   │ evaluating file '/nix/store/6h1nnayv71xp5gxr89zy52441cmfrcfw-haskell-project-plan-to-nix-pkgs/.plan.nix/ch-contact-details.ni
        │ x'
32258   │ evaluating file '/nix/store/v19kvj9ny8hm98dj8hw1rkih74y8yhgr-source/modules/package.nix'
32259   │ evaluating file '/nix/store/v19kvj9ny8hm98dj8hw1rkih74y8yhgr-source/modules/component-options.nix'
32260   │ evaluating file '/nix/store/v19kvj9ny8hm98dj8hw1rkih74y8yhgr-source/modules/package-options.nix'
32261   │ performing daemon worker op: 1
32262   │ performing daemon worker op: 9
32263   │ building of '/nix/store/2apbm94pla0ls938lv6jvkzfpilsy23s-haskell-project-plan-to-nix-pkgs.drv^out' from .drv file: created
32264   │ entered goal loop
32265   │ building of '/nix/store/2apbm94pla0ls938lv6jvkzfpilsy23s-haskell-project-plan-to-nix-pkgs.drv^out' from .drv file: init
32266   │ building of '/nix/store/2apbm94pla0ls938lv6jvkzfpilsy23s-haskell-project-plan-to-nix-pkgs.drv^out' from .drv file: loading de
        │ rivation
32267   │ building of '/nix/store/2apbm94pla0ls938lv6jvkzfpilsy23s-haskell-project-plan-to-nix-pkgs.drv^out' from .drv file: have deriv
        │ ation
32268   │ building of '/nix/store/2apbm94pla0ls938lv6jvkzfpilsy23s-haskell-project-plan-to-nix-pkgs.drv^out' from .drv file: done
32269   │ building of '/nix/store/2apbm94pla0ls938lv6jvkzfpilsy23s-haskell-project-plan-to-nix-pkgs.drv^out' from .drv file: goal destr
        │ oyed
32270   │ performing daemon worker op: 41

I tried it out with the latest `main` @pennae and it seems to be a bit quicker but it still spends ages doing this `copying 0 paths` stuff. I thought I would compare the verbose output without the remote store stuff and the corresponding lines are: <details> <summary>Log without remote store</summary> ``` 32247 │ performing daemon worker op: 1 32248 │ performing daemon worker op: 9 32249 │ building of '/nix/store/2apbm94pla0ls938lv6jvkzfpilsy23s-haskell-project-plan-to-nix-pkgs.drv^out' from .drv file: created 32250 │ entered goal loop 32251 │ building of '/nix/store/2apbm94pla0ls938lv6jvkzfpilsy23s-haskell-project-plan-to-nix-pkgs.drv^out' from .drv file: init 32252 │ building of '/nix/store/2apbm94pla0ls938lv6jvkzfpilsy23s-haskell-project-plan-to-nix-pkgs.drv^out' from .drv file: loading de │ rivation 32253 │ building of '/nix/store/2apbm94pla0ls938lv6jvkzfpilsy23s-haskell-project-plan-to-nix-pkgs.drv^out' from .drv file: have deriv │ ation 32254 │ building of '/nix/store/2apbm94pla0ls938lv6jvkzfpilsy23s-haskell-project-plan-to-nix-pkgs.drv^out' from .drv file: done 32255 │ building of '/nix/store/2apbm94pla0ls938lv6jvkzfpilsy23s-haskell-project-plan-to-nix-pkgs.drv^out' from .drv file: goal destr │ oyed 32256 │ performing daemon worker op: 41 32257 │ evaluating file '/nix/store/6h1nnayv71xp5gxr89zy52441cmfrcfw-haskell-project-plan-to-nix-pkgs/.plan.nix/ch-contact-details.ni │ x' 32258 │ evaluating file '/nix/store/v19kvj9ny8hm98dj8hw1rkih74y8yhgr-source/modules/package.nix' 32259 │ evaluating file '/nix/store/v19kvj9ny8hm98dj8hw1rkih74y8yhgr-source/modules/component-options.nix' 32260 │ evaluating file '/nix/store/v19kvj9ny8hm98dj8hw1rkih74y8yhgr-source/modules/package-options.nix' 32261 │ performing daemon worker op: 1 32262 │ performing daemon worker op: 9 32263 │ building of '/nix/store/2apbm94pla0ls938lv6jvkzfpilsy23s-haskell-project-plan-to-nix-pkgs.drv^out' from .drv file: created 32264 │ entered goal loop 32265 │ building of '/nix/store/2apbm94pla0ls938lv6jvkzfpilsy23s-haskell-project-plan-to-nix-pkgs.drv^out' from .drv file: init 32266 │ building of '/nix/store/2apbm94pla0ls938lv6jvkzfpilsy23s-haskell-project-plan-to-nix-pkgs.drv^out' from .drv file: loading de │ rivation 32267 │ building of '/nix/store/2apbm94pla0ls938lv6jvkzfpilsy23s-haskell-project-plan-to-nix-pkgs.drv^out' from .drv file: have deriv │ ation 32268 │ building of '/nix/store/2apbm94pla0ls938lv6jvkzfpilsy23s-haskell-project-plan-to-nix-pkgs.drv^out' from .drv file: done 32269 │ building of '/nix/store/2apbm94pla0ls938lv6jvkzfpilsy23s-haskell-project-plan-to-nix-pkgs.drv^out' from .drv file: goal destr │ oyed 32270 │ performing daemon worker op: 41 ``` </details>
Owner

hmm. can you share test cases that reproduce this?

hmm. can you share test cases that reproduce this?
Author

Here you go: https://git.lix.systems/teofilc/T749-repro

I took the default haskell.nix template and added a dependency on Agda which has a bunch of dependencies. I couldn't come up with anything smaller sorry, since I think this is closely related to haskell.nix!

If you use the substituters from the flake it shouldn't be horrible to test.

If you try to run a remote store build, you should see tons of copying 0 paths in the log. I've run this against ssh-ng://localhost and I can see that but they get resolved really quickly because we don't have to pay the connection overhead. If you use it against an actual remote store then it gets really slow.

This has also pointed out a bit of the haskell.nix code which I think is causing the bad behaviour: 4cb78ba55c/lib/load-cabal-plan.nix (L98-L102) . Maybe this code just can't be efficient with remote stores...

Here you go: https://git.lix.systems/teofilc/T749-repro I took the default haskell.nix template and added a dependency on `Agda` which has a bunch of dependencies. I couldn't come up with anything smaller sorry, since I think this is closely related to haskell.nix! If you use the substituters from the flake it shouldn't be horrible to test. If you try to run a remote store build, you should see tons of `copying 0 paths` in the log. I've run this against `ssh-ng://localhost` and I can see that but they get resolved really quickly because we don't have to pay the connection overhead. If you use it against an actual remote store then it gets really slow. This has also pointed out a bit of the haskell.nix code which I think is causing the bad behaviour: https://github.com/input-output-hk/haskell.nix/blob/4cb78ba55cd1e3eaf7b5247794c2eca6ac96559b/lib/load-cabal-plan.nix#L98-L102 . Maybe this code just can't be efficient with remote stores...
Author

Ok with this change (to haskell.nix) this is basically resolved:

diff --git a/lib/load-cabal-plan.nix b/lib/load-cabal-plan.nix
index 6bf0b1bf..0b7b590a 100644
--- a/lib/load-cabal-plan.nix
+++ b/lib/load-cabal-plan.nix
@@ -75,7 +75,7 @@ let
           buildable = true;
         } // lookupDependencies hsPkgs.pkgsBuildBuild (components.setup.depends or []) (components.setup.exe-depends or []);
       };
-  nixFilesDir = callProjectResults.projectNix + callProjectResults.src.origSubDir or "";
+  nixFilesDir = callProjectResults.projectNix.outPath + callProjectResults.src.origSubDir or "";
 in {
   # This replaces the `plan-nix/default.nix`
   pkgs = (hackage: {

So the issue was that haskell.nix produced a lot of these strings whose context included the .drv of the IFD not the outpath of the IFD. So we would rebuild for each string, and we have hundreds of those. This is mostly fine with a local store, but the network latency really slowed it down with a remote store.

Thanks for your help @pennae !

Ok with this change (to haskell.nix) this is basically resolved: ``` diff --git a/lib/load-cabal-plan.nix b/lib/load-cabal-plan.nix index 6bf0b1bf..0b7b590a 100644 --- a/lib/load-cabal-plan.nix +++ b/lib/load-cabal-plan.nix @@ -75,7 +75,7 @@ let buildable = true; } // lookupDependencies hsPkgs.pkgsBuildBuild (components.setup.depends or []) (components.setup.exe-depends or []); }; - nixFilesDir = callProjectResults.projectNix + callProjectResults.src.origSubDir or ""; + nixFilesDir = callProjectResults.projectNix.outPath + callProjectResults.src.origSubDir or ""; in { # This replaces the `plan-nix/default.nix` pkgs = (hackage: { ``` So the issue was that `haskell.nix` produced a lot of these strings whose context included the .drv of the IFD not the outpath of the IFD. So we would rebuild for each string, and we have hundreds of those. This is mostly fine with a local store, but the network latency really slowed it down with a remote store. Thanks for your help @pennae !
Owner

i love it when bugs are this easy :D

your analysis is correct, certain interactions with remote stores simply aren't very efficient today. in this case we could stop copying early if we detect that nothing needs to be copied, but at that point we've already eaten the bulk of the cost (which will be querying the remote which paths it doesn't have yet).

i love it when bugs are this easy :D your analysis is correct, certain interactions with remote stores simply aren't very efficient today. in this case we could stop copying early if we detect that nothing needs to be copied, but at that point we've already eaten the bulk of the cost (which will be querying the remote which paths it doesn't have yet).
Sign in to join this conversation.
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: lix-project/lix#749
No description provided.