Unnecesary(?) double-canonicalisePathMetaData in LocalDerivationGoal::registerOutputs #786

Open
opened 2025-04-03 07:56:50 +00:00 by jjjollyjim · 4 comments

Describe the bug

LocalDerivationGoal::registerOutputs calls canonicalisePathMetaData twice after building normal(tm) derivations, wasting time on directory traversals/metadata gathering.

Steps To Reproduce

  1. grab this script:
    script.bt
    struct stdstring {
            char *ptr;
            uint64_t len;
            uint64_t pad1;
            uint64_t pad2;
    }
    
    uprobe:@LIX@/lib/liblixstore.so:"_ZN3nix24canonicalisePathMetaDataERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt8optionalISt4pairIjjEERSt3setIS9_ImmESt4lessISD_ESaISD_EE" { 
            @start[tid] = nsecs(monotonic);
            $s = (struct stdstring*)arg0;
            @path[tid] = buf($s->ptr, $s->len);
    }       
    
    uretprobe:@LIX@/lib/liblixstore.so:"_ZN3nix24canonicalisePathMetaDataERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt8optionalISt4pairIjjEERSt3setIS9_ImmESt4lessISD_ESaISD_EE" { 
            printf("canonicalisePathMetaData of %r took %lldus\n", @path[tid], (nsecs(monotonic) - @start[tid])/1000);
    }
    
  2. substitute @LIX@ with your /nix/store/xxx-lix2.93-xxx
  3. bpftrace script.bt
  4. Build some derivations
  5. Observe the duplicate calls:
    canonicalisePathMetaData of /nix/store/cm14q864j1ldyk368d86z5ah42dfz1g6-system-path.drv.chroot/nix/store/bcd6ngs79jmfhcl1vsiamcff256d2c73-system-path took 397611us
    canonicalisePathMetaData of /nix/store/cm14q864j1ldyk368d86z5ah42dfz1g6-system-path.drv.chroot/nix/store/bcd6ngs79jmfhcl1vsiamcff256d2c73-system-path took 176295us
    canonicalisePathMetaData of /nix/store/10r5rpayyqdsc10jicpnx1vbgmyvlqv6-dbus-1.drv.chroot/nix/store/wl5hlagjvv236zlrkklrb7zah8szjgy1-dbus-1 took 64us
    canonicalisePathMetaData of /nix/store/10r5rpayyqdsc10jicpnx1vbgmyvlqv6-dbus-1.drv.chroot/nix/store/wl5hlagjvv236zlrkklrb7zah8szjgy1-dbus-1 took 20us
    

Expected behavior

Nix performs the minimum necessary directory traversals (and especially listxattr operations, which I believe tend to take out an FS-wide lock in the kernel, so could not be parallelised, even if other traversal tasks were).

nix --version output

lix (Lix, like Nix) 2.93.0-devpre20250403-0b37027 <-- this is HEAD (5a7e9e1746) but with -fno-omit-frame-pointers added in package.nix
System type: x86_64-linux
Additional system types: i686-linux, x86_64-v1-linux, x86_64-v2-linux, x86_64-v3-linux, x86_64-v4-linux
Features: gc, signed-caches
System configuration file: /etc/nix/nix.conf
User configuration files: /home/jamie/.config/nix/nix.conf:/etc/xdg/nix/nix.conf:/home/jamie/.nix-profile/etc/xdg/nix/nix.conf:/nix/profile/etc/xdg/nix/nix.conf:/home/jamie/.local/state/nix/profile/etc/xdg/nix/nix.conf:/etc/profiles/per-user/jamie/etc/xdg/nix/nix.conf:/nix/var/nix/profiles/default/etc/xdg/nix/nix.conf:/run/current-system/sw/etc/xdg/nix/nix.conf
Store directory: /nix/store
State directory: /nix/var/nix
Data directory: /nix/store/zz8dqizy2c6k6bxwrar5vbs9s55lzwkl-lix-2.93.0-devpre20250403-0b37027/share

Additional context

There are three call sites to canonicalisePathMetaData in registerOutputs:

  • An initial one
  • One after rewriting hashes (which is not done for the normal packages I am building), which has a TODO comment suggesting that the hash-writing function should be updated to set permissions correctly so the call is unnecessary
  • One later on, which duplicates the same TODO comment, despite happening regardless of rewrites

As a certified rust girlie I am somewhat overwhelmed by C++ control flow inside registerOutputs (and don't fully understand which type of CA derivation the comments are discussing at a given moment), but my guess is that the second or third call site is redundant, and that the third could be made conditional on whether any rewriting/other activities(?) have occurred beforehand.

Motivation

I am interested in the overall performance of adding a new package to my environment.systemPackages (currently it is slow enough that I instead reach for nix-shell -p, and then don't have the package next time I want it). This involves the following builds:

image

The breakdown of this process is, approximately:

  • 7s in nix-instantiate
  • 7s in nix-build
    • 3.1s on the critical path of actual builder processes from execve to exit
      • 1.9s in system-path's builder, which suggests I should RIIR pkgs/build-support/buildenv/builder.pl...
    • 3s on the critical path between ::buildDone and ::done, which fixing this issue would shave ~180ms off
    • 800ms on the critical path between ::inputsRealised and ::runChild
    • 300ms on the critical path betwen ::runChild and execve (sandbox setup)
## Describe the bug `LocalDerivationGoal::registerOutputs` calls `canonicalisePathMetaData` twice after building normal(tm) derivations, wasting time on directory traversals/metadata gathering. ## Steps To Reproduce <ol> <li> grab this script: <details> <summary>script.bt</summary> ```bpftrace struct stdstring { char *ptr; uint64_t len; uint64_t pad1; uint64_t pad2; } uprobe:@LIX@/lib/liblixstore.so:"_ZN3nix24canonicalisePathMetaDataERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt8optionalISt4pairIjjEERSt3setIS9_ImmESt4lessISD_ESaISD_EE" { @start[tid] = nsecs(monotonic); $s = (struct stdstring*)arg0; @path[tid] = buf($s->ptr, $s->len); } uretprobe:@LIX@/lib/liblixstore.so:"_ZN3nix24canonicalisePathMetaDataERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt8optionalISt4pairIjjEERSt3setIS9_ImmESt4lessISD_ESaISD_EE" { printf("canonicalisePathMetaData of %r took %lldus\n", @path[tid], (nsecs(monotonic) - @start[tid])/1000); } ``` </details></li> <li> substitute @LIX@ with your <code>/nix/store/xxx-lix2.93-xxx</code></li> <li> <code>bpftrace script.bt</code></li> <li> Build some derivations</li> <li> Observe the duplicate calls: ``` canonicalisePathMetaData of /nix/store/cm14q864j1ldyk368d86z5ah42dfz1g6-system-path.drv.chroot/nix/store/bcd6ngs79jmfhcl1vsiamcff256d2c73-system-path took 397611us canonicalisePathMetaData of /nix/store/cm14q864j1ldyk368d86z5ah42dfz1g6-system-path.drv.chroot/nix/store/bcd6ngs79jmfhcl1vsiamcff256d2c73-system-path took 176295us canonicalisePathMetaData of /nix/store/10r5rpayyqdsc10jicpnx1vbgmyvlqv6-dbus-1.drv.chroot/nix/store/wl5hlagjvv236zlrkklrb7zah8szjgy1-dbus-1 took 64us canonicalisePathMetaData of /nix/store/10r5rpayyqdsc10jicpnx1vbgmyvlqv6-dbus-1.drv.chroot/nix/store/wl5hlagjvv236zlrkklrb7zah8szjgy1-dbus-1 took 20us ``` </li> </ol> ## Expected behavior Nix performs the minimum necessary directory traversals (and especially listxattr operations, which I believe tend to take out an FS-wide lock in the kernel, so could not be parallelised, even if other traversal tasks were). ## `nix --version` output ``` lix (Lix, like Nix) 2.93.0-devpre20250403-0b37027 <-- this is HEAD (5a7e9e1746) but with -fno-omit-frame-pointers added in package.nix System type: x86_64-linux Additional system types: i686-linux, x86_64-v1-linux, x86_64-v2-linux, x86_64-v3-linux, x86_64-v4-linux Features: gc, signed-caches System configuration file: /etc/nix/nix.conf User configuration files: /home/jamie/.config/nix/nix.conf:/etc/xdg/nix/nix.conf:/home/jamie/.nix-profile/etc/xdg/nix/nix.conf:/nix/profile/etc/xdg/nix/nix.conf:/home/jamie/.local/state/nix/profile/etc/xdg/nix/nix.conf:/etc/profiles/per-user/jamie/etc/xdg/nix/nix.conf:/nix/var/nix/profiles/default/etc/xdg/nix/nix.conf:/run/current-system/sw/etc/xdg/nix/nix.conf Store directory: /nix/store State directory: /nix/var/nix Data directory: /nix/store/zz8dqizy2c6k6bxwrar5vbs9s55lzwkl-lix-2.93.0-devpre20250403-0b37027/share ``` ## Additional context There are three call sites to `canonicalisePathMetaData` in `registerOutputs`: * An initial one * One after rewriting hashes (which is not done for the normal packages I am building), which has a TODO comment suggesting that the hash-writing function should be updated to set permissions correctly so the call is unnecessary * One later on, which duplicates the same TODO comment, despite happening regardless of rewrites As a certified rust girlie I am somewhat overwhelmed by C++ control flow inside registerOutputs (and don't fully understand which type of CA derivation the comments are discussing at a given moment), but my guess is that the second or third call site is redundant, and that the third could be made conditional on whether any rewriting/other activities(?) have occurred beforehand. ## Motivation I am interested in the overall performance of adding a new package to my `environment.systemPackages` (currently it is slow enough that I instead reach for `nix-shell -p`, and then don't have the package next time I want it). This involves the following builds: ![image](/attachments/647cf10f-2485-46ac-9cf6-67408731fc76) The breakdown of this process is, approximately: * 7s in `nix-instantiate` * 7s in `nix-build` * * 3.1s on the critical path of actual builder processes from `execve` to `exit` * * * 1.9s in system-path's builder, which suggests I should RIIR `pkgs/build-support/buildenv/builder.pl`... * * 3s on the critical path between `::buildDone` and `::done`, which fixing this issue would shave ~180ms off * * 800ms on the critical path between `::inputsRealised` and `::runChild` * * 300ms on the critical path betwen `::runChild` and `execve` (sandbox setup)
148 KiB
sky1e self-assigned this 2025-04-08 05:38:05 +00:00
Member

I'm giving this issue an attempt. I'll hopefully post a Gerrit CL once I have something that passes tests.

I'm giving this issue an attempt. I'll hopefully post a Gerrit CL once I have something that passes tests.
Member

This issue was mentioned on Gerrit on the following CLs:

  • commit message in cl/3045 ("libstore: Remove duplicate calls to canonicalizePathMetaData in LocalDerivationGoal::registerOutputs()")
  • comment in cl/3045 ("libstore: Remove duplicate calls to canonicalizePathMetaData in LocalDerivationGoal::registerOutputs()")
<!-- GERRIT_LINKBOT: {"cls": [{"backlink": "https://gerrit.lix.systems/c/lix/+/3045", "number": 3045, "kind": "commit message"}, {"backlink": "https://gerrit.lix.systems/c/lix/+/3045", "number": 3045, "kind": "comment"}], "cl_meta": {"3045": {"change_title": "libstore: Remove duplicate calls to canonicalizePathMetaData in LocalDerivationGoal::registerOutputs()"}}} --> This issue was mentioned on Gerrit on the following CLs: * commit message in [cl/3045](https://gerrit.lix.systems/c/lix/+/3045) ("libstore: Remove duplicate calls to canonicalizePathMetaData in LocalDerivationGoal::registerOutputs()") * comment in [cl/3045](https://gerrit.lix.systems/c/lix/+/3045) ("libstore: Remove duplicate calls to canonicalizePathMetaData in LocalDerivationGoal::registerOutputs()")
sky1e removed their assignment 2025-04-25 00:03:48 +00:00
Member

I made an attempt, under the approach of canonicalize file metadata once and then make all operations afterwards maintain that state by setting the metadata on the files they touch, at cl/3045. It's basic approach was rejected as a bad idea because the CA derivation system is far too fragile, so the final canonicalize call should stay unless CA derivations get unfragiled or ripped out entirely. I agree with the comment before the first call

 	        /* Canonicalise first.  This ensures that the path we're
	           rewriting doesn't contain a hard link to /etc/shadow or
	           something like that. */
 	        canonicalisePathMetaData(

that it really is necessary. If the first call is necessary and the last is a bad idea to remove, and this issue is about it being called twice, that doesn't really seem to leave other options. I wrote another far more limited change that tries to only skip the final canonicalization call for input addressed derivations where no rewriting needed to occur, so it doesn't need to touch anything else, but it's still fundamentally the same approach that got rejected. I've given up on this and removed myself from this issue.

I made an attempt, under the approach of canonicalize file metadata once and then make all operations afterwards maintain that state by setting the metadata on the files they touch, at cl/3045. It's basic approach was rejected as [a bad idea because the CA derivation system is far too fragile](https://gerrit.lix.systems/c/lix/+/3045/comments/b2e0a792_74d90dde), so the final canonicalize call should stay unless CA derivations get unfragiled or ripped out entirely. I agree with the [comment before the first call](https://git.lix.systems/lix-project/lix/src/commit/a133633ecc4ca3aed65899cb37c17601c95f5b2b/lix/libstore/build/local-derivation-goal.cc#L2080) ```c++ /* Canonicalise first. This ensures that the path we're rewriting doesn't contain a hard link to /etc/shadow or something like that. */ canonicalisePathMetaData( ``` that it really is necessary. If the first call is necessary and the last is a bad idea to remove, and this issue is about it being called twice, that doesn't really seem to leave other options. I wrote [another *far* more limited change](https://git.lix.systems/sky1e/lix/commit/9f3962e6c68daaceee1ca9edb1cff956ea0b2c11) that tries to only skip the final canonicalization call for input addressed derivations where no rewriting needed to occur, so it doesn't need to touch anything else, but it's still fundamentally the same approach that got rejected. I've given up on this and removed myself from this issue.
Owner

I wrote another far more limited change that tries to only skip the final canonicalization call for input addressed derivations where no rewriting needed to occur

that one is more plausibly sound, but we'd want more assurance that no rewrites had happened. the only reasonable way to do this is to produce a fresh, rewritten path when rewriting is done rather than rewriting in place, and canonicalizing the new path before deleting the rewrite input and moving the canonicalize path into place. doing this also somewhat requires refactoring registerOutputs to not contain most of its code in capturing lambdas because whoever wrote it couldn't be bothered to make it the least bit maintainable :(

> I wrote another far more limited change that tries to only skip the final canonicalization call for input addressed derivations where no rewriting needed to occur that one is more plausibly sound, but we'd want more assurance that no rewrites had happened. the only reasonable way to do this is to produce a fresh, rewritten path when rewriting *is* done rather than rewriting in place, and canonicalizing the *new* path before deleting the rewrite input and moving the canonicalize path into place. doing this also somewhat requires refactoring `registerOutputs` to not contain most of its code in capturing lambdas because whoever wrote it couldn't be bothered to make it the least bit maintainable :(
Sign in to join this conversation.
No milestone
No project
No assignees
4 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: lix-project/lix#786
No description provided.