lsof in tests is never exercised #156

Closed
opened 2024-03-19 02:09:44 +00:00 by jade · 5 comments
Owner

Either this needs to be removed altogether or it needs to be fixed:

export _NIX_TEST_NO_LSOF=1

https://git.lix.systems/lix-project/lix/src/commit/b6bb869e52bc4990aeac09aa64025e66e02b780e/tests/functional/common/vars-and-functions.sh.in#L27
jade added the
testing
bug
labels 2024-03-19 02:09:44 +00:00
Owner
Brought to attention in https://gerrit.lix.systems/c/lix/+/580/comment/4f83a1c6_4e8cb552/
jade added the
E/help wanted
label 2024-03-25 04:51:47 +00:00
Member

lsof is a somewhat weird dependency. While it should work, macOS has several functions in [libproc.h](https://opensource.apple.com/source/xnu/xnu-2422.1.72/libsyscall/wrappers/libproc/libproc.h.auto.html) that may allow us to get this information directly and more efficiently than lsof.

If the main issue is that the long testing time goes above a reasonable timeout, then replacing the implementation may be a good idea. I can try writing this, though I would probably want to move the OS-specific code out of gc.cc, since the number of ifdefs would get annoying.
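To make the libproc idea concrete, here is a minimal sketch of listing the files one process has open with `proc_pidinfo` and `proc_pidfdinfo`. The helper name `openFilePaths` is hypothetical and this is not the code from any Lix change; it only covers open file descriptors (not mapped regions or the cwd), and it is guarded so that it compiles to a no-op off macOS.

```cpp
#include <set>
#include <string>
#include <vector>

#if defined(__APPLE__)
#include <libproc.h>
#include <sys/proc_info.h>
#endif

// Hypothetical helper (not from Lix): collect the paths of files that
// process `pid` has open. Returns an empty set on failure or off macOS.
std::set<std::string> openFilePaths(int pid)
{
    std::set<std::string> paths;
#if defined(__APPLE__)
    // A call with a null buffer reports how many bytes the fd list needs.
    int needed = proc_pidinfo(pid, PROC_PIDLISTFDS, 0, nullptr, 0);
    if (needed <= 0) return paths;

    std::vector<proc_fdinfo> fds(needed / sizeof(proc_fdinfo));
    if (fds.empty()) return paths;
    int used = proc_pidinfo(pid, PROC_PIDLISTFDS, 0, fds.data(),
                            static_cast<int>(fds.size() * sizeof(proc_fdinfo)));
    if (used <= 0) return paths;
    fds.resize(used / sizeof(proc_fdinfo));

    for (const auto & fd : fds) {
        // Only vnode-backed descriptors have a path; skip sockets, pipes, etc.
        if (fd.proc_fdtype != PROX_FDTYPE_VNODE) continue;
        vnode_fdinfowithpath info;
        int n = proc_pidfdinfo(pid, fd.proc_fd, PROC_PIDFDVNODEPATHINFO,
                               &info, sizeof(info));
        if (n == static_cast<int>(sizeof(info)) && info.pvip.vip_path[0] != '\0')
            paths.insert(info.pvip.vip_path);
    }
#else
    (void) pid; // libproc is macOS-only; nothing to do elsewhere
#endif
    return paths;
}
```

Keeping the platform check inside one helper like this is also one way to keep the ifdefs out of the main GC logic.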
Member

A bit more info now that I figured out how to do stuff on a Mac: upstream lsof is highly inefficient on macOS.

On my 2012 MacBookPro9,2 with an i5-3210M running macOS 14.4.1 and very little happening:

* `/usr/sbin/lsof -n -w -F n >/dev/null` takes 240ms
* `/run/current-system/sw/bin/lsof -n -w -F n >/dev/null` takes 40 seconds

It's not entitlements (I unsigned the system lsof and it's still as fast). Checking in dtruss, it seems that upstream lsof makes 50x as many proc_info syscalls as system lsof.

It looks like the reason `nix-store --gc --print-roots` is reasonably fast on macOS outside of testing is that it's using system lsof. During build, [configure.ac](https://git.lix.systems/lix-project/lix/src/commit/5a54b0a20c80356de5098694353f506e73fb883f/configure.ac#L128) falls back to `-DLSOF="lsof"` if the lsof command isn't found, so nix runs whatever is in PATH.

The options for making the nix gc take a reasonable time are probably:

* Use `/usr/sbin/lsof` instead of upstream lsof (I have tried this and the tests work)
* Find some way of compiling system lsof from Apple's [OSS source](https://github.com/apple-oss-distributions/lsof)
* Don't bother and rewrite using libproc

I'm working on a rewrite using libproc, but if people want a solution with fewer moving parts then using `/usr/sbin/lsof` would work.
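For readers unfamiliar with the `-F n` flag used in the timings above: it makes lsof print one field per line, with file names prefixed by `n` and other fields (such as `p` for the pid and `f` for the fd) on their own lines. A minimal sketch of pulling absolute paths out of that output follows; the helper name `parseLsofNames` is hypothetical and this is not necessarily how gc.cc does it.

```cpp
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: extract absolute paths from `lsof -F n` output.
std::set<std::string> parseLsofNames(const std::string & output)
{
    std::set<std::string> paths;
    std::istringstream in(output);
    std::string line;
    while (std::getline(in, line)) {
        // Name fields look like "n/some/path"; lines starting with other
        // letters, or names that are not absolute paths, are ignored.
        if (line.size() >= 2 && line[0] == 'n' && line[1] == '/')
            paths.insert(line.substr(1));
    }
    return paths;
}
```

Because the parsing only depends on the output format, it would behave the same whichever lsof binary ends up being run from PATH.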
Member

I discovered why upstream lsof is so slow: there's an undocumented API that's been in XNU since OS X 10.10 that allows you to ask for only regions (like what you'd find in /proc/pid/maps on Linux) backed by a file. Upstream lsof doesn't use it, so it has to go through every region, which means many thousands of additional syscalls.

See my comment in a WIP commit: https://git.lix.systems/artemist/lix/src/commit/e6c09723182762d72ea1d67e7a20a128afbcb95b/src/libstore/gc.cc#L479
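To illustrate what "only regions backed by a file" means, here is a sketch using the Linux analogy from the comment above rather than the macOS API (which is undocumented and not reproduced here). It takes text in the `/proc/<pid>/maps` format and keeps only the regions that name a file; the helper name `fileBackedRegions` is hypothetical.

```cpp
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: given text in the format of Linux /proc/<pid>/maps,
// return the paths of regions that are backed by a file.
std::set<std::string> fileBackedRegions(const std::string & maps)
{
    std::set<std::string> paths;
    std::istringstream in(maps);
    std::string line;
    while (std::getline(in, line)) {
        // Each line is: address-range perms offset dev inode [pathname]
        std::istringstream fields(line);
        std::string range, perms, offset, dev, inode;
        if (!(fields >> range >> perms >> offset >> dev >> inode)) continue;
        std::string path;
        std::getline(fields >> std::ws, path); // rest of line; may contain spaces
        // Anonymous regions have no pathname; pseudo-regions look like "[stack]".
        if (!path.empty() && path[0] == '/')
            paths.insert(path);
    }
    return paths;
}
```

On Linux the kernel hands over all regions in one read and the filtering is cheap; the cost described above comes from needing a separate syscall per region when no file-only query is used.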
artemist was assigned by jade 2024-04-06 04:44:11 +00:00
Author
Owner
https://gerrit.lix.systems/c/lix/+/723
Reference: lix-project/lix#156