haunted gc test failure in ci on macOS #446

Closed
opened 2024-07-13 07:13:16 +00:00 by jade · 6 comments
Owner

idk what this is but i don't think it could have possibly been because of the change in question https://buildbot.lix.systems/#/buildrequests/128100

Author
Owner

Copying the log into here for posterity:

lix> >>> MALLOC_PERTURB_=228 ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 MESON_BUILD_ROOT=/private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/source/build UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 /nix/store/0hhzvkw889bsybhqxy12ky4jx6a95p2d-python3-3.11.9/bin/python3 /private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/source/meson/run-test.py nix-collect-garbage-dry-run.sh
lix>  ✀
lix> stdout:
lix> clearing store...
lix> 7 store paths deleted, 0.00 MiB freed
lix> [FAIL]
lix> stderr:
lix> ++(common/vars-and-functions.sh:282) trap onError ERR
lix> +(init.sh:6) test -n /private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/nix-test/nix-collect-garbage-dry-run
lix> +(init.sh:7) test -d /private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/nix-test/nix-collect-garbage-dry-run
lix> +(init.sh:8) chmod -R u+w /private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/nix-test/nix-collect-garbage-dry-run
lix> +(init.sh:10) killDaemon
lix> +(common/vars-and-functions.sh:117) [[ '' == '' ]]
lix> +(common/vars-and-functions.sh:118) return
lix> +(init.sh:11) rm -rf /private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/nix-test/nix-collect-garbage-dry-run
lix> +(init.sh:13) mkdir /private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/nix-test/nix-collect-garbage-dry-run
lix> +(init.sh:15) mkdir /private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/nix-test/nix-collect-garbage-dry-run/store
lix> +(init.sh:16) mkdir /private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/nix-test/nix-collect-garbage-dry-run/var
lix> +(init.sh:17) mkdir -p /private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/nix-test/nix-collect-garbage-dry-run/var/log/nix/drvs
lix> +(init.sh:18) mkdir /private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/nix-test/nix-collect-garbage-dry-run/var/nix
lix> +(init.sh:19) mkdir /private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/nix-test/nix-collect-garbage-dry-run/etc
lix> +(init.sh:21) cat
lix> ++(init.sh:21) whoami
lix> +(init.sh:34) cat
lix> +(init.sh:41) nix-store --init
lix> +(init.sh:44) test -e /private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/nix-test/nix-collect-garbage-dry-run/var/nix/db/db.sqlite
lix> +++(/private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/source/build/tests/functional/common/vars-and-functions.sh:282) trap onError ERR
lix> ++(common.sh:8) [[ -n '' ]]
lix> +(nix-collect-garbage-dry-run.sh:3) clearStore
lix> +(/private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/source/build/tests/functional/common/vars-and-functions.sh:72) echo 'clearing store...'
lix> +(/private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/source/build/tests/functional/common/vars-and-functions.sh:73) chmod -R +w /private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/nix-test/nix-collect-garbage-dry-run/store
lix> +(/private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/source/build/tests/functional/common/vars-and-functions.sh:74) rm -rf /private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/nix-test/nix-collect-garbage-dry-run/store
lix> +(/private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/source/build/tests/functional/common/vars-and-functions.sh:75) mkdir /private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/nix-test/nix-collect-garbage-dry-run/store
lix> +(/private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/source/build/tests/functional/common/vars-and-functions.sh:76) rm -rf /private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/nix-test/nix-collect-garbage-dry-run/var/nix
lix> +(/private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/source/build/tests/functional/common/vars-and-functions.sh:77) mkdir /private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/nix-test/nix-collect-garbage-dry-run/var/nix
lix> +(/private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/source/build/tests/functional/common/vars-and-functions.sh:78) clearProfiles
lix> +(/private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/source/build/tests/functional/common/vars-and-functions.sh:67) profiles=/private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/nix-test/nix-collect-garbage-dry-run/test-home/.local/state/nix/profiles
lix> +(/private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/source/build/tests/functional/common/vars-and-functions.sh:68) rm -rf /private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/nix-test/nix-collect-garbage-dry-run/test-home/.local/state/nix/profiles
lix> +(nix-collect-garbage-dry-run.sh:24) testCollectGarbageDryRun
lix> +(nix-collect-garbage-dry-run.sh:9) clearProfiles
lix> +(/private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/source/build/tests/functional/common/vars-and-functions.sh:67) profiles=/private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/nix-test/nix-collect-garbage-dry-run/test-home/.local/state/nix/profiles
lix> +(/private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/source/build/tests/functional/common/vars-and-functions.sh:68) rm -rf /private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/nix-test/nix-collect-garbage-dry-run/test-home/.local/state/nix/profiles
lix> +(nix-collect-garbage-dry-run.sh:12) nix-env -f ./user-envs.nix -i foo-1.0
lix> installing 'foo-1.0'
lix> this derivation will be built:
lix>   /private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/nix-test/nix-collect-garbage-dry-run/store/zrcdkxb3mnpw9fk6bg02lxiy4br0sc4p-foo-1.0.drv
lix> building '/private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/nix-test/nix-collect-garbage-dry-run/store/zrcdkxb3mnpw9fk6bg02lxiy4br0sc4p-foo-1.0.drv'...
lix> building '/private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/nix-test/nix-collect-garbage-dry-run/store/vk8z9bdl68arf67q26gsgwwqyiwx0rs8-user-environment.drv'...
lix> +(nix-collect-garbage-dry-run.sh:13) nix-env -f ./user-envs.nix -e foo-1.0
lix> uninstalling 'foo-1.0'
lix> building '/private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/nix-test/nix-collect-garbage-dry-run/store/wm4xysxv3glrbfjdby5rr8acq3wa16bf-user-environment.drv'...
lix> +(nix-collect-garbage-dry-run.sh:16) nix-env --delete-generations old
lix> removing profile version 1
lix> ++(nix-collect-garbage-dry-run.sh:17) nix-store --gc --print-dead
lix> ++(nix-collect-garbage-dry-run.sh:17) wc -l
lix> finding garbage collector roots...
lix> removing stale link from '/private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/nix-test/nix-collect-garbage-dry-run/var/nix/gcroots/auto/3w2vzknwyb98y59g1dmn0g0439qd4mrr' to '/private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/nix-test/nix-collect-garbage-dry-run/test-home/.local/state/nix/profiles/profile-1-link'
lix> determining live/dead paths...
lix> +(nix-collect-garbage-dry-run.sh:17) [[ 7 -eq 7 ]]
lix> +(nix-collect-garbage-dry-run.sh:19) nix-collect-garbage --dry-run
lix> finding garbage collector roots...
lix> determining live/dead paths...
lix> /private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/nix-test/nix-collect-garbage-dry-run/store/dv2d47sk9xvlky8qr0k8zzgy4sfa900c-user-environment
lix> /private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/nix-test/nix-collect-garbage-dry-run/store/mrvd4n2py920s9gmxakqwiqcqqxzzf5v-foo-1.0
lix> /private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/nix-test/nix-collect-garbage-dry-run/store/vk8z9bdl68arf67q26gsgwwqyiwx0rs8-user-environment.drv
lix> /private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/nix-test/nix-collect-garbage-dry-run/store/w9a27gkxm0cs6cdzh6d207w68i559fr8-user-envs.builder.sh
lix> /private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/nix-test/nix-collect-garbage-dry-run/store/wm4xysxv3glrbfjdby5rr8acq3wa16bf-user-environment.drv
lix> /private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/nix-test/nix-collect-garbage-dry-run/store/xg3p2bjk1ggwsamwr467mrafqk3b5wwn-env-manifest.nix
lix> /private/tmp/nix-build-lix-2.91.0-devpre20240712_917c9bd.drv-0/nix-test/nix-collect-garbage-dry-run/store/zrcdkxb3mnpw9fk6bg02lxiy4br0sc4p-foo-1.0.drv
lix> ++(nix-collect-garbage-dry-run.sh:20) nix-store --gc --print-dead
lix> ++(nix-collect-garbage-dry-run.sh:20) wc -l
lix> finding garbage collector roots...
lix> error: Listing pid 87261 file descriptors: Undefined error: 0
lix> +(nix-collect-garbage-dry-run.sh:20) [[ 0 -eq 7 ]]

Oh. This was the foolish wrapper we found when we looked at the sources.

Author
Owner

[from matrix]

proc_pidfdlist https://opensource.apple.com/source/xnu/xnu-6153.41.3/bsd/kern/proc_info.c.auto.html
userspace side: https://github.com/apple-opensource/xnu/blob/4f43d4276fc6a87f2461a3ab18287e4a2e5a1cc0/libsyscall/wrappers/libproc/libproc.c#L101

the part that makes no sense here is that it should not have errno zero unless....

what if you closed all the fds.


So this is now actionable: the fix for this haunting is to bypass their wrapper, because it is broken and cannot distinguish a process having no FDs at all from the call failing. We copy the like five lines of their wrapper except not broken, write a disappointed comment, and close this bug.
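A minimal sketch of why the wrapper shape matters; this is not the actual Lix patch. To keep it runnable anywhere, `fake_proc_info` and its two control variables are hypothetical stand-ins for the raw syscall. The two wrappers differ only in whether -1 gets collapsed to 0.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the raw syscall so the sketch is portable.
 * It reports fake_fd_bytes bytes of fd entries (capped at buffersize),
 * or fails with -1 and sets errno when fake_fail is nonzero. */
static int fake_fd_bytes = 0;
static int fake_fail = 0;

static int fake_proc_info(void *buffer, int buffersize) {
    (void)buffer;
    if (fake_fail) {
        errno = ESRCH;
        return -1;
    }
    return fake_fd_bytes <= buffersize ? fake_fd_bytes : buffersize;
}

/* Same shape as the libproc wrapper linked above: -1 becomes 0,
 * so "failed" and "zero fds" are the same return value. */
static int broken_list_fds(void *buffer, int buffersize) {
    int retval = fake_proc_info(buffer, buffersize);
    if (retval == -1) {
        return 0;
    }
    return retval;
}

/* Fixed shape: pass -1 through, so 0 can only mean "no fds". */
static int fixed_list_fds(void *buffer, int buffersize) {
    return fake_proc_info(buffer, buffersize);
}
```

With `fake_fail = 1`, `broken_list_fds` returns 0, the same value it returns for a process with no fds, while `fixed_list_fds` returns -1.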

Author
Owner

cc @nrabulinski who i think was going to test if this is in fact the bug?

Member

> cc @nrabulinski who i think was going to test if this is in fact the bug?

Err, right, sorry, I did try it out on one of my Macs and that is indeed what can happen, so you were right: if we use the wrapper there’s no way to differentiate between an empty fd list and an error! How fun!

Author
Owner

Filed FB14695751 as follows:

The API design of libproc's function proc_pidinfo(pid, PROC_PIDLISTFDS, ...) is broken. In particular, it is implemented via a system call wrapper as follows:

int
proc_pidinfo(int pid, int flavor, uint64_t arg, void *buffer, int buffersize)
{
	int retval;

	if ((retval = __proc_info(PROC_INFO_CALL_PIDINFO, pid, flavor, arg, buffer, buffersize)) == -1) {
		return 0;
	}

	return retval;
}

https://github.com/apple-opensource/xnu/blob/4f43d4276fc6a87f2461a3ab18287e4a2e5a1cc0/libsyscall/wrappers/libproc/libproc.c#L100-L110

For PROC_PIDLISTFDS, the return value is the buffer size that has been used for struct proc_fdinfo entries. However, a process can have zero fds! In this case, the wrapper above is simply broken: it returns 0, not because the inner call was -1, but because the inner call was 0!

This means that code calling PROC_PIDLISTFDS will appear to spuriously fail if a process has closed all of its fds. The only way to distinguish such a non-failure condition from an actual failure is to inspect the errno for success, but this is not really sound since in general system APIs have no guarantee to have any particular value of errno on success.

Compile the samples with:

cc look-ma-no-fds.c -o look-ma-no-fds
cc fail.c -o fail

Then observe the following:

Working as normal:

~ » ./fail $$
rv 32 errno 0
fd 0 type=1
fd 1 type=1
fd 2 type=1
fd 10 type=1

But with a process without any fds:
~ » ./look-ma-no-fds &
[1] 12625
~ » ./fail 12625
rv 0 errno 0

Given that one is not *supposed* to read errno since in general errno has no guaranteed value if a function is successful, this is an API design bug.

In particular, users of this API have to either copy paste the broken wrapper and write a less broken wrapper, thus linking to internals of libproc (impolite solution) or rely on errno behaviour on the success case, which is, in general, a conventions violation and somewhat fragile.

It would be helpful if this design flaw were at least written down in the documentation, however libproc contains almost no documentation comments anywhere in the headers.

Downstream bug: https://git.lix.systems/lix-project/lix/issues/446

xcode-select version 2408
~ » cc --version
Apple clang version 15.0.0 (clang-1500.3.9.4)
Target: x86_64-apple-darwin23.5.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

look-ma-no-fds.c:

#include <stdlib.h>
#include <unistd.h>

int main(void) {
    for (int i = 0; i < 1000; ++i) {
        close(i);
    }
    sleep(10000);
}

fail.c:

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/proc_info.h>
#include <sys/sysctl.h>
#include <libproc.h>

int main(int argc, char **argv) {
    int rv;
    struct proc_fdinfo buf[25] = {};
    char *err = NULL;
    int pid = strtol(argv[1], &err, 10);
    if (*argv[1] == '\0' || *err != '\0') {
        perror("strtol");
        return 1;
    }
    errno = 0;
    rv = proc_pidinfo(
        pid, PROC_PIDLISTFDS, 0, buf, sizeof(buf));
    int gotErrno = errno;
    printf("rv %d errno %d\n", rv, gotErrno);

    int nValid = rv / sizeof (struct proc_fdinfo);
    for (int n = 0; n < nValid; ++n) {
        struct proc_fdinfo *pfd = &buf[n];
        printf("fd %d type=%d\n", pfd->proc_fd, pfd->proc_fdtype);
    }
}
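For the other option the report mentions (relying on errno), here is a hedged sketch of how a caller might classify the result. The names `classify_fd_list`, `fd_entry_count` and the enum are made up for illustration. The caller is assumed to set errno to 0 right before calling proc_pidinfo and to read it right after, as fail.c does. As the report says, this leans on errno staying 0 on success, which is fragile.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical result categories for a PROC_PIDLISTFDS call. */
enum fd_list_status { FD_LIST_OK, FD_LIST_EMPTY, FD_LIST_ERROR };

/* rv:  value returned by proc_pidinfo(pid, PROC_PIDLISTFDS, ...).
 * err: errno read immediately after the call, having been zeroed before it. */
static enum fd_list_status classify_fd_list(int rv, int err) {
    if (rv > 0) {
        return FD_LIST_OK;
    }
    if (rv == 0 && err == 0) {
        /* Assumption: errno was left untouched, so nothing failed. */
        return FD_LIST_EMPTY;
    }
    return FD_LIST_ERROR;
}

/* Number of whole entries in rv bytes; 0 for empty or failed calls. */
static int fd_entry_count(int rv, size_t entry_size) {
    if (rv <= 0 || entry_size == 0) {
        return 0;
    }
    return (int)((size_t)rv / entry_size);
}
```

In the sample output above, rv 32 with four fds listed implies 8-byte entries, so `fd_entry_count(32, 8)` gives 4, and `classify_fd_list(0, 0)` is the look-ma-no-fds case.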
Member

This issue was mentioned on Gerrit on the following CLs:

  • commit message in cl/1723 (https://gerrit.lix.systems/c/lix/+/1723) ("darwin: workaround PROC_PIDLISTFDS on processes with no fds")

Reference: lix-project/lix#446