<C-C> (Control-C/Ctrl-C) no longer interrupts builds on HEAD #729

Closed
opened 2025-03-12 16:32:42 +00:00 by lunaphied · 19 comments
Owner

Reported elsewhere by @jade. Current head seems to fail to interrupt builds properly when getting interrupted with <C-C>. (As of 85a140accb)

Reported elsewhere by @jade. Current head seems to fail to interrupt builds properly when getting interrupted with `<C-C>`. (As of 85a140accb5592c9b5a73f5ea2156f5b0c853d1c)
Owner

cannot reproduce that so far, do you have a specific example that doesn't work?

cannot reproduce that so far, do you have a specific example that doesn't work?
Owner

I caused this on: 2.93.0-dev-pre20250324-4181796 on macOS.

Setup: running NIX_CONFIG="builders = @/path/to/machines" nix flake check --eval-system x86_64-linux -Lv in the lix-module repo when having to build a lix, then ctrl-c'ing the build of the lix.

Symptom: Ctrl-C of the client does not cause the daemon process to stop. Further, the ssh process held by the daemon stays alive. Further, the nix-daemon --stdio on the remote builder's side remains alive even after killing the Mac's daemon.

macOS daemon back trace:

(lldb) thread backtrace all
* thread #1, stop reason = signal SIGSTOP
  * frame #0: 0x00000001897e101c libsystem_kernel.dylib`kevent + 8
    frame #1: 0x000000010326306c libkj-async.1.0.2.dylib`kj::UnixEventPort::doKqueueWait(timespec*) + 80
    frame #2: 0x000000010326378c libkj-async.1.0.2.dylib`kj::UnixEventPort::wait() + 160
    frame #3: 0x000000010325108c libkj-async.1.0.2.dylib`kj::EventLoop::wait() + 56
    frame #4: 0x000000010325180c libkj-async.1.0.2.dylib`kj::_::waitImpl(kj::Own<kj::_::PromiseNode, kj::_::PromiseDisposer>&&, kj:
:_::ExceptionOrValue&, kj::WaitScope&, kj::SourceLocation) + 516
    frame #5: 0x000000010398b3d8 liblixstore.dylib`kj::Promise<boost::outcome_v2::basic_result<void, std::exception_ptr, boost::out
come_v2::policy::exception_ptr_rethrow<void, std::exception_ptr, void>>>::wait(kj::WaitScope&, kj::SourceLocation) + 84
    frame #6: 0x00000001039b0da4 liblixstore.dylib`nix::daemon::performOp(nix::AsyncIoRoot&, nix::daemon::TunnelLogger*, nix::ref<n
ix::Store>, nix::TrustedFlag, nix::daemon::RecursiveFlag, unsigned int, nix::Source&, nix::BufferedSink&, nix::WorkerProto::Op) + 4
032
    frame #7: 0x00000001039af2b0 liblixstore.dylib`nix::daemon::processConnection(nix::AsyncIoRoot&, nix::ref<nix::Store>, nix::FdS
ource&, nix::FdSink&, nix::TrustedFlag, nix::daemon::RecursiveFlag) + 1316
    frame #8: 0x0000000102592ad8 nix`std::__1::__function::__func<nix::daemonLoopImpl(std::__1::optional<nix::TrustedFlag>)::$_0, s
td::__1::allocator<nix::daemonLoopImpl(std::__1::optional<nix::TrustedFlag>)::$_0>, void ()>::operator()() + 456
    frame #9: 0x000000010313c608 liblixutil.dylib`std::__1::__function::__func<nix::startProcess(std::__1::function<void ()>, nix::
ProcessOptions const&)::$_0, std::__1::allocator<nix::startProcess(std::__1::function<void ()>, nix::ProcessOptions const&)::$_0>, 
void ()>::operator()() + 72
    frame #10: 0x00000001031397c8 liblixutil.dylib`nix::startProcess(std::__1::function<void ()>, nix::ProcessOptions const&) + 364
    frame #11: 0x0000000102590a54 nix`std::__1::__async_func<nix::daemonLoop(nix::AsyncIoRoot&, std::__1::optional<nix::TrustedFlag
>)::$_0>::operator()[abi:fe190107]() + 2680
    frame #12: 0x000000010258ff54 nix`std::__1::__async_assoc_state<void, std::__1::__async_func<nix::daemonLoop(nix::AsyncIoRoot&,
 std::__1::optional<nix::TrustedFlag>)::$_0>>::__execute() + 44
    frame #13: 0x0000000102593ac4 nix`void* std::__1::__thread_proxy[abi:fe190107]<std::__1::tuple<std::__1::unique_ptr<std::__1::_
_thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (std::__1::__async_assoc_state<void, std::__1::__async_f
unc<nix::daemonLoop(nix::AsyncIoRoot&, std::__1::optional<nix::TrustedFlag>)::$_0>>::*)(), std::__1::__async_assoc_state<void, std:
:__1::__async_func<nix::daemonLoop(nix::AsyncIoRoot&, std::__1::optional<nix::TrustedFlag>)::$_0>>*>>(void*) + 88
    frame #14: 0x000000018981c2e4 libsystem_pthread.dylib`_pthread_start + 136
  thread #2, name = 'signal handler'
    frame #0: 0x00000001897e6960 libsystem_kernel.dylib`__sigwait + 8
    frame #1: 0x000000018981ca9c libsystem_pthread.dylib`sigwait + 40
    frame #2: 0x000000010314c878 liblixutil.dylib`nix::signalHandlerThread(unsigned int) + 52
    frame #3: 0x000000010314d428 liblixutil.dylib`void* std::__1::__thread_proxy[abi:fe190107]<std::__1::tuple<std::__1::unique_ptr
<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (*)(unsigned int), unsigned int>>(void*) + 7
2
    frame #4: 0x000000018981c2e4 libsystem_pthread.dylib`_pthread_start + 136
  thread #3, name = 'MonitorFdHup'
    frame #0: 0x00000001897e3830 libsystem_kernel.dylib`poll + 8
    frame #1: 0x00000001039b6e2c liblixstore.dylib`nix::MonitorFdHup::MonitorFdHup(int)::'lambda'()::operator()() const + 188
    frame #2: 0x00000001039b6ca8 liblixstore.dylib`void* std::__1::__thread_proxy[abi:fe190107]<std::__1::tuple<std::__1::unique_pt
r<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, nix::MonitorFdHup::MonitorFdHup(int)::'lambda'()
>>(void*) + 68
    frame #3: 0x000000018981c2e4 libsystem_pthread.dylib`_pthread_start + 136
(lldb) ^D

Remote builder end:

~ » pgrep -af nix-daemon
195017 nix-daemon --daemon
199905 nix-daemon --stdio
199912 nix-daemon 199905
~ » sudo gdb -p 199912
For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 199912
[New LWP 199914]
[New LWP 199913]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/libthread_db.so.1".
0x00007f9d0e6e57de in epoll_wait () from target:/nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/libc.so.6
(gdb) thr apply all bt

Thread 3 (Thread 0x7f9d0c8866c0 (LWP 199913) "signal handler"):
#0  0x00007f9d0e60bcca in sigtimedwait () from target:/nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/libc.so.6
#1  0x00007f9d0e60b2fc in sigwait () from target:/nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/libc.so.6
#2  0x00007f9d0faee618 in nix::signalHandlerThread(__sigset_t) () from target:/nix/store/8jr9h780fccwwzh8pdd94svra70fdl2y-lix-2.93.
0-dev-pre20250324-4181796/lib/liblixutil.so
#3  0x00007f9d0faef124 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)(__sigset_t), __sigset_t> > >::_M_run()
 () from target:/nix/store/8jr9h780fccwwzh8pdd94svra70fdl2y-lix-2.93.0-dev-pre20250324-4181796/lib/liblixutil.so
#4  0x00007f9d0e8eb0a4 in execute_native_thread_routine () from target:/nix/store/6kbrc4ca98srlfpgyaayl2q9zpg1gys6-gcc-14-20241116-
lib/lib/libstdc++.so.6
#5  0x00007f9d0e6612e3 in start_thread () from target:/nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/libc.so.6
#6  0x00007f9d0e6e52fc in __clone3 () from target:/nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/libc.so.6

Thread 2 (Thread 0x7f9cf36fc6c0 (LWP 199914) "MonitorFdHup"):
#0  0x00007f9d0e6d6d4f in poll () from target:/nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/libc.so.6
#1  0x000055909fc2628c in nix::MonitorFdHup::MonitorFdHup(int)::{lambda()#1}::operator()() const ()
#2  0x00007f9d0e8eb0a4 in execute_native_thread_routine () from target:/nix/store/6kbrc4ca98srlfpgyaayl2q9zpg1gys6-gcc-14-20241116-
lib/lib/libstdc++.so.6
#3  0x00007f9d0e6612e3 in start_thread () from target:/nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/libc.so.6
#4  0x00007f9d0e6e52fc in __clone3 () from target:/nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/libc.so.6

Thread 1 (Thread 0x7f9cf3f056c0 (LWP 199912) "nix-daemon"):
#0  0x00007f9d0e6e57de in epoll_wait () from target:/nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/libc.so.6
#1  0x00007f9d0ec9a7a8 in kj::UnixEventPort::wait() () from target:/nix/store/ih5f49byxr8b0sjj8dmiqylp47vmcmx4-capnproto-1.0.2/lib/
libkj-async.so.1.0.2
#2  0x00007f9d0ec855f3 in kj::EventLoop::wait() () from target:/nix/store/ih5f49byxr8b0sjj8dmiqylp47vmcmx4-capnproto-1.0.2/lib/libk
j-async.so.1.0.2
#3  0x00007f9d0ec85d77 in kj::_::waitImpl(kj::Own<kj::_::PromiseNode, kj::_::PromiseDisposer>&&, kj::_::ExceptionOrValue&, kj::Wait
Scope&, kj::SourceLocation) () from target:/nix/store/ih5f49byxr8b0sjj8dmiqylp47vmcmx4-capnproto-1.0.2/lib/libkj-async.so.1.0.2
#4  0x000055909fbba226 in kj::Promise<boost::outcome_v2::basic_result<nix::BuildResult, std::__exception_ptr::exception_ptr, boost:
:outcome_v2::policy::exception_ptr_rethrow<nix::BuildResult, std::__exception_ptr::exception_ptr, void> > >::wait(kj::WaitScope&, k
j::SourceLocation) ()
--Type <RET> for more, q to quit, c to continue without paging--c
#5  0x00007f9d0f59fcdc in nix::daemon::performOp(nix::AsyncIoRoot&, nix::daemon::TunnelLogger*, nix::ref<nix::Store>, nix::TrustedF
lag, nix::daemon::RecursiveFlag, unsigned int, nix::Source&, nix::BufferedSink&, nix::WorkerProto::Op) () from target:/nix/store/8j
r9h780fccwwzh8pdd94svra70fdl2y-lix-2.93.0-dev-pre20250324-4181796/lib/liblixstore.so
#6  0x00007f9d0f59e34d in nix::daemon::processConnection(nix::AsyncIoRoot&, nix::ref<nix::Store>, nix::FdSource&, nix::FdSink&, nix
::TrustedFlag, nix::daemon::RecursiveFlag) () from target:/nix/store/8jr9h780fccwwzh8pdd94svra70fdl2y-lix-2.93.0-dev-pre20250324-41
81796/lib/liblixstore.so
#7  0x000055909fa993ee in std::_Function_handler<void (), nix::daemonLoopImpl(std::optional<nix::TrustedFlag>)::$_0>::_M_invoke(std
::_Any_data const&) ()
#8  0x00007f9d0fad0a6a in std::_Function_handler<void (), nix::startProcess(std::function<void ()>, nix::ProcessOptions const&)::$_
0>::_M_invoke(std::_Any_data const&) () from target:/nix/store/8jr9h780fccwwzh8pdd94svra70fdl2y-lix-2.93.0-dev-pre20250324-4181796/
lib/liblixutil.so
#9  0x00007f9d0facf55e in nix::startProcess(std::function<void ()>, nix::ProcessOptions const&) () from target:/nix/store/8jr9h780f
ccwwzh8pdd94svra70fdl2y-lix-2.93.0-dev-pre20250324-4181796/lib/liblixutil.so
#10 0x000055909fa97f28 in std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base
::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_bas
e::_Deleter>, std::thread::_Invoker<std::tuple<nix::daemonLoop(nix::AsyncIoRoot&, std::optional<nix::TrustedFlag>)::$_0> >, void> >
::_M_invoke(std::_Any_data const&) ()
#11 0x000055909fa9e22a in std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_ba
se, std::__future_base::_Result_base::_Deleter> ()>*, bool*) ()
#12 0x00007f9d0e6665d7 in __pthread_once_slow.isra.0 () from target:/nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/l
ibc.so.6
#13 0x00007f9d0e666651 in pthread_once@GLIBC_2.2.5 () from target:/nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/lib
c.so.6
#14 0x000055909fa9e010 in std::__future_base::_State_baseV2::_M_set_result(std::function<std::unique_ptr<std::__future_base::_Resul
t_base, std::__future_base::_Result_base::_Deleter> ()>, bool) ()
#15 0x000055909fa96b92 in std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<nix::daemonLoop(nix::AsyncIoRoot&,
 std::optional<nix::TrustedFlag>)::$_0> >, void>::_M_run() ()
#16 0x00007f9d0e8eb0a4 in execute_native_thread_routine () from target:/nix/store/6kbrc4ca98srlfpgyaayl2q9zpg1gys6-gcc-14-20241116-
lib/lib/libstdc++.so.6
#17 0x00007f9d0e6612e3 in start_thread () from target:/nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/libc.so.6
#18 0x00007f9d0e6e52fc in __clone3 () from target:/nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/libc.so.6
(gdb) quit
A debugging session is active.

        Inferior 1 [process 199912] will be detached.

Quit anyway? (y or n) y
Detaching from program: target:/nix/store/8jr9h780fccwwzh8pdd94svra70fdl2y-lix-2.93.0-dev-pre20250324-4181796/bin/nix, process 1999
12
[Inferior 1 (process 199912) detached]
~ » pgrep -af nix-daemon
195017 nix-daemon --daemon
199905 nix-daemon --stdio
199912 nix-daemon 199905
~ » sudo gdb -p 199905
Attaching to process 199905
[New LWP 199911]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/libthread_db.so.1".
0x00007fdf3bc67964 in select () from /nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/libc.so.6
(gdb) thr apply all bt

Thread 2 (Thread 0x7fdf39e0b6c0 (LWP 199911) "signal handler"):
#0  0x00007fdf3bb90cca in sigtimedwait () from /nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/libc.so.6
#1  0x00007fdf3bb902fc in sigwait () from /nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/libc.so.6
#2  0x00007fdf3d073618 in nix::signalHandlerThread(__sigset_t) () from /nix/store/8jr9h780fccwwzh8pdd94svra70fdl2y-lix-2.93.0-dev-p
re20250324-4181796/lib/liblixutil.so
#3  0x00007fdf3d074124 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)(__sigset_t), __sigset_t> > >::_M_run()
 () from /nix/store/8jr9h780fccwwzh8pdd94svra70fdl2y-lix-2.93.0-dev-pre20250324-4181796/lib/liblixutil.so
#4  0x00007fdf3be700a4 in execute_native_thread_routine () from /nix/store/6kbrc4ca98srlfpgyaayl2q9zpg1gys6-gcc-14-20241116-lib/lib
/libstdc++.so.6
#5  0x00007fdf3bbe62e3 in start_thread () from /nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/libc.so.6
#6  0x00007fdf3bc6a2fc in __clone3 () from /nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/libc.so.6

Thread 1 (Thread 0x7fdf39e14600 (LWP 199905) "nix-daemon"):
#0  0x00007fdf3bc67964 in select () from /nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/libc.so.6
#1  0x000055d664c35c33 in nix::runDaemon(nix::AsyncIoRoot&, bool, std::optional<nix::TrustedFlag>) ()
#2  0x000055d664c3529e in nix::main_nix_daemon(nix::AsyncIoRoot&, std::__cxx11::basic_string<char, std::char_traits<char>, std::all
ocator<char> >, std::__cxx11::list<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >) ()
#3  0x000055d664c4027b in int std::__invoke_impl<int, int (*&)(nix::AsyncIoRoot&, std::__cxx11::basic_string<char, std::char_traits
<char>, std::allocator<char> >, std::__cxx11::list<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >,
 std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >), nix::AsyncIoRoot&, std::__cxx
11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::list<std::__cxx11::basic_string<char, std::cha
r_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char
> > > > >(std::__invoke_other, int (*&)(nix::AsyncIoRoot&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<
char> >, std::__cxx11::list<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__
cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >), nix::AsyncIoRoot&, std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> >&&, std::__cxx11::list<std::__cxx11::basic_string<char, std::char_traits<char>, std::
allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&&) ()
#4  0x000055d664cd9aaa in nix::mainWrapped(nix::AsyncIoRoot&, int, char**) ()
#5  0x000055d664cddf1d in std::_Function_handler<void (), main::$_0>::_M_invoke(std::_Any_data const&) ()
#6  0x00007fdf3c2db2fa in nix::handleExceptions(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > con
st&, std::function<void ()>) () from /nix/store/8jr9h780fccwwzh8pdd94svra70fdl2y-lix-2.93.0-dev-pre20250324-4181796/lib/liblixmain.
so
#7  0x000055d664cdd9db in main ()
I caused this on: 2.93.0-dev-pre20250324-4181796 on macOS. Setup: running `NIX_CONFIG="builders = @/path/to/machines" nix flake check --eval-system x86_64-linux -Lv` in the lix-module repo when having to build a lix, then ctrl-c'ing the build of the lix. Symptom: Ctrl-C of the client does not cause the daemon process to stop. Further, the ssh process held by the daemon stays alive. Further, the `nix-daemon --stdio` on the remote builder's side remains alive even after killing the Mac's daemon. macOS daemon back trace: ``` (lldb) thread backtrace all * thread #1, stop reason = signal SIGSTOP * frame #0: 0x00000001897e101c libsystem_kernel.dylib`kevent + 8 frame #1: 0x000000010326306c libkj-async.1.0.2.dylib`kj::UnixEventPort::doKqueueWait(timespec*) + 80 frame #2: 0x000000010326378c libkj-async.1.0.2.dylib`kj::UnixEventPort::wait() + 160 frame #3: 0x000000010325108c libkj-async.1.0.2.dylib`kj::EventLoop::wait() + 56 frame #4: 0x000000010325180c libkj-async.1.0.2.dylib`kj::_::waitImpl(kj::Own<kj::_::PromiseNode, kj::_::PromiseDisposer>&&, kj: :_::ExceptionOrValue&, kj::WaitScope&, kj::SourceLocation) + 516 frame #5: 0x000000010398b3d8 liblixstore.dylib`kj::Promise<boost::outcome_v2::basic_result<void, std::exception_ptr, boost::out come_v2::policy::exception_ptr_rethrow<void, std::exception_ptr, void>>>::wait(kj::WaitScope&, kj::SourceLocation) + 84 frame #6: 0x00000001039b0da4 liblixstore.dylib`nix::daemon::performOp(nix::AsyncIoRoot&, nix::daemon::TunnelLogger*, nix::ref<n ix::Store>, nix::TrustedFlag, nix::daemon::RecursiveFlag, unsigned int, nix::Source&, nix::BufferedSink&, nix::WorkerProto::Op) + 4 032 frame #7: 0x00000001039af2b0 liblixstore.dylib`nix::daemon::processConnection(nix::AsyncIoRoot&, nix::ref<nix::Store>, nix::FdS ource&, nix::FdSink&, nix::TrustedFlag, nix::daemon::RecursiveFlag) + 1316 frame #8: 0x0000000102592ad8 nix`std::__1::__function::__func<nix::daemonLoopImpl(std::__1::optional<nix::TrustedFlag>)::$_0, s td::__1::allocator<nix::daemonLoopImpl(std::__1::optional<nix::TrustedFlag>)::$_0>, void ()>::operator()() + 456 frame #9: 0x000000010313c608 liblixutil.dylib`std::__1::__function::__func<nix::startProcess(std::__1::function<void ()>, nix:: ProcessOptions const&)::$_0, std::__1::allocator<nix::startProcess(std::__1::function<void ()>, nix::ProcessOptions const&)::$_0>, void ()>::operator()() + 72 frame #10: 0x00000001031397c8 liblixutil.dylib`nix::startProcess(std::__1::function<void ()>, nix::ProcessOptions const&) + 364 frame #11: 0x0000000102590a54 nix`std::__1::__async_func<nix::daemonLoop(nix::AsyncIoRoot&, std::__1::optional<nix::TrustedFlag >)::$_0>::operator()[abi:fe190107]() + 2680 frame #12: 0x000000010258ff54 nix`std::__1::__async_assoc_state<void, std::__1::__async_func<nix::daemonLoop(nix::AsyncIoRoot&, std::__1::optional<nix::TrustedFlag>)::$_0>>::__execute() + 44 frame #13: 0x0000000102593ac4 nix`void* std::__1::__thread_proxy[abi:fe190107]<std::__1::tuple<std::__1::unique_ptr<std::__1::_ _thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (std::__1::__async_assoc_state<void, std::__1::__async_f unc<nix::daemonLoop(nix::AsyncIoRoot&, std::__1::optional<nix::TrustedFlag>)::$_0>>::*)(), std::__1::__async_assoc_state<void, std: :__1::__async_func<nix::daemonLoop(nix::AsyncIoRoot&, std::__1::optional<nix::TrustedFlag>)::$_0>>*>>(void*) + 88 frame #14: 0x000000018981c2e4 libsystem_pthread.dylib`_pthread_start + 136 thread #2, name = 'signal handler' frame #0: 0x00000001897e6960 libsystem_kernel.dylib`__sigwait + 8 frame #1: 0x000000018981ca9c libsystem_pthread.dylib`sigwait + 40 frame #2: 0x000000010314c878 liblixutil.dylib`nix::signalHandlerThread(unsigned int) + 52 frame #3: 0x000000010314d428 liblixutil.dylib`void* std::__1::__thread_proxy[abi:fe190107]<std::__1::tuple<std::__1::unique_ptr <std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (*)(unsigned int), unsigned int>>(void*) + 7 2 frame #4: 0x000000018981c2e4 libsystem_pthread.dylib`_pthread_start + 136 thread #3, name = 'MonitorFdHup' frame #0: 0x00000001897e3830 libsystem_kernel.dylib`poll + 8 frame #1: 0x00000001039b6e2c liblixstore.dylib`nix::MonitorFdHup::MonitorFdHup(int)::'lambda'()::operator()() const + 188 frame #2: 0x00000001039b6ca8 liblixstore.dylib`void* std::__1::__thread_proxy[abi:fe190107]<std::__1::tuple<std::__1::unique_pt r<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, nix::MonitorFdHup::MonitorFdHup(int)::'lambda'() >>(void*) + 68 frame #3: 0x000000018981c2e4 libsystem_pthread.dylib`_pthread_start + 136 (lldb) ^D ``` Remote builder end: ``` ~ » pgrep -af nix-daemon 195017 nix-daemon --daemon 199905 nix-daemon --stdio 199912 nix-daemon 199905 ~ » sudo gdb -p 199912 For help, type "help". Type "apropos word" to search for commands related to "word". Attaching to process 199912 [New LWP 199914] [New LWP 199913] [Thread debugging using libthread_db enabled] Using host libthread_db library "/nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/libthread_db.so.1". 0x00007f9d0e6e57de in epoll_wait () from target:/nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/libc.so.6 (gdb) thr apply all bt Thread 3 (Thread 0x7f9d0c8866c0 (LWP 199913) "signal handler"): #0 0x00007f9d0e60bcca in sigtimedwait () from target:/nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/libc.so.6 #1 0x00007f9d0e60b2fc in sigwait () from target:/nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/libc.so.6 #2 0x00007f9d0faee618 in nix::signalHandlerThread(__sigset_t) () from target:/nix/store/8jr9h780fccwwzh8pdd94svra70fdl2y-lix-2.93. 0-dev-pre20250324-4181796/lib/liblixutil.so #3 0x00007f9d0faef124 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)(__sigset_t), __sigset_t> > >::_M_run() () from target:/nix/store/8jr9h780fccwwzh8pdd94svra70fdl2y-lix-2.93.0-dev-pre20250324-4181796/lib/liblixutil.so #4 0x00007f9d0e8eb0a4 in execute_native_thread_routine () from target:/nix/store/6kbrc4ca98srlfpgyaayl2q9zpg1gys6-gcc-14-20241116- lib/lib/libstdc++.so.6 #5 0x00007f9d0e6612e3 in start_thread () from target:/nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/libc.so.6 #6 0x00007f9d0e6e52fc in __clone3 () from target:/nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/libc.so.6 Thread 2 (Thread 0x7f9cf36fc6c0 (LWP 199914) "MonitorFdHup"): #0 0x00007f9d0e6d6d4f in poll () from target:/nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/libc.so.6 #1 0x000055909fc2628c in nix::MonitorFdHup::MonitorFdHup(int)::{lambda()#1}::operator()() const () #2 0x00007f9d0e8eb0a4 in execute_native_thread_routine () from target:/nix/store/6kbrc4ca98srlfpgyaayl2q9zpg1gys6-gcc-14-20241116- lib/lib/libstdc++.so.6 #3 0x00007f9d0e6612e3 in start_thread () from target:/nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/libc.so.6 #4 0x00007f9d0e6e52fc in __clone3 () from target:/nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/libc.so.6 Thread 1 (Thread 0x7f9cf3f056c0 (LWP 199912) "nix-daemon"): #0 0x00007f9d0e6e57de in epoll_wait () from target:/nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/libc.so.6 #1 0x00007f9d0ec9a7a8 in kj::UnixEventPort::wait() () from target:/nix/store/ih5f49byxr8b0sjj8dmiqylp47vmcmx4-capnproto-1.0.2/lib/ libkj-async.so.1.0.2 #2 0x00007f9d0ec855f3 in kj::EventLoop::wait() () from target:/nix/store/ih5f49byxr8b0sjj8dmiqylp47vmcmx4-capnproto-1.0.2/lib/libk j-async.so.1.0.2 #3 0x00007f9d0ec85d77 in kj::_::waitImpl(kj::Own<kj::_::PromiseNode, kj::_::PromiseDisposer>&&, kj::_::ExceptionOrValue&, kj::Wait Scope&, kj::SourceLocation) () from target:/nix/store/ih5f49byxr8b0sjj8dmiqylp47vmcmx4-capnproto-1.0.2/lib/libkj-async.so.1.0.2 #4 0x000055909fbba226 in kj::Promise<boost::outcome_v2::basic_result<nix::BuildResult, std::__exception_ptr::exception_ptr, boost: :outcome_v2::policy::exception_ptr_rethrow<nix::BuildResult, std::__exception_ptr::exception_ptr, void> > >::wait(kj::WaitScope&, k j::SourceLocation) () --Type <RET> for more, q to quit, c to continue without paging--c #5 0x00007f9d0f59fcdc in nix::daemon::performOp(nix::AsyncIoRoot&, nix::daemon::TunnelLogger*, nix::ref<nix::Store>, nix::TrustedF lag, nix::daemon::RecursiveFlag, unsigned int, nix::Source&, nix::BufferedSink&, nix::WorkerProto::Op) () from target:/nix/store/8j r9h780fccwwzh8pdd94svra70fdl2y-lix-2.93.0-dev-pre20250324-4181796/lib/liblixstore.so #6 0x00007f9d0f59e34d in nix::daemon::processConnection(nix::AsyncIoRoot&, nix::ref<nix::Store>, nix::FdSource&, nix::FdSink&, nix ::TrustedFlag, nix::daemon::RecursiveFlag) () from target:/nix/store/8jr9h780fccwwzh8pdd94svra70fdl2y-lix-2.93.0-dev-pre20250324-41 81796/lib/liblixstore.so #7 0x000055909fa993ee in std::_Function_handler<void (), nix::daemonLoopImpl(std::optional<nix::TrustedFlag>)::$_0>::_M_invoke(std ::_Any_data const&) () #8 0x00007f9d0fad0a6a in std::_Function_handler<void (), nix::startProcess(std::function<void ()>, nix::ProcessOptions const&)::$_ 0>::_M_invoke(std::_Any_data const&) () from target:/nix/store/8jr9h780fccwwzh8pdd94svra70fdl2y-lix-2.93.0-dev-pre20250324-4181796/ lib/liblixutil.so #9 0x00007f9d0facf55e in nix::startProcess(std::function<void ()>, nix::ProcessOptions const&) () from target:/nix/store/8jr9h780f ccwwzh8pdd94svra70fdl2y-lix-2.93.0-dev-pre20250324-4181796/lib/liblixutil.so #10 0x000055909fa97f28 in std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base ::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_bas e::_Deleter>, std::thread::_Invoker<std::tuple<nix::daemonLoop(nix::AsyncIoRoot&, std::optional<nix::TrustedFlag>)::$_0> >, void> > ::_M_invoke(std::_Any_data const&) () #11 0x000055909fa9e22a in std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_ba se, std::__future_base::_Result_base::_Deleter> ()>*, bool*) () #12 0x00007f9d0e6665d7 in __pthread_once_slow.isra.0 () from target:/nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/l ibc.so.6 #13 0x00007f9d0e666651 in pthread_once@GLIBC_2.2.5 () from target:/nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/lib c.so.6 #14 0x000055909fa9e010 in std::__future_base::_State_baseV2::_M_set_result(std::function<std::unique_ptr<std::__future_base::_Resul t_base, std::__future_base::_Result_base::_Deleter> ()>, bool) () #15 0x000055909fa96b92 in std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<nix::daemonLoop(nix::AsyncIoRoot&, std::optional<nix::TrustedFlag>)::$_0> >, void>::_M_run() () #16 0x00007f9d0e8eb0a4 in execute_native_thread_routine () from target:/nix/store/6kbrc4ca98srlfpgyaayl2q9zpg1gys6-gcc-14-20241116- lib/lib/libstdc++.so.6 #17 0x00007f9d0e6612e3 in start_thread () from target:/nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/libc.so.6 #18 0x00007f9d0e6e52fc in __clone3 () from target:/nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/libc.so.6 (gdb) quit A debugging session is active. Inferior 1 [process 199912] will be detached. Quit anyway? (y or n) y Detaching from program: target:/nix/store/8jr9h780fccwwzh8pdd94svra70fdl2y-lix-2.93.0-dev-pre20250324-4181796/bin/nix, process 1999 12 [Inferior 1 (process 199912) detached] ~ » pgrep -af nix-daemon 195017 nix-daemon --daemon 199905 nix-daemon --stdio 199912 nix-daemon 199905 ~ » sudo gdb -p 199905 Attaching to process 199905 [New LWP 199911] [Thread debugging using libthread_db enabled] Using host libthread_db library "/nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/libthread_db.so.1". 0x00007fdf3bc67964 in select () from /nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/libc.so.6 (gdb) thr apply all bt Thread 2 (Thread 0x7fdf39e0b6c0 (LWP 199911) "signal handler"): #0 0x00007fdf3bb90cca in sigtimedwait () from /nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/libc.so.6 #1 0x00007fdf3bb902fc in sigwait () from /nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/libc.so.6 #2 0x00007fdf3d073618 in nix::signalHandlerThread(__sigset_t) () from /nix/store/8jr9h780fccwwzh8pdd94svra70fdl2y-lix-2.93.0-dev-p re20250324-4181796/lib/liblixutil.so #3 0x00007fdf3d074124 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)(__sigset_t), __sigset_t> > >::_M_run() () from /nix/store/8jr9h780fccwwzh8pdd94svra70fdl2y-lix-2.93.0-dev-pre20250324-4181796/lib/liblixutil.so #4 0x00007fdf3be700a4 in execute_native_thread_routine () from /nix/store/6kbrc4ca98srlfpgyaayl2q9zpg1gys6-gcc-14-20241116-lib/lib /libstdc++.so.6 #5 0x00007fdf3bbe62e3 in start_thread () from /nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/libc.so.6 #6 0x00007fdf3bc6a2fc in __clone3 () from /nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/libc.so.6 Thread 1 (Thread 0x7fdf39e14600 (LWP 199905) "nix-daemon"): #0 0x00007fdf3bc67964 in select () from /nix/store/cmpyglinc9xl9pr4ymx8akl286ygl64x-glibc-2.40-66/lib/libc.so.6 #1 0x000055d664c35c33 in nix::runDaemon(nix::AsyncIoRoot&, bool, std::optional<nix::TrustedFlag>) () #2 0x000055d664c3529e in nix::main_nix_daemon(nix::AsyncIoRoot&, std::__cxx11::basic_string<char, std::char_traits<char>, std::all ocator<char> >, std::__cxx11::list<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator< std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >) () #3 0x000055d664c4027b in int std::__invoke_impl<int, int (*&)(nix::AsyncIoRoot&, std::__cxx11::basic_string<char, std::char_traits <char>, std::allocator<char> >, std::__cxx11::list<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >), nix::AsyncIoRoot&, std::__cxx 11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::list<std::__cxx11::basic_string<char, std::cha r_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char > > > > >(std::__invoke_other, int (*&)(nix::AsyncIoRoot&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator< char> >, std::__cxx11::list<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__ cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >), nix::AsyncIoRoot&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, std::__cxx11::list<std::__cxx11::basic_string<char, std::char_traits<char>, std:: allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&&) () #4 0x000055d664cd9aaa in nix::mainWrapped(nix::AsyncIoRoot&, int, char**) () #5 0x000055d664cddf1d in std::_Function_handler<void (), main::$_0>::_M_invoke(std::_Any_data const&) () #6 0x00007fdf3c2db2fa in nix::handleExceptions(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > con st&, std::function<void ()>) () from /nix/store/8jr9h780fccwwzh8pdd94svra70fdl2y-lix-2.93.0-dev-pre20250324-4181796/lib/liblixmain. so #7 0x000055d664cdd9db in main () ```
Owner

we cannot reproduce this on linux with a darwin eval-system. could be a platform issue of some sort?

we cannot reproduce this on linux with a darwin eval-system. could be a platform issue of some sort?
Owner

likely. it happened yesterday on my Mac.

likely. it happened yesterday on my Mac.
Owner

then we can't be of any assistance here :(

then we can't be of any assistance here :(
jade self-assigned this 2025-04-04 01:42:41 +00:00
Member

This issue was mentioned on Gerrit on the following CLs:

  • commit message in cl/2933 ("MonitorFdHup: introduce a test")
  • commit message in cl/3066 ("fix: Terminate daemons properly on Ctrl-C on macOS")
<!-- GERRIT_LINKBOT: {"cls": [{"backlink": "https://gerrit.lix.systems/c/lix/+/2933", "number": 2933, "kind": "commit message"}, {"backlink": "https://gerrit.lix.systems/c/lix/+/3066", "number": 3066, "kind": "commit message"}], "cl_meta": {"2933": {"change_title": "MonitorFdHup: introduce a test"}, "3066": {"change_title": "fix: Terminate daemons properly on Ctrl-C on macOS"}}} --> This issue was mentioned on Gerrit on the following CLs: * commit message in [cl/2933](https://gerrit.lix.systems/c/lix/+/2933) ("MonitorFdHup: introduce a test") * commit message in [cl/3066](https://gerrit.lix.systems/c/lix/+/3066) ("fix: Terminate daemons properly on Ctrl-C on macOS")
Owner

2.91 used to just terminate the daemon by it trying to write to a hung-up socket which caused it to terminate. This used to happen:

lix/lix2 » sudo ./result/bin/nix-daemon --daemon
warning: $HOME ('/Users/jade') is not owned by you, falling back to the one defined in the 'passwd' file ('/var/root')
accepted connection from pid <unknown>, user jade (trusted)
warning: $HOME ('/Users/jade') is not owned by you, falling back to the one defined in the 'passwd' file ('/var/root')
unexpected Nix daemon error: error: writing to file: Broken pipe
accepted connection from pid <unknown>, user jade (trusted)
warning: $HOME ('/Users/jade') is not owned by you, falling back to the one defined in the 'passwd' file ('/var/root')
unexpected Nix daemon error: error: writing to file: Broken pipe

This was probably a bug that MonitorFdHup wasn't catching it? I don't know why the broken pipe isn't causing the 2.93 daemon to throw an error and terminate on that basis.

2.91 used to just terminate the daemon by it trying to write to a hung-up socket which caused it to terminate. This used to happen: ``` lix/lix2 » sudo ./result/bin/nix-daemon --daemon warning: $HOME ('/Users/jade') is not owned by you, falling back to the one defined in the 'passwd' file ('/var/root') accepted connection from pid <unknown>, user jade (trusted) warning: $HOME ('/Users/jade') is not owned by you, falling back to the one defined in the 'passwd' file ('/var/root') unexpected Nix daemon error: error: writing to file: Broken pipe accepted connection from pid <unknown>, user jade (trusted) warning: $HOME ('/Users/jade') is not owned by you, falling back to the one defined in the 'passwd' file ('/var/root') unexpected Nix daemon error: error: writing to file: Broken pipe ``` This was probably a bug that MonitorFdHup wasn't catching it? I don't know why the broken pipe isn't causing the 2.93 daemon to throw an error and terminate on that basis.
Owner
$ nrb nix build -L --expr 'builtins.derivation { system = "x86_64-linux"; name = "foo6"; builder = "/bin/sh"; args = ["-c" "/run/current-system/sw/bin/sleep 1d"]; __noChroot = true; }'

Interestingly, it seems that it needs to get to a sufficiently complex build to actually have the problem to begin with; the above does not repro it. This is wild.

``` $ nrb nix build -L --expr 'builtins.derivation { system = "x86_64-linux"; name = "foo6"; builder = "/bin/sh"; args = ["-c" "/run/current-system/sw/bin/sleep 1d"]; __noChroot = true; }' ``` Interestingly, it seems that it needs to get to a sufficiently complex build to actually have the problem to begin with; the above does not repro it. This is *wild*.
Owner

Okay. So it only happens if you nix eval on the same connection first.... WHAT.

Okay. So it only happens if you `nix eval` on the same connection first.... WHAT.
Owner

Reproducer (thanks to horrors pointing to the right direction):

builtins.derivation { system = "aarch64-darwin"; lol = "${./README.md}"; name = "foo6"; builder = "/bin/sh"; args = ["-c" "/bin/sleep 1d"]; }
Reproducer (thanks to horrors pointing to the right direction): ``` builtins.derivation { system = "aarch64-darwin"; lol = "${./README.md}"; name = "foo6"; builder = "/bin/sh"; args = ["-c" "/bin/sleep 1d"]; } ```
Owner

Here's the minimal debug log of it. I've disabled build-hook and removed the queryPaths invocations in nix-build, so it's just AddToStore, AddToStore, then BuildPaths.

lix/lix2 » nix-build --debug fail.nix
starting pool remote stderr of 18446744073709551615 threads
copying '/Users/jade/lix/lix2/README.md' to the store...
MEOW open connection
performing daemon worker op: 7
acquired write lock on '/nix/var/nix/temproots/45678'
copied source '/Users/jade/lix/lix2/README.md' -> '/nix/store/6kax1wlb366cs5rv0pkblk828d35q83n-README.md'
performing daemon worker op: 7
instantiated 'foo6' -> '/nix/store/g9834f16ljbj3kc6r2azq96idzdz43q0-foo6.drv'
performing daemon worker op: 9
building of '/nix/store/g9834f16ljbj3kc6r2azq96idzdz43q0-foo6.drv^out' from .drv file: created
entered goal loop
building of '/nix/store/g9834f16ljbj3kc6r2azq96idzdz43q0-foo6.drv^out' from .drv file: init
building of '/nix/store/g9834f16ljbj3kc6r2azq96idzdz43q0-foo6.drv^out' from .drv file: loading derivation
building of '/nix/store/g9834f16ljbj3kc6r2azq96idzdz43q0-foo6.drv^out' from .drv file: have derivation
substitution of '/nix/store/w7jxjjj883pg9qprrx6cwql7z66mxl3j-foo6': created
substitution of '/nix/store/w7jxjjj883pg9qprrx6cwql7z66mxl3j-foo6': init
substitution of '/nix/store/w7jxjjj883pg9qprrx6cwql7z66mxl3j-foo6': trying next substituter
substitution of '/nix/store/w7jxjjj883pg9qprrx6cwql7z66mxl3j-foo6': trying next substituter
path '/nix/store/w7jxjjj883pg9qprrx6cwql7z66mxl3j-foo6' is required, but there is no substituter that can build it
substitution of '/nix/store/w7jxjjj883pg9qprrx6cwql7z66mxl3j-foo6': done
building of '/nix/store/g9834f16ljbj3kc6r2azq96idzdz43q0-foo6.drv^out' from .drv file: waitee 'substitution of '/nix/store/w7jx
jjj883pg9qprrx6cwql7z66mxl3j-foo6'' done; 0 left
substitution of '/nix/store/w7jxjjj883pg9qprrx6cwql7z66mxl3j-foo6': goal destroyed
building of '/nix/store/g9834f16ljbj3kc6r2azq96idzdz43q0-foo6.drv^out' from .drv file: all outputs substituted (maybe)
building of '/nix/store/g9834f16ljbj3kc6r2azq96idzdz43q0-foo6.drv^out' from .drv file: all inputs realised
added input paths '/nix/store/6kax1wlb366cs5rv0pkblk828d35q83n-README.md'
building of '/nix/store/g9834f16ljbj3kc6r2azq96idzdz43q0-foo6.drv^out' from .drv file: trying to build
locking path '/nix/store/w7jxjjj883pg9qprrx6cwql7z66mxl3j-foo6'
lock acquired on '/nix/store/w7jxjjj883pg9qprrx6cwql7z66mxl3j-foo6.lock'
removing invalid path '/nix/store/w7jxjjj883pg9qprrx6cwql7z66mxl3j-foo6'
lock released on '/nix/store/w7jxjjj883pg9qprrx6cwql7z66mxl3j-foo6.lock'
building of '/nix/store/g9834f16ljbj3kc6r2azq96idzdz43q0-foo6.drv^out' from .drv file: trying to build
locking path '/nix/store/w7jxjjj883pg9qprrx6cwql7z66mxl3j-foo6'
lock acquired on '/nix/store/w7jxjjj883pg9qprrx6cwql7z66mxl3j-foo6.lock'
removing invalid path '/nix/store/w7jxjjj883pg9qprrx6cwql7z66mxl3j-foo6'
found build user '_nixbld1'
found build user '_nixbld2'
found build user '_nixbld3'
found build user '_nixbld4'
found build user '_nixbld5'
found build user '_nixbld6'
found build user '_nixbld7'
found build user '_nixbld8'
found build user '_nixbld9'
found build user '_nixbld10'
found build user '_nixbld11'
found build user '_nixbld12'
found build user '_nixbld13'
found build user '_nixbld14'
found build user '_nixbld15'
found build user '_nixbld16'
found build user '_nixbld17'
found build user '_nixbld18'
found build user '_nixbld19'
found build user '_nixbld20'
found build user '_nixbld21'
found build user '_nixbld22'
found build user '_nixbld23'
found build user '_nixbld24'
found build user '_nixbld25'
found build user '_nixbld26'
found build user '_nixbld27'
found build user '_nixbld28'
found build user '_nixbld29'
found build user '_nixbld30'
found build user '_nixbld31'
found build user '_nixbld32'
trying user '_nixbld1'
killing all processes running under uid '351'
executing builder '/bin/sh'
using builder args '-c /bin/sleep 1d'
sandbox setup: Generated sandbox profile:
sandbox setup: (version 1)
sandbox setup:
sandbox setup:
sandbox setup: (allow default)
sandbox setup:
sandbox setup: ; Disallow creating setuid/setgid binaries, since that
sandbox setup: ; would allow breaking build user isolation.
sandbox setup: (deny file-write-setugid)
sandbox setup:
sandbox setup:
building '/nix/store/g9834f16ljbj3kc6r2azq96idzdz43q0-foo6.drv'...
^Cclosing daemon connection because of an exception
reaping 1 worker threads
error: interrupted by the user
Here's the minimal debug log of it. I've disabled build-hook and removed the queryPaths invocations in `nix-build`, so it's *just* AddToStore, AddToStore, then BuildPaths. ``` lix/lix2 » nix-build --debug fail.nix starting pool remote stderr of 18446744073709551615 threads copying '/Users/jade/lix/lix2/README.md' to the store... MEOW open connection performing daemon worker op: 7 acquired write lock on '/nix/var/nix/temproots/45678' copied source '/Users/jade/lix/lix2/README.md' -> '/nix/store/6kax1wlb366cs5rv0pkblk828d35q83n-README.md' performing daemon worker op: 7 instantiated 'foo6' -> '/nix/store/g9834f16ljbj3kc6r2azq96idzdz43q0-foo6.drv' performing daemon worker op: 9 building of '/nix/store/g9834f16ljbj3kc6r2azq96idzdz43q0-foo6.drv^out' from .drv file: created entered goal loop building of '/nix/store/g9834f16ljbj3kc6r2azq96idzdz43q0-foo6.drv^out' from .drv file: init building of '/nix/store/g9834f16ljbj3kc6r2azq96idzdz43q0-foo6.drv^out' from .drv file: loading derivation building of '/nix/store/g9834f16ljbj3kc6r2azq96idzdz43q0-foo6.drv^out' from .drv file: have derivation substitution of '/nix/store/w7jxjjj883pg9qprrx6cwql7z66mxl3j-foo6': created substitution of '/nix/store/w7jxjjj883pg9qprrx6cwql7z66mxl3j-foo6': init substitution of '/nix/store/w7jxjjj883pg9qprrx6cwql7z66mxl3j-foo6': trying next substituter substitution of '/nix/store/w7jxjjj883pg9qprrx6cwql7z66mxl3j-foo6': trying next substituter path '/nix/store/w7jxjjj883pg9qprrx6cwql7z66mxl3j-foo6' is required, but there is no substituter that can build it substitution of '/nix/store/w7jxjjj883pg9qprrx6cwql7z66mxl3j-foo6': done building of '/nix/store/g9834f16ljbj3kc6r2azq96idzdz43q0-foo6.drv^out' from .drv file: waitee 'substitution of '/nix/store/w7jx jjj883pg9qprrx6cwql7z66mxl3j-foo6'' done; 0 left substitution of '/nix/store/w7jxjjj883pg9qprrx6cwql7z66mxl3j-foo6': goal destroyed building of '/nix/store/g9834f16ljbj3kc6r2azq96idzdz43q0-foo6.drv^out' from .drv file: all outputs substituted (maybe) building of '/nix/store/g9834f16ljbj3kc6r2azq96idzdz43q0-foo6.drv^out' from .drv file: all inputs realised added input paths '/nix/store/6kax1wlb366cs5rv0pkblk828d35q83n-README.md' building of '/nix/store/g9834f16ljbj3kc6r2azq96idzdz43q0-foo6.drv^out' from .drv file: trying to build locking path '/nix/store/w7jxjjj883pg9qprrx6cwql7z66mxl3j-foo6' lock acquired on '/nix/store/w7jxjjj883pg9qprrx6cwql7z66mxl3j-foo6.lock' removing invalid path '/nix/store/w7jxjjj883pg9qprrx6cwql7z66mxl3j-foo6' lock released on '/nix/store/w7jxjjj883pg9qprrx6cwql7z66mxl3j-foo6.lock' building of '/nix/store/g9834f16ljbj3kc6r2azq96idzdz43q0-foo6.drv^out' from .drv file: trying to build locking path '/nix/store/w7jxjjj883pg9qprrx6cwql7z66mxl3j-foo6' lock acquired on '/nix/store/w7jxjjj883pg9qprrx6cwql7z66mxl3j-foo6.lock' removing invalid path '/nix/store/w7jxjjj883pg9qprrx6cwql7z66mxl3j-foo6' found build user '_nixbld1' found build user '_nixbld2' found build user '_nixbld3' found build user '_nixbld4' found build user '_nixbld5' found build user '_nixbld6' found build user '_nixbld7' found build user '_nixbld8' found build user '_nixbld9' found build user '_nixbld10' found build user '_nixbld11' found build user '_nixbld12' found build user '_nixbld13' found build user '_nixbld14' found build user '_nixbld15' found build user '_nixbld16' found build user '_nixbld17' found build user '_nixbld18' found build user '_nixbld19' found build user '_nixbld20' found build user '_nixbld21' found build user '_nixbld22' found build user '_nixbld23' found build user '_nixbld24' found build user '_nixbld25' found build user '_nixbld26' found build user '_nixbld27' found build user '_nixbld28' found build user '_nixbld29' found build user '_nixbld30' found build user '_nixbld31' found build user '_nixbld32' trying user '_nixbld1' killing all processes running under uid '351' executing builder '/bin/sh' using builder args '-c /bin/sleep 1d' sandbox setup: Generated sandbox profile: sandbox setup: (version 1) sandbox setup: sandbox setup: sandbox setup: (allow default) sandbox setup: sandbox setup: ; Disallow creating setuid/setgid binaries, since that sandbox setup: ; would allow breaking build user isolation. sandbox setup: (deny file-write-setugid) sandbox setup: sandbox setup: building '/nix/store/g9834f16ljbj3kc6r2azq96idzdz43q0-foo6.drv'... ^Cclosing daemon connection because of an exception reaping 1 worker threads error: interrupted by the user ```
Owner

I had the hypothesis that kqueue and poll on the same fd were causing the relevant event to disappear, since it has been observed to not be delivered to the monitor fd hup thread.

However:

#include <iostream>
#include <thread>
#include <sys/poll.h>
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <unistd.h>

void thr1_go(int fd)
{
    int kq = kqueue();
    std::unique_ptr<struct kevent[]> kevs = std::make_unique<struct kevent[]>(1);
    EV_SET(kevs.get(), fd, EVFILT_READ, EV_ADD | EV_CLEAR, 0, 0, nullptr);
    int rv = kevent(kq, kevs.get(), 1, nullptr, 0, nullptr);
    if (rv < 0) {
        perror("kevent");
        return;
    }
    while (true) {
        auto new_evs = std::make_unique<struct kevent[]>(5);
        int nevs = kevent(kq, nullptr, 0, new_evs.get(), 5, nullptr);
        if (nevs < 0) {
            perror("kevent listen");
            return;
        }
        std::cout << "kevent: " << nevs << "\n";
        for (int i = 0; i < nevs; ++i) {
            __builtin_dump_struct(&new_evs[i], printf);
        }
    }
}

void thr2_go(int fd)
{
    struct pollfd fds[1];
    fds[0].events = POLL_HUP;
    fds[0].fd = fd;
    while (true) {
        int count = poll(fds, 1, -1);
        if (count == -1) {
            perror("poll");
            return;
        }
        if (fds[0].revents & POLLHUP) {
            std::cout << "POLLHUP\n";
            break;
        }
    }
}

int main(void)
{
    int pipe_fds[2];
    int rv = pipe(pipe_fds);
    if (rv < 0) {
        perror("pipe");
        return 1;
    }
    std::thread thr1{[&pipe_fds]() { thr1_go(pipe_fds[1]); }};
    std::thread thr2{[&pipe_fds]() { thr2_go(pipe_fds[1]); }};

    sleep(1);
    close(pipe_fds[0]);
    thr1.join();
    thr2.join();
}
lix/lix2 » ./test
POLLHUP
kevent: 1
kevent {
  uintptr_t ident = 4
  int16_t filter = -1
  uint16_t flags = 32801
  uint32_t fflags = 0
  intptr_t data = 0
  void * udata = 0x0
}

So I still don't know what is going wrong here, but a trivial program doesn't do a bad thing, so presumably there's more to it than that. Time for more dtrace.

I had the hypothesis that kqueue and poll on the same fd were causing the relevant event to disappear, since it has been observed to not be delivered to the monitor fd hup thread. However: ```c++ #include <iostream> #include <thread> #include <sys/poll.h> #include <sys/types.h> #include <sys/event.h> #include <sys/time.h> #include <unistd.h> void thr1_go(int fd) { int kq = kqueue(); std::unique_ptr<struct kevent[]> kevs = std::make_unique<struct kevent[]>(1); EV_SET(kevs.get(), fd, EVFILT_READ, EV_ADD | EV_CLEAR, 0, 0, nullptr); int rv = kevent(kq, kevs.get(), 1, nullptr, 0, nullptr); if (rv < 0) { perror("kevent"); return; } while (true) { auto new_evs = std::make_unique<struct kevent[]>(5); int nevs = kevent(kq, nullptr, 0, new_evs.get(), 5, nullptr); if (nevs < 0) { perror("kevent listen"); return; } std::cout << "kevent: " << nevs << "\n"; for (int i = 0; i < nevs; ++i) { __builtin_dump_struct(&new_evs[i], printf); } } } void thr2_go(int fd) { struct pollfd fds[1]; fds[0].events = POLL_HUP; fds[0].fd = fd; while (true) { int count = poll(fds, 1, -1); if (count == -1) { perror("poll"); return; } if (fds[0].revents & POLLHUP) { std::cout << "POLLHUP\n"; break; } } } int main(void) { int pipe_fds[2]; int rv = pipe(pipe_fds); if (rv < 0) { perror("pipe"); return 1; } std::thread thr1{[&pipe_fds]() { thr1_go(pipe_fds[1]); }}; std::thread thr2{[&pipe_fds]() { thr2_go(pipe_fds[1]); }}; sleep(1); close(pipe_fds[0]); thr1.join(); thr2.join(); } ``` ``` lix/lix2 » ./test POLLHUP kevent: 1 kevent { uintptr_t ident = 4 int16_t filter = -1 uint16_t flags = 32801 uint32_t fflags = 0 intptr_t data = 0 void * udata = 0x0 } ``` So I still don't know what is going wrong here, but a trivial program doesn't do a bad thing, so presumably there's more to it than that. Time for more dtrace.
Owner

This is my test setup patch on Lix:

diff --git a/lix/legacy/nix-build.cc b/lix/legacy/nix-build.cc
index 94aba0c07..dd2fd991a 100644
--- a/lix/legacy/nix-build.cc
+++ b/lix/legacy/nix-build.cc
@@ -309,14 +309,14 @@ static void main_nix_build(AsyncIoRoot & aio, std::string programName, Strings a
            fetch binary cache data. */
         uint64_t downloadSize, narSize;
         StorePathSet willBuild, willSubstitute, unknown;
-        aio.blockOn(store->queryMissing(paths,
-            willBuild, willSubstitute, unknown, downloadSize, narSize));
-
-        if (settings.printMissing) {
-            aio.blockOn(printMissing(
-                ref<Store>(store), willBuild, willSubstitute, unknown, downloadSize, narSize
-            ));
-        }
+        // aio.blockOn(store->queryMissing(paths,
+        //     willBuild, willSubstitute, unknown, downloadSize, narSize));
+        //
+        // if (settings.printMissing) {
+        //     aio.blockOn(printMissing(
+        //         ref<Store>(store), willBuild, willSubstitute, unknown, downloadSize, narSize
+        //     ));
+        // }
 
         if (!dryRun)
             aio.blockOn(store->buildPaths(paths, buildMode, evalStore));
diff --git a/lix/libstore/build/worker.hh b/lix/libstore/build/worker.hh
index 7fc3d1fe9..f819d7979 100644
--- a/lix/libstore/build/worker.hh
+++ b/lix/libstore/build/worker.hh
@@ -207,7 +207,7 @@ public:
          * Whether to ask the build hook if it can build a derivation. If
          * it answers with "decline-permanently", we don't try again.
          */
-        bool available = true;
+        bool available = false;
     };
 
     HookState hook;
diff --git a/lix/libstore/remote-store.cc b/lix/libstore/remote-store.cc
index b3c74ab73..edfd52f3d 100644
--- a/lix/libstore/remote-store.cc
+++ b/lix/libstore/remote-store.cc
@@ -1001,7 +1001,9 @@ try {
 RemoteStore::Connection::~Connection()
 {
     try {
+        debug("closing remote store connection");
         to.flush();
+        debug("closed remote store connection");
     } catch (...) {
         ignoreExceptionInDestructor();
     }
diff --git a/lix/libstore/uds-remote-store.cc b/lix/libstore/uds-remote-store.cc
index ca837f524..bf510a747 100644
--- a/lix/libstore/uds-remote-store.cc
+++ b/lix/libstore/uds-remote-store.cc
@@ -71,6 +71,8 @@ ref<RemoteStore::Connection> UDSRemoteStore::openConnection()
 
     conn->startTime = std::chrono::steady_clock::now();
 
+    debug("MEOW open connection");
+
     return conn;
 }
 
diff --git a/lix/libutil/monitor-fd.cc b/lix/libutil/monitor-fd.cc
index 9f35865e3..bd9d82ecd 100644
--- a/lix/libutil/monitor-fd.cc
+++ b/lix/libutil/monitor-fd.cc
@@ -1,14 +1,15 @@
 #include "monitor-fd.hh"
+#include "file-descriptor.hh"
 
 namespace nix {
 
-void MonitorFdHup::runThread(int watchFd, int terminateFd)
+void MonitorFdHup::runThread(AutoCloseFD watchFd, int terminateFd)
 {
     setCurrentThreadName("MonitorFdHup");
     while (!quit) {
         /* Wait indefinitely until a POLLHUP occurs. */
         struct pollfd fds[2];
-        fds[0].fd = watchFd;
+        fds[0].fd = watchFd.get();
         // There is a POSIX violation on macOS: you have to listen for
         // at least POLLHUP to receive HUP events for a FD. POSIX says
         // this is not so, and you should just receive them regardless,
@@ -43,6 +44,7 @@ void MonitorFdHup::runThread(int watchFd, int terminateFd)
             continue;
         }
         if (fds[0].revents & POLLHUP) {
+            std::cerr << "wtf!\n";
             callback();
             break;
         }
@@ -70,7 +72,11 @@ MonitorFdHup::MonitorFdHup(int fd, std::function<void()> callback) : callback(ca
 {
     terminatePipe.create();
     int terminateFd = terminatePipe.readSide.get();
-    thread = std::thread([this, fd, terminateFd]() { this->runThread(fd, terminateFd); });
+    int dupped = dup(fd);
+    AutoCloseFD watchFd{dupped};
+    thread = std::thread([this, myWatchFd = std::move(watchFd), terminateFd]() mutable {
+        this->runThread(std::move(myWatchFd), terminateFd);
+    });
 };
 
 MonitorFdHup::~MonitorFdHup()
diff --git a/lix/libutil/monitor-fd.hh b/lix/libutil/monitor-fd.hh
index 97a914ab9..3bf849e0f 100644
--- a/lix/libutil/monitor-fd.hh
+++ b/lix/libutil/monitor-fd.hh
@@ -28,7 +28,7 @@ private:
     std::atomic_bool quit = false;
     std::function<void()> callback;
 
-    void runThread(int watchFd, int terminateFd);
+    void runThread(AutoCloseFD watchFd, int terminateFd);
 
 public:
     MonitorFdHup(int fd, std::function<void()> callback = nix::triggerInterrupt);
diff --git a/lix/libutil/thread-pool.cc b/lix/libutil/thread-pool.cc
index cdb1e11d6..a0e0f220a 100644
--- a/lix/libutil/thread-pool.cc
+++ b/lix/libutil/thread-pool.cc
@@ -14,7 +14,7 @@ ThreadPool::ThreadPool(const char * name, size_t _maxThreads)
         if (!maxThreads) maxThreads = 1;
     }
 
-    debug("starting pool of %d threads", maxThreads);
+    debug("starting pool %s of %d threads", name, maxThreads);
 }
 
 ThreadPool::~ThreadPool()
diff --git a/lix/nix/nar.cc b/lix/nix/nar.cc
index 4730b10b1..8203a9442 100644
--- a/lix/nix/nar.cc
+++ b/lix/nix/nar.cc
@@ -1,8 +1,14 @@
 #include "lix/libcmd/command.hh"
+#include "lix/libstore/path.hh"
+#include "lix/libstore/store-api.hh"
+#include "lix/libstore/uds-remote-store.hh"
+#include "lix/libutil/async.hh"
 #include "nar.hh"
 
 namespace nix {
 
+void go();
+
 struct CmdNar : MultiCommand
 {
     CmdNar() : MultiCommand(CommandRegistry::getCommandsFor({"nar"}))
@@ -22,12 +28,7 @@ struct CmdNar : MultiCommand
 
     Category category() override { return catUtility; }
 
-    void run() override
-    {
-        if (!command)
-            throw UsageError("'nix nar' requires a sub-command.");
-        command->second->run();
-    }
+    void run() override;
 };
 
 void registerNixNar()
@@ -35,4 +36,11 @@ void registerNixNar()
     registerCommand<CmdNar>("nar");
 }
 
+void CmdNar::run()
+{
+    ref<UDSRemoteStore> store = *aio().blockOn(openStore("daemon")).try_cast<UDSRemoteStore>();
+    auto pathInfo = aio().blockOn(store->queryPathInfo(StorePath("d0p7kq6l0hi0faf9agcfmm6cq4k1470w-lix-2.93.0-dev-pre20250410-81d4769")));
+    __builtin_dump_struct(&pathInfo, printf);
+}
+
 }

The stuff in nar.cc is unnecessary; I was hoping to repro without needing an evaluator but it turned out that the evaluator isn't that bad since I can do it without nixpkgs.

The repro above inconsistently creates the problem. I have dtruss output, but it's not of much help because the kqueue calls aren't dumped usefully.

I think the first step I should do dtrace-wise is to dump kqueue calls correctly.

This is my test setup patch on Lix: ```diff diff --git a/lix/legacy/nix-build.cc b/lix/legacy/nix-build.cc index 94aba0c07..dd2fd991a 100644 --- a/lix/legacy/nix-build.cc +++ b/lix/legacy/nix-build.cc @@ -309,14 +309,14 @@ static void main_nix_build(AsyncIoRoot & aio, std::string programName, Strings a fetch binary cache data. */ uint64_t downloadSize, narSize; StorePathSet willBuild, willSubstitute, unknown; - aio.blockOn(store->queryMissing(paths, - willBuild, willSubstitute, unknown, downloadSize, narSize)); - - if (settings.printMissing) { - aio.blockOn(printMissing( - ref<Store>(store), willBuild, willSubstitute, unknown, downloadSize, narSize - )); - } + // aio.blockOn(store->queryMissing(paths, + // willBuild, willSubstitute, unknown, downloadSize, narSize)); + // + // if (settings.printMissing) { + // aio.blockOn(printMissing( + // ref<Store>(store), willBuild, willSubstitute, unknown, downloadSize, narSize + // )); + // } if (!dryRun) aio.blockOn(store->buildPaths(paths, buildMode, evalStore)); diff --git a/lix/libstore/build/worker.hh b/lix/libstore/build/worker.hh index 7fc3d1fe9..f819d7979 100644 --- a/lix/libstore/build/worker.hh +++ b/lix/libstore/build/worker.hh @@ -207,7 +207,7 @@ public: * Whether to ask the build hook if it can build a derivation. If * it answers with "decline-permanently", we don't try again. */ - bool available = true; + bool available = false; }; HookState hook; diff --git a/lix/libstore/remote-store.cc b/lix/libstore/remote-store.cc index b3c74ab73..edfd52f3d 100644 --- a/lix/libstore/remote-store.cc +++ b/lix/libstore/remote-store.cc @@ -1001,7 +1001,9 @@ try { RemoteStore::Connection::~Connection() { try { + debug("closing remote store connection"); to.flush(); + debug("closed remote store connection"); } catch (...) { ignoreExceptionInDestructor(); } diff --git a/lix/libstore/uds-remote-store.cc b/lix/libstore/uds-remote-store.cc index ca837f524..bf510a747 100644 --- a/lix/libstore/uds-remote-store.cc +++ b/lix/libstore/uds-remote-store.cc @@ -71,6 +71,8 @@ ref<RemoteStore::Connection> UDSRemoteStore::openConnection() conn->startTime = std::chrono::steady_clock::now(); + debug("MEOW open connection"); + return conn; } diff --git a/lix/libutil/monitor-fd.cc b/lix/libutil/monitor-fd.cc index 9f35865e3..bd9d82ecd 100644 --- a/lix/libutil/monitor-fd.cc +++ b/lix/libutil/monitor-fd.cc @@ -1,14 +1,15 @@ #include "monitor-fd.hh" +#include "file-descriptor.hh" namespace nix { -void MonitorFdHup::runThread(int watchFd, int terminateFd) +void MonitorFdHup::runThread(AutoCloseFD watchFd, int terminateFd) { setCurrentThreadName("MonitorFdHup"); while (!quit) { /* Wait indefinitely until a POLLHUP occurs. */ struct pollfd fds[2]; - fds[0].fd = watchFd; + fds[0].fd = watchFd.get(); // There is a POSIX violation on macOS: you have to listen for // at least POLLHUP to receive HUP events for a FD. POSIX says // this is not so, and you should just receive them regardless, @@ -43,6 +44,7 @@ void MonitorFdHup::runThread(int watchFd, int terminateFd) continue; } if (fds[0].revents & POLLHUP) { + std::cerr << "wtf!\n"; callback(); break; } @@ -70,7 +72,11 @@ MonitorFdHup::MonitorFdHup(int fd, std::function<void()> callback) : callback(ca { terminatePipe.create(); int terminateFd = terminatePipe.readSide.get(); - thread = std::thread([this, fd, terminateFd]() { this->runThread(fd, terminateFd); }); + int dupped = dup(fd); + AutoCloseFD watchFd{dupped}; + thread = std::thread([this, myWatchFd = std::move(watchFd), terminateFd]() mutable { + this->runThread(std::move(myWatchFd), terminateFd); + }); }; MonitorFdHup::~MonitorFdHup() diff --git a/lix/libutil/monitor-fd.hh b/lix/libutil/monitor-fd.hh index 97a914ab9..3bf849e0f 100644 --- a/lix/libutil/monitor-fd.hh +++ b/lix/libutil/monitor-fd.hh @@ -28,7 +28,7 @@ private: std::atomic_bool quit = false; std::function<void()> callback; - void runThread(int watchFd, int terminateFd); + void runThread(AutoCloseFD watchFd, int terminateFd); public: MonitorFdHup(int fd, std::function<void()> callback = nix::triggerInterrupt); diff --git a/lix/libutil/thread-pool.cc b/lix/libutil/thread-pool.cc index cdb1e11d6..a0e0f220a 100644 --- a/lix/libutil/thread-pool.cc +++ b/lix/libutil/thread-pool.cc @@ -14,7 +14,7 @@ ThreadPool::ThreadPool(const char * name, size_t _maxThreads) if (!maxThreads) maxThreads = 1; } - debug("starting pool of %d threads", maxThreads); + debug("starting pool %s of %d threads", name, maxThreads); } ThreadPool::~ThreadPool() diff --git a/lix/nix/nar.cc b/lix/nix/nar.cc index 4730b10b1..8203a9442 100644 --- a/lix/nix/nar.cc +++ b/lix/nix/nar.cc @@ -1,8 +1,14 @@ #include "lix/libcmd/command.hh" +#include "lix/libstore/path.hh" +#include "lix/libstore/store-api.hh" +#include "lix/libstore/uds-remote-store.hh" +#include "lix/libutil/async.hh" #include "nar.hh" namespace nix { +void go(); + struct CmdNar : MultiCommand { CmdNar() : MultiCommand(CommandRegistry::getCommandsFor({"nar"})) @@ -22,12 +28,7 @@ struct CmdNar : MultiCommand Category category() override { return catUtility; } - void run() override - { - if (!command) - throw UsageError("'nix nar' requires a sub-command."); - command->second->run(); - } + void run() override; }; void registerNixNar() @@ -35,4 +36,11 @@ void registerNixNar() registerCommand<CmdNar>("nar"); } +void CmdNar::run() +{ + ref<UDSRemoteStore> store = *aio().blockOn(openStore("daemon")).try_cast<UDSRemoteStore>(); + auto pathInfo = aio().blockOn(store->queryPathInfo(StorePath("d0p7kq6l0hi0faf9agcfmm6cq4k1470w-lix-2.93.0-dev-pre20250410-81d4769"))); + __builtin_dump_struct(&pathInfo, printf); +} + } ``` The stuff in nar.cc is unnecessary; I was hoping to repro without needing an evaluator but it turned out that the evaluator isn't that bad since I can do it without nixpkgs. The repro above inconsistently creates the problem. I have dtruss output, but it's not of much help because the kqueue calls aren't dumped usefully. I think the first step I should do dtrace-wise is to dump kqueue calls correctly.
Owner
#!/usr/bin/env -S dtrace -C -s
#pragma D option quiet

#define TARGETED (pid == $target || self->child)

#if defined(STACKS)
#define DUMP_STACK ustack()
#else
#define DUMP_STACK
#endif

/* MacOS X: notice first appearance of child from fork. Its parent
   fires syscall::*fork:return in the ususal way (see below) */
syscall:::entry
/progenyof($target) && self->child == 0/
{
    /* set as child */
    self->child = 1;

    /* print output */
    self->code = errno == 0 ? "" : "Err#";

    printf("fork to %d\t\t = %d %s%d\n", pid,
        0,self->code,(int)errno);
}

/* syscall::rexit:entry */
syscall::exit:entry
/TARGETED/
{
    /* forget child */
    self->child = 0;
    printf("%5d/0x%x:  exit(%d)\n", pid, tid, arg0);
}

#pragma pack(4)

struct kevent {
    uintptr_t       ident;  /* identifier for this event */
    int16_t         filter; /* filter for event */
    uint16_t        flags;  /* general flags */
    uint32_t        fflags; /* filter-specific flags */
    intptr_t        data;   /* filter-specific data */
    void            *udata; /* opaque user data identifier */
};


#pragma pack()

#define print_fds(NUM) \
    syscall::poll:entry, syscall::poll:return \
    /TARGETED && self->remain_fds > NUM/ \
    { \
        this->item = (struct pollfd *)copyin(self->pollfds + NUM * sizeof(struct pollfd), sizeof(struct pollfd)); \
        printf("\t {.fd = %d, .events = 0x%04x, .revents = 0x%04x}\n", this->item->fd, this->item->events, this->item->revents); \
    }

syscall::poll:entry
/TARGETED/
{
    printf("%5d/0x%x:  ",pid,tid);
    printf("poll(%p, %d)\n", arg0, arg1);
    self->remain_fds = arg1;
    self->pollfds = arg0;
    DUMP_STACK;
}

syscall::poll:return
/TARGETED/
{
    // jfc arg0 is the return register... ABI abuse!
    printf("%5d/0x%x:  poll => %d\n", pid, tid, arg0);
}


print_fds(0)
print_fds(1)
print_fds(2)

syscall::kqueue:return
/TARGETED/
{
    printf("%5d/0x%x:  kqueue() => %d\n", pid, tid, arg0);
}

syscall::kevent:entry
/TARGETED/
{
    self->kq = arg0;
    self->changelist = arg1;
    self->nchanges = arg2;
    self->eventlist = arg3;
    self->nevents = arg4;
    printf("%5d/0x%x:  kevent(kq=%d, nevents=%d)\n", pid, tid, self->kq, self->nevents);
    /* DUMP_STACK; */
}


// Copy pasted out of sys/event.h
#define EVFILT_READ             (-1)
#define EVFILT_WRITE            (-2)
#define EVFILT_AIO              (-3)    /* attached to aio requests */
#define EVFILT_VNODE            (-4)    /* attached to vnodes */
#define EVFILT_PROC             (-5)    /* attached to struct proc */
#define EVFILT_SIGNAL           (-6)    /* attached to struct proc */
#define EVFILT_TIMER            (-7)    /* timers */
#define EVFILT_MACHPORT         (-8)    /* Mach portsets */
#define EVFILT_FS               (-9)    /* Filesystem events */
#define EVFILT_USER             (-10)   /* User events */
#define EVFILT_VM               (-12)   /* Virtual memory events */
#define EVFILT_EXCEPT           (-15)   /* Exception events */

#define evfilt_str_case(name) this->kev->filter == name ? (#name) :

#define print_kevent_fds0(PREFIX, ARR, COND, NUM) \
    syscall::kevent:return \
    /TARGETED && (COND)/ \
    { \
        this->kev = (struct kevent *)copyin(ARR + NUM * sizeof(struct kevent), sizeof(struct kevent)); \
        this->filter_s = \
            evfilt_str_case(EVFILT_READ) \
            evfilt_str_case(EVFILT_WRITE) \
            evfilt_str_case(EVFILT_AIO) \
            evfilt_str_case(EVFILT_VNODE) \
            evfilt_str_case(EVFILT_PROC) \
            evfilt_str_case(EVFILT_SIGNAL) \
            evfilt_str_case(EVFILT_TIMER) \
            evfilt_str_case(EVFILT_MACHPORT) \
            evfilt_str_case(EVFILT_FS) \
            evfilt_str_case(EVFILT_USER) \
            evfilt_str_case(EVFILT_VM) \
            evfilt_str_case(EVFILT_EXCEPT) \
            "??"; \
        printf("\t%s{ .ident = %d, .filter = 0x%04x(%s), .flags = 0x%04x, .fflags = 0x%08x, .data = 0x%016x }\n", \
            PREFIX, this->kev->ident, this->kev->filter, this->filter_s, this->kev->flags, this->kev->fflags, this->kev->data \
        ); \
    }
#define print_kevent_fds(NUM) print_kevent_fds0("=> ", self->eventlist, arg0 > NUM, NUM)
#define print_kevent_fds_change(NUM) print_kevent_fds0("CHG ", self->changelist, self->nchanges > NUM, NUM)

syscall::kevent:return
/TARGETED/
{
    printf("%5d/0x%x:  kevent() => %d\n", pid, tid, arg0);
}

print_kevent_fds(0)
print_kevent_fds(1)
print_kevent_fds(2)
print_kevent_fds_change(0)
print_kevent_fds_change(1)
print_kevent_fds_change(2)
fork to 23040            = 0 Err#1
23040/0x28095:  kqueue() => 3
23040/0x28095:  kevent(kq=3, nevents=0)
23040/0x28095:  kevent() => 0
        CHG { .ident = 0, .filter = 0xfff6(EVFILT_USER), .flags = 0x0021, .fflags = 0x00000000, .data = 0x00000000000000
00 }
fork to 23040            = 0 0
fork to 23040            = 0 0
fork to 23040            = 0 0
accepted connection from pid 23048, user jade (trusted)
received daemon op 7
received daemon op 7
received daemon op 9
fork to 23050            = 0 0
23050/0x2816f:  kqueue() => 3
23050/0x2816f:  kevent(kq=3, nevents=0)
23050/0x2816f:  kevent() => 0
        CHG { .ident = 0, .filter = 0xfff6(EVFILT_USER), .flags = 0x0021, .fflags = 0x00000000, .data = 0x00000000000000
00 }
fork to 23050            = 0 0
fork to 23050            = 0 0
23050/0x28171:  poll(16ba5ef08, 2)

              libsystem_kernel.dylib`poll+0x8
              liblixutil.dylib`void* std::__1::__thread_proxy[abi:v160006]<std::__1::tuple<std::__1::unique_ptr<std::__1
::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, nix::MonitorFdHup::MonitorFdHup(int, std::__1::
function<void ()>)::$_0>>(void*)+0x78
              libsystem_pthread.dylib`_pthread_start+0x88
              libsystem_pthread.dylib`thread_start+0x8
         {.fd = 12, .events = 0x0010, .revents = 0x0000}
         {.fd = 10, .events = 0x0001, .revents = 0x0000}
23050/0x2816f:  kevent(kq=3, nevents=16)
23050/0x2816f:  kevent() => 0
23050/0x2816f:  kevent(kq=3, nevents=16)
23050/0x2816f:  kevent() => 0
23050/0x2816f:  kevent(kq=3, nevents=16)
23050/0x2816f:  kevent() => 0
23050/0x2816f:  kevent(kq=3, nevents=16)
23050/0x2816f:  kevent() => 0
23050/0x2816f:  kevent(kq=3, nevents=16)
4 operations
unexpected Nix daemon error: error: writing to file: Broken pipe
23050/0x2816f:  kevent() => 0
23050/0x28171:  poll => 1
         {.fd = 12, .events = 0x0010, .revents = 0x0000}
         {.fd = 10, .events = 0x0001, .revents = 0x0001}
23050/0x2816f:  exit(1)

This seems to disprove it being kqueue misuse, since the kqueue is not touching that fd. However, it doesn't explain why it is happening. I wonder if we are doing some stupid shit like not CLOEXEC'ing it and having it fall into a child process? Beats me, but it is progress!

```dtrace #!/usr/bin/env -S dtrace -C -s #pragma D option quiet #define TARGETED (pid == $target || self->child) #if defined(STACKS) #define DUMP_STACK ustack() #else #define DUMP_STACK #endif /* MacOS X: notice first appearance of child from fork. Its parent fires syscall::*fork:return in the ususal way (see below) */ syscall:::entry /progenyof($target) && self->child == 0/ { /* set as child */ self->child = 1; /* print output */ self->code = errno == 0 ? "" : "Err#"; printf("fork to %d\t\t = %d %s%d\n", pid, 0,self->code,(int)errno); } /* syscall::rexit:entry */ syscall::exit:entry /TARGETED/ { /* forget child */ self->child = 0; printf("%5d/0x%x: exit(%d)\n", pid, tid, arg0); } #pragma pack(4) struct kevent { uintptr_t ident; /* identifier for this event */ int16_t filter; /* filter for event */ uint16_t flags; /* general flags */ uint32_t fflags; /* filter-specific flags */ intptr_t data; /* filter-specific data */ void *udata; /* opaque user data identifier */ }; #pragma pack() #define print_fds(NUM) \ syscall::poll:entry, syscall::poll:return \ /TARGETED && self->remain_fds > NUM/ \ { \ this->item = (struct pollfd *)copyin(self->pollfds + NUM * sizeof(struct pollfd), sizeof(struct pollfd)); \ printf("\t {.fd = %d, .events = 0x%04x, .revents = 0x%04x}\n", this->item->fd, this->item->events, this->item->revents); \ } syscall::poll:entry /TARGETED/ { printf("%5d/0x%x: ",pid,tid); printf("poll(%p, %d)\n", arg0, arg1); self->remain_fds = arg1; self->pollfds = arg0; DUMP_STACK; } syscall::poll:return /TARGETED/ { // jfc arg0 is the return register... ABI abuse! printf("%5d/0x%x: poll => %d\n", pid, tid, arg0); } print_fds(0) print_fds(1) print_fds(2) syscall::kqueue:return /TARGETED/ { printf("%5d/0x%x: kqueue() => %d\n", pid, tid, arg0); } syscall::kevent:entry /TARGETED/ { self->kq = arg0; self->changelist = arg1; self->nchanges = arg2; self->eventlist = arg3; self->nevents = arg4; printf("%5d/0x%x: kevent(kq=%d, nevents=%d)\n", pid, tid, self->kq, self->nevents); /* DUMP_STACK; */ } // Copy pasted out of sys/event.h #define EVFILT_READ (-1) #define EVFILT_WRITE (-2) #define EVFILT_AIO (-3) /* attached to aio requests */ #define EVFILT_VNODE (-4) /* attached to vnodes */ #define EVFILT_PROC (-5) /* attached to struct proc */ #define EVFILT_SIGNAL (-6) /* attached to struct proc */ #define EVFILT_TIMER (-7) /* timers */ #define EVFILT_MACHPORT (-8) /* Mach portsets */ #define EVFILT_FS (-9) /* Filesystem events */ #define EVFILT_USER (-10) /* User events */ #define EVFILT_VM (-12) /* Virtual memory events */ #define EVFILT_EXCEPT (-15) /* Exception events */ #define evfilt_str_case(name) this->kev->filter == name ? (#name) : #define print_kevent_fds0(PREFIX, ARR, COND, NUM) \ syscall::kevent:return \ /TARGETED && (COND)/ \ { \ this->kev = (struct kevent *)copyin(ARR + NUM * sizeof(struct kevent), sizeof(struct kevent)); \ this->filter_s = \ evfilt_str_case(EVFILT_READ) \ evfilt_str_case(EVFILT_WRITE) \ evfilt_str_case(EVFILT_AIO) \ evfilt_str_case(EVFILT_VNODE) \ evfilt_str_case(EVFILT_PROC) \ evfilt_str_case(EVFILT_SIGNAL) \ evfilt_str_case(EVFILT_TIMER) \ evfilt_str_case(EVFILT_MACHPORT) \ evfilt_str_case(EVFILT_FS) \ evfilt_str_case(EVFILT_USER) \ evfilt_str_case(EVFILT_VM) \ evfilt_str_case(EVFILT_EXCEPT) \ "??"; \ printf("\t%s{ .ident = %d, .filter = 0x%04x(%s), .flags = 0x%04x, .fflags = 0x%08x, .data = 0x%016x }\n", \ PREFIX, this->kev->ident, this->kev->filter, this->filter_s, this->kev->flags, this->kev->fflags, this->kev->data \ ); \ } #define print_kevent_fds(NUM) print_kevent_fds0("=> ", self->eventlist, arg0 > NUM, NUM) #define print_kevent_fds_change(NUM) print_kevent_fds0("CHG ", self->changelist, self->nchanges > NUM, NUM) syscall::kevent:return /TARGETED/ { printf("%5d/0x%x: kevent() => %d\n", pid, tid, arg0); } print_kevent_fds(0) print_kevent_fds(1) print_kevent_fds(2) print_kevent_fds_change(0) print_kevent_fds_change(1) print_kevent_fds_change(2) ``` ``` fork to 23040 = 0 Err#1 23040/0x28095: kqueue() => 3 23040/0x28095: kevent(kq=3, nevents=0) 23040/0x28095: kevent() => 0 CHG { .ident = 0, .filter = 0xfff6(EVFILT_USER), .flags = 0x0021, .fflags = 0x00000000, .data = 0x00000000000000 00 } fork to 23040 = 0 0 fork to 23040 = 0 0 fork to 23040 = 0 0 accepted connection from pid 23048, user jade (trusted) received daemon op 7 received daemon op 7 received daemon op 9 fork to 23050 = 0 0 23050/0x2816f: kqueue() => 3 23050/0x2816f: kevent(kq=3, nevents=0) 23050/0x2816f: kevent() => 0 CHG { .ident = 0, .filter = 0xfff6(EVFILT_USER), .flags = 0x0021, .fflags = 0x00000000, .data = 0x00000000000000 00 } fork to 23050 = 0 0 fork to 23050 = 0 0 23050/0x28171: poll(16ba5ef08, 2) libsystem_kernel.dylib`poll+0x8 liblixutil.dylib`void* std::__1::__thread_proxy[abi:v160006]<std::__1::tuple<std::__1::unique_ptr<std::__1 ::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, nix::MonitorFdHup::MonitorFdHup(int, std::__1:: function<void ()>)::$_0>>(void*)+0x78 libsystem_pthread.dylib`_pthread_start+0x88 libsystem_pthread.dylib`thread_start+0x8 {.fd = 12, .events = 0x0010, .revents = 0x0000} {.fd = 10, .events = 0x0001, .revents = 0x0000} 23050/0x2816f: kevent(kq=3, nevents=16) 23050/0x2816f: kevent() => 0 23050/0x2816f: kevent(kq=3, nevents=16) 23050/0x2816f: kevent() => 0 23050/0x2816f: kevent(kq=3, nevents=16) 23050/0x2816f: kevent() => 0 23050/0x2816f: kevent(kq=3, nevents=16) 23050/0x2816f: kevent() => 0 23050/0x2816f: kevent(kq=3, nevents=16) 4 operations unexpected Nix daemon error: error: writing to file: Broken pipe 23050/0x2816f: kevent() => 0 23050/0x28171: poll => 1 {.fd = 12, .events = 0x0010, .revents = 0x0000} {.fd = 10, .events = 0x0001, .revents = 0x0001} 23050/0x2816f: exit(1) ``` This seems to disprove it being kqueue misuse, since the kqueue is not touching that fd. However, it doesn't explain why it *is* happening. I wonder if we are doing some stupid shit like not CLOEXEC'ing it and having it fall into a child process? Beats me, but it is progress!
Owner

The client is undeniably closing it and its side is getting DEALLOC'd. That's not quite the problem.

~/lix » sudo dtrace -s wip.d
dtrace: script 'wip.d' matched 9 probes
CPU     ID                    FUNCTION:NAME
  0 296202              soo_kqfilter:return struct fileproc {
    os_refcnt_t fp_iocount = {
        os_ref_atomic_t ref_count = 0x2
    }
    fileproc_vflags_t fp_vflags = FPV_NONE
    fileproc_flags_t fp_flags = FP_NONE
    uint16_t fp_guard_attrs = 0
    struct fileglob *ptrauth fp_glob = 0x4abafe29d4c22d18
    union  {
        struct select_set *fp_wset = 0
        struct fileproc_guard *ptrauth fp_guard = 0
    }
}
  0 296363             unp_disconnect:entry pid=6533 name=nix
struct unpcb {
    struct unp_link = {
        struct unpcb *le_next = 0xfffffe1d2c88f8f8
        struct unpcb **le_prev = 0xfffffe1d2c88f9c0
    }
    struct socket *unp_socket = 0xfffffe1d39c5af48
    struct vnode *unp_vnode = 0
    ino_t unp_ino = 0
    struct unpcb *unp_conn = 0xfffffe1d2c88f9c0
    struct unp_head unp_refs = {
        struct unpcb *lh_first = 0
    }
    struct unp_reflink = {
        struct unpcb *le_next = 0
        struct unpcb **le_prev = 0
    }
    struct sockaddr_un *unp_addr = 0
    unp_gen_t unp_gencnt = 0x1bdb
    int unp_cc = 0
    int unp_mbcnt = 0
    uint32_t unp_flags = 0x1
    uint32_t rw_thrcount = 0
    struct xucred unp_peercred = {
        u_int cr_version = 0
        uid_t cr_uid = 0
        short cr_ngroups = 0x10
        gid_t [16] cr_groups = [ 0, 0x1, 0x2, 0x3, 0x4, 0x5, 0x8, 0x9, 0xc, 0x14, 0x1d, 0x3d, 0x50, 0x2bd, 0x21, 0x62 ]
    }
    lck_mtx_t unp_mtx = {
        uintptr_t [2] opaque = [ 0xf322000000, 0x54d18 ]
    }
}struct unpcb {
    struct unp_link = {
        struct unpcb *le_next = 0xfffffe1d3941b830
        struct unpcb **le_prev = 0xfffffe001ae029d0
    }
    struct socket *unp_socket = 0xfffffe1d39c5bad0
    struct vnode *unp_vnode = 0
    ino_t unp_ino = 0
    struct unpcb *unp_conn = 0xfffffe1d3941b830
    struct unp_head unp_refs = {
        struct unpcb *lh_first = 0
    }
    struct unp_reflink = {
        struct unpcb *le_next = 0
        struct unpcb **le_prev = 0
    }
    struct sockaddr_un *unp_addr = 0xfffffe1873379a00
    unp_gen_t unp_gencnt = 0x1bdc
    int unp_cc = 0
    int unp_mbcnt = 0
    uint32_t unp_flags = 0x1
    uint32_t rw_thrcount = 0
    struct xucred unp_peercred = {
        u_int cr_version = 0
        uid_t cr_uid = 0x1f5
        short cr_ngroups = 0x10
        gid_t [16] cr_groups = [ 0x14, 0xc, 0x3d, 0x4f, 0x50, 0x51, 0x2bd, 0x21, 0x62, 0x64, 0xcc, 0xfa, 0x18b, 0x18e, 0x18f,
0x190 ]
    }
    lck_mtx_t unp_mtx = {
        uintptr_t [2] opaque = [ 0xf322000000, 0 ]
    }
}
  0 296363             unp_disconnect:entry struct sockaddr_un {
    unsigned char sun_len = 0x6a
    sa_family_t sun_family = 0x1
    char [104] sun_path = [ "/nix/var/nix/daemon-socket/socket" ]
}
  0 296363             unp_disconnect:entry
  0 296382                 unp_detach:entry pid=6533
              kernel.release.vmapple`uipc_detach+0x18
              kernel.release.vmapple`soclose+0xd4
              kernel.release.vmapple`fg_drop+0xfc
              kernel.release.vmapple`fp_close_and_unlock+0x8fc
              kernel.release.vmapple`dtrace_systrace_syscall+0x120
              kernel.release.vmapple`unix_syscall+0x304
              kernel.release.vmapple`sleh_synchronous+0x3e0
              kernel.release.vmapple`fleh_synchronous+0x28
              kernel.release.vmapple`fleh_dispatch64+0x19c

  0 296065              sofreelastref:entry free last ref to socket fffffe1d39c5af48 pid=6533 type=1 dealloc=0
  0 296065              sofreelastref:entry free last ref to socket fffffe1d39c5af48 pid=6533 type=1 dealloc=1

Nobody's calling knote. We're not dispatching SOF_KNOTE (and this is probably fine). We might be dispatching SB_KNOTE, but, hm.

The sb is the peer receiving it right?

  0 296218                   sowakeup:entry pid=6790 sb=0xfffffe1d39c5abe0 sb_flags=0x00000104 sb_waiters=1
              kernel.release.vmapple`uipc_send+0x8b8
              kernel.release.vmapple`soo_write+0xd8
              kernel.release.vmapple`fp_writev+0x90
              kernel.release.vmapple`writev_internal+0x360
              kernel.release.vmapple`write+0x134
              kernel.release.vmapple`dtrace_systrace_syscall+0x120
              kernel.release.vmapple`unix_syscall+0x304
              kernel.release.vmapple`sleh_synchronous+0x3e0
              kernel.release.vmapple`fleh_synchronous+0x28
              kernel.release.vmapple`fleh_dispatch64+0x19c

  1 296363             unp_disconnect:entry pid=6788 name=nix
struct unpcb {
    struct unp_link = {
        struct unpcb *le_next = 0xfffffe1d2c88f8f8
        struct unpcb **le_prev = 0xfffffe1d2c88f9c0
    }
    struct socket *unp_socket = 0xfffffe1d39c5ab70
    struct vnode *unp_vnode = 0
    ino_t unp_ino = 0
    struct unpcb *unp_conn = 0xfffffe1d2c88f9c0
    struct unp_head unp_refs = {
        struct unpcb *lh_first = 0
    }
    struct unp_reflink = {
        struct unpcb *le_next = 0
        struct unpcb **le_prev = 0
    }
    struct sockaddr_un *unp_addr = 0
    unp_gen_t unp_gencnt = 0x1d81
    int unp_cc = 0
    int unp_mbcnt = 0
    uint32_t unp_flags = 0x1
    uint32_t rw_thrcount = 0
    struct xucred unp_peercred = {
        u_int cr_version = 0
        uid_t cr_uid = 0
        short cr_ngroups = 0x10
        gid_t [16] cr_groups = [ 0, 0x1, 0x2, 0x3, 0x4, 0x5, 0x8, 0x9, 0xc, 0x14, 0x1d, 0x3d, 0x50, 0x2bd, 0x21, 0x62 ]
    }
    lck_mtx_t unp_mtx = {
        uintptr_t [2] opaque = [ 0xf322000000, 0x54d2d ]
    }
}struct unpcb {
    struct unp_link = {
        struct unpcb *le_next = 0xfffffe1d3941b830
        struct unpcb **le_prev = 0xfffffe001ae029d0
    }
    struct socket *unp_socket = 0xfffffe1d39c5af48
    struct vnode *unp_vnode = 0
    ino_t unp_ino = 0
    struct unpcb *unp_conn = 0xfffffe1d3941b830
    struct unp_head unp_refs = {
        struct unpcb *lh_first = 0
    }
    struct unp_reflink = {
        struct unpcb *le_next = 0
        struct unpcb **le_prev = 0
    }
    struct sockaddr_un *unp_addr = 0xfffffe187337ad80
    unp_gen_t unp_gencnt = 0x1d82
    int unp_cc = 0
    int unp_mbcnt = 0
    uint32_t unp_flags = 0x1
    uint32_t rw_thrcount = 0
    struct xucred unp_peercred = {
        u_int cr_version = 0
        uid_t cr_uid = 0x1f5
        short cr_ngroups = 0x10
        gid_t [16] cr_groups = [ 0x14, 0xc, 0x3d, 0x4f, 0x50, 0x51, 0x2bd, 0x21, 0x62, 0x64, 0xcc, 0xfa, 0x18b, 0x18e, 0x18f,
0x190 ]
    }
    lck_mtx_t unp_mtx = {
        uintptr_t [2] opaque = [ 0xf322000000, 0 ]
    }
}
  1 296363             unp_disconnect:entry struct sockaddr_un {
    unsigned char sun_len = 0x6a
    sa_family_t sun_family = 0x1
    char [104] sun_path = [ "/nix/var/nix/daemon-socket/socket" ]
}
  1 296363             unp_disconnect:entry
  1 296218                   sowakeup:entry pid=6788 sb=0xfffffe1d39c5afb8 sb_flags=0x00000144 sb_waiters=0
              kernel.release.vmapple`soisdisconnected+0xe4
              kernel.release.vmapple`uipc_disconnect+0x18
              kernel.release.vmapple`soclose_locked+0x6d4
              kernel.release.vmapple`soclose+0xd4
              kernel.release.vmapple`fg_drop+0xfc
              kernel.release.vmapple`fp_close_and_unlock+0x8fc
              kernel.release.vmapple`dtrace_systrace_syscall+0x120
              kernel.release.vmapple`unix_syscall+0x304
              kernel.release.vmapple`sleh_synchronous+0x3e0
              kernel.release.vmapple`fleh_synchronous+0x28
              kernel.release.vmapple`fleh_dispatch64+0x19c

  1 296382                 unp_detach:entry pid=6788
  1 296065              sofreelastref:entry free last ref to socket fffffe1d39c5ab70 pid=6788 type=1 dealloc=0
  1 296065              sofreelastref:entry free last ref to socket fffffe1d39c5ab70 pid=6788 type=1 dealloc=1
 10 296382                 unp_detach:entry pid=6790
 10 296065              sofreelastref:entry free last ref to socket fffffe1d39c5af48 pid=6790 type=1 dealloc=0
 10 296065              sofreelastref:entry free last ref to socket fffffe1d39c5af48 pid=6790 type=1 dealloc=1

flags=0x144 is SB_UNIX | SB_KNOTE | SB_RECV
flags=0x104 is SB_UNIX | SB_RECV

HEY why is sb_waiters == 0????! OH, no, it's blocking vs knote!! That makes sense, that's fine.
So it's supposed to be delivered as a knote.

HEY WAIT why is sowakeup not getting called on the close sometimes???!

  1 296218                   sowakeup:entry pid=7002 sb=0xfffffe1d39c5afb8 so_flags=0x00008000 sb_flags=0x00000104 sb_waiters=
1
              kernel.release.vmapple`uipc_send+0x8b8
              kernel.release.vmapple`soo_write+0xd8
              kernel.release.vmapple`fp_writev+0x90
              kernel.release.vmapple`writev_internal+0x360
              kernel.release.vmapple`write+0x134
              kernel.release.vmapple`dtrace_systrace_syscall+0x120
              kernel.release.vmapple`unix_syscall+0x304
              kernel.release.vmapple`sleh_synchronous+0x3e0
              kernel.release.vmapple`fleh_synchronous+0x28
              kernel.release.vmapple`fleh_dispatch64+0x19c

  0 296363             unp_disconnect:entry pid=7000 name=nix
struct unpcb {
    struct unp_link = {
        struct unpcb *le_next = 0xfffffe1d2c88f9c0
        struct unpcb **le_prev = 0xfffffe1d2c88fa88
    }
    struct socket *unp_socket = 0xfffffe1d39c5af48
    struct vnode *unp_vnode = 0
    ino_t unp_ino = 0
    struct unpcb *unp_conn = 0xfffffe1d2c88fa88
    struct unp_head unp_refs = {
        struct unpcb *lh_first = 0
    }
    struct unp_reflink = {
        struct unpcb *le_next = 0
        struct unpcb **le_prev = 0
    }
    struct sockaddr_un *unp_addr = 0
    unp_gen_t unp_gencnt = 0x1f84
    int unp_cc = 0
    int unp_mbcnt = 0
    uint32_t unp_flags = 0x1
    uint32_t rw_thrcount = 0
    struct xucred unp_peercred = {
        u_int cr_version = 0
        uid_t cr_uid = 0
        short cr_ngroups = 0x10
        gid_t [16] cr_groups = [ 0, 0x1, 0x2, 0x3, 0x4, 0x5, 0x8, 0x9, 0xc, 0x14, 0x1d, 0x3d, 0x50, 0x2bd, 0x21, 0x62 ]
    }
    lck_mtx_t unp_mtx = {
        uintptr_t [2] opaque = [ 0xf322000000, 0x548b7 ]
    }
}struct unpcb {
    struct unp_link = {
        struct unpcb *le_next = 0xfffffe1d3941b830
        struct unpcb **le_prev = 0xfffffe1d2c88fb50
    }
    struct socket *unp_socket = 0xfffffe1d39c5bea8
    struct vnode *unp_vnode = 0
    ino_t unp_ino = 0
    struct unpcb *unp_conn = 0xfffffe1d3941b830
    struct unp_head unp_refs = {
        struct unpcb *lh_first = 0
    }
    struct unp_reflink = {
        struct unpcb *le_next = 0
        struct unpcb **le_prev = 0
    }
    struct sockaddr_un *unp_addr = 0xfffffe1873379580
    unp_gen_t unp_gencnt = 0x1f85
    int unp_cc = 0
    int unp_mbcnt = 0
    uint32_t unp_flags = 0x1
    uint32_t rw_thrcount = 0
    struct xucred unp_peercred = {
        u_int cr_version = 0
        uid_t cr_uid = 0x1f5
        short cr_ngroups = 0x10
        gid_t [16] cr_groups = [ 0x14, 0xc, 0x3d, 0x4f, 0x50, 0x51, 0x2bd, 0x21, 0x62, 0x64, 0xcc, 0xfa, 0x18b, 0x18e, 0x18f,
0x190 ]
    }
    lck_mtx_t unp_mtx = {
        uintptr_t [2] opaque = [ 0xf322000000, 0 ]
    }
}
  0 296363             unp_disconnect:entry struct sockaddr_un {
    unsigned char sun_len = 0x6a
    sa_family_t sun_family = 0x1
    char [104] sun_path = [ "/nix/var/nix/daemon-socket/socket" ]
}
  0 296363             unp_disconnect:entry
              kernel.release.vmapple`uipc_disconnect+0x18
              kernel.release.vmapple`soclose+0xd4
              kernel.release.vmapple`fg_drop+0xfc
              kernel.release.vmapple`fp_close_and_unlock+0x8fc
              kernel.release.vmapple`dtrace_systrace_syscall+0x120
              kernel.release.vmapple`unix_syscall+0x304
              kernel.release.vmapple`sleh_synchronous+0x3e0
              kernel.release.vmapple`fleh_synchronous+0x28
              kernel.release.vmapple`fleh_dispatch64+0x19c

  0 296382                 unp_detach:entry pid=7000
  0 296065              sofreelastref:entry free last ref to socket fffffe1d39c5af48 pid=7000 type=1 dealloc=0
  0 296065              sofreelastref:entry free last ref to socket fffffe1d39c5af48 pid=7000 type=1 dealloc=1

That's:

fbt:mach_kernel:soo_kqfilter:entry /curpsinfo->pr_fname == "nix"/ {
    self->go = 1;
    self->kn = (struct knote *)arg1;
    self->fp = (struct fileproc *)arg0;
    // this->sock = (struct socket *)arg0;
    // printf("pid=%d name=%s state=0x%08x filt=%p\n", pid, curpsinfo->pr_fname, this->sock->so_state, this->sock->so_filt);
    // print(*this->sock);
}

fbt:mach_kernel:soo_kqfilter:return /self->go/ {
    self->go = 0;
    print(*self->fp);
}

fbt:mach_kernel:unp_disconnect:entry /curpsinfo->pr_fname == "nix"/ {
    printf("pid=%d name=%s\n", pid, curpsinfo->pr_fname);
    self->unpcb = (struct unpcb *)arg0;
    print(*self->unpcb);
    print(*self->unpcb->unp_conn);
    self->go = 1;
}

fbt:mach_kernel:unp_disconnect:entry /self->go && self->unpcb->unp_conn->unp_addr/ {
    print(*self->unpcb->unp_conn->unp_addr);
}

fbt:mach_kernel:unp_disconnect:return /self->go/ {
    self->go = 0;
}

fbt:mach_kernel:unp_disconnect:entry /curpsinfo->pr_fname == "nix"/ { stack(); }
fbt:mach_kernel:unp_detach:entry /curpsinfo->pr_fname == "nix"/ {
    printf("pid=%d", pid);
}
fbt:mach_kernel:knote:entry /curpsinfo->pr_fname == "nix"/ {
    printf("pid=%d hint=0x%08x", pid, arg1);
}
fbt:mach_kernel:sofreelastref:entry /curpsinfo->pr_fname == "nix"/ {
    self->sock = (struct socket *)arg0;
    printf("free last ref to socket %p pid=%d type=%d dealloc=%d", self->sock, pid, self->sock->so_type, arg1);
}
fbt:mach_kernel:sowakeup:entry /curpsinfo->pr_fname == "nix"/ {
    self->sb = (struct sockbuf *)arg1;
    self->so = (struct socket *)arg0;
    printf("pid=%d sb=0x%08p so_flags=0x%08x sb_flags=0x%08x sb_waiters=%d", pid, self->sb, self->so->so_flags, self->sb->sb_flags, self->sb->sb_waiters);
    stack();
}

I think this is just sketchy stack traces... It doesn't make any sense that the stacks diverge at soclose_locked.

Probably is calling into sodisconnectlocked and then pru_disconnect is uipc_disconnect?

Either way, we have our smoke here. sowakeup isn't getting called when the bad behaviour happens. Now, where's the fire...

The client is undeniably closing it and its side is getting DEALLOC'd. That's not quite the problem. ``` ~/lix » sudo dtrace -s wip.d dtrace: script 'wip.d' matched 9 probes CPU ID FUNCTION:NAME 0 296202 soo_kqfilter:return struct fileproc { os_refcnt_t fp_iocount = { os_ref_atomic_t ref_count = 0x2 } fileproc_vflags_t fp_vflags = FPV_NONE fileproc_flags_t fp_flags = FP_NONE uint16_t fp_guard_attrs = 0 struct fileglob *ptrauth fp_glob = 0x4abafe29d4c22d18 union { struct select_set *fp_wset = 0 struct fileproc_guard *ptrauth fp_guard = 0 } } 0 296363 unp_disconnect:entry pid=6533 name=nix struct unpcb { struct unp_link = { struct unpcb *le_next = 0xfffffe1d2c88f8f8 struct unpcb **le_prev = 0xfffffe1d2c88f9c0 } struct socket *unp_socket = 0xfffffe1d39c5af48 struct vnode *unp_vnode = 0 ino_t unp_ino = 0 struct unpcb *unp_conn = 0xfffffe1d2c88f9c0 struct unp_head unp_refs = { struct unpcb *lh_first = 0 } struct unp_reflink = { struct unpcb *le_next = 0 struct unpcb **le_prev = 0 } struct sockaddr_un *unp_addr = 0 unp_gen_t unp_gencnt = 0x1bdb int unp_cc = 0 int unp_mbcnt = 0 uint32_t unp_flags = 0x1 uint32_t rw_thrcount = 0 struct xucred unp_peercred = { u_int cr_version = 0 uid_t cr_uid = 0 short cr_ngroups = 0x10 gid_t [16] cr_groups = [ 0, 0x1, 0x2, 0x3, 0x4, 0x5, 0x8, 0x9, 0xc, 0x14, 0x1d, 0x3d, 0x50, 0x2bd, 0x21, 0x62 ] } lck_mtx_t unp_mtx = { uintptr_t [2] opaque = [ 0xf322000000, 0x54d18 ] } }struct unpcb { struct unp_link = { struct unpcb *le_next = 0xfffffe1d3941b830 struct unpcb **le_prev = 0xfffffe001ae029d0 } struct socket *unp_socket = 0xfffffe1d39c5bad0 struct vnode *unp_vnode = 0 ino_t unp_ino = 0 struct unpcb *unp_conn = 0xfffffe1d3941b830 struct unp_head unp_refs = { struct unpcb *lh_first = 0 } struct unp_reflink = { struct unpcb *le_next = 0 struct unpcb **le_prev = 0 } struct sockaddr_un *unp_addr = 0xfffffe1873379a00 unp_gen_t unp_gencnt = 0x1bdc int unp_cc = 0 int unp_mbcnt = 0 uint32_t unp_flags = 0x1 uint32_t rw_thrcount = 0 struct xucred unp_peercred = { u_int cr_version = 0 uid_t cr_uid = 0x1f5 short cr_ngroups = 0x10 gid_t [16] cr_groups = [ 0x14, 0xc, 0x3d, 0x4f, 0x50, 0x51, 0x2bd, 0x21, 0x62, 0x64, 0xcc, 0xfa, 0x18b, 0x18e, 0x18f, 0x190 ] } lck_mtx_t unp_mtx = { uintptr_t [2] opaque = [ 0xf322000000, 0 ] } } 0 296363 unp_disconnect:entry struct sockaddr_un { unsigned char sun_len = 0x6a sa_family_t sun_family = 0x1 char [104] sun_path = [ "/nix/var/nix/daemon-socket/socket" ] } 0 296363 unp_disconnect:entry 0 296382 unp_detach:entry pid=6533 kernel.release.vmapple`uipc_detach+0x18 kernel.release.vmapple`soclose+0xd4 kernel.release.vmapple`fg_drop+0xfc kernel.release.vmapple`fp_close_and_unlock+0x8fc kernel.release.vmapple`dtrace_systrace_syscall+0x120 kernel.release.vmapple`unix_syscall+0x304 kernel.release.vmapple`sleh_synchronous+0x3e0 kernel.release.vmapple`fleh_synchronous+0x28 kernel.release.vmapple`fleh_dispatch64+0x19c 0 296065 sofreelastref:entry free last ref to socket fffffe1d39c5af48 pid=6533 type=1 dealloc=0 0 296065 sofreelastref:entry free last ref to socket fffffe1d39c5af48 pid=6533 type=1 dealloc=1 ``` Nobody's calling knote. We're not dispatching SOF_KNOTE (and this is probably fine). We might be dispatching SB_KNOTE, but, hm. The sb is the peer receiving it right? ``` 0 296218 sowakeup:entry pid=6790 sb=0xfffffe1d39c5abe0 sb_flags=0x00000104 sb_waiters=1 kernel.release.vmapple`uipc_send+0x8b8 kernel.release.vmapple`soo_write+0xd8 kernel.release.vmapple`fp_writev+0x90 kernel.release.vmapple`writev_internal+0x360 kernel.release.vmapple`write+0x134 kernel.release.vmapple`dtrace_systrace_syscall+0x120 kernel.release.vmapple`unix_syscall+0x304 kernel.release.vmapple`sleh_synchronous+0x3e0 kernel.release.vmapple`fleh_synchronous+0x28 kernel.release.vmapple`fleh_dispatch64+0x19c 1 296363 unp_disconnect:entry pid=6788 name=nix struct unpcb { struct unp_link = { struct unpcb *le_next = 0xfffffe1d2c88f8f8 struct unpcb **le_prev = 0xfffffe1d2c88f9c0 } struct socket *unp_socket = 0xfffffe1d39c5ab70 struct vnode *unp_vnode = 0 ino_t unp_ino = 0 struct unpcb *unp_conn = 0xfffffe1d2c88f9c0 struct unp_head unp_refs = { struct unpcb *lh_first = 0 } struct unp_reflink = { struct unpcb *le_next = 0 struct unpcb **le_prev = 0 } struct sockaddr_un *unp_addr = 0 unp_gen_t unp_gencnt = 0x1d81 int unp_cc = 0 int unp_mbcnt = 0 uint32_t unp_flags = 0x1 uint32_t rw_thrcount = 0 struct xucred unp_peercred = { u_int cr_version = 0 uid_t cr_uid = 0 short cr_ngroups = 0x10 gid_t [16] cr_groups = [ 0, 0x1, 0x2, 0x3, 0x4, 0x5, 0x8, 0x9, 0xc, 0x14, 0x1d, 0x3d, 0x50, 0x2bd, 0x21, 0x62 ] } lck_mtx_t unp_mtx = { uintptr_t [2] opaque = [ 0xf322000000, 0x54d2d ] } }struct unpcb { struct unp_link = { struct unpcb *le_next = 0xfffffe1d3941b830 struct unpcb **le_prev = 0xfffffe001ae029d0 } struct socket *unp_socket = 0xfffffe1d39c5af48 struct vnode *unp_vnode = 0 ino_t unp_ino = 0 struct unpcb *unp_conn = 0xfffffe1d3941b830 struct unp_head unp_refs = { struct unpcb *lh_first = 0 } struct unp_reflink = { struct unpcb *le_next = 0 struct unpcb **le_prev = 0 } struct sockaddr_un *unp_addr = 0xfffffe187337ad80 unp_gen_t unp_gencnt = 0x1d82 int unp_cc = 0 int unp_mbcnt = 0 uint32_t unp_flags = 0x1 uint32_t rw_thrcount = 0 struct xucred unp_peercred = { u_int cr_version = 0 uid_t cr_uid = 0x1f5 short cr_ngroups = 0x10 gid_t [16] cr_groups = [ 0x14, 0xc, 0x3d, 0x4f, 0x50, 0x51, 0x2bd, 0x21, 0x62, 0x64, 0xcc, 0xfa, 0x18b, 0x18e, 0x18f, 0x190 ] } lck_mtx_t unp_mtx = { uintptr_t [2] opaque = [ 0xf322000000, 0 ] } } 1 296363 unp_disconnect:entry struct sockaddr_un { unsigned char sun_len = 0x6a sa_family_t sun_family = 0x1 char [104] sun_path = [ "/nix/var/nix/daemon-socket/socket" ] } 1 296363 unp_disconnect:entry 1 296218 sowakeup:entry pid=6788 sb=0xfffffe1d39c5afb8 sb_flags=0x00000144 sb_waiters=0 kernel.release.vmapple`soisdisconnected+0xe4 kernel.release.vmapple`uipc_disconnect+0x18 kernel.release.vmapple`soclose_locked+0x6d4 kernel.release.vmapple`soclose+0xd4 kernel.release.vmapple`fg_drop+0xfc kernel.release.vmapple`fp_close_and_unlock+0x8fc kernel.release.vmapple`dtrace_systrace_syscall+0x120 kernel.release.vmapple`unix_syscall+0x304 kernel.release.vmapple`sleh_synchronous+0x3e0 kernel.release.vmapple`fleh_synchronous+0x28 kernel.release.vmapple`fleh_dispatch64+0x19c 1 296382 unp_detach:entry pid=6788 1 296065 sofreelastref:entry free last ref to socket fffffe1d39c5ab70 pid=6788 type=1 dealloc=0 1 296065 sofreelastref:entry free last ref to socket fffffe1d39c5ab70 pid=6788 type=1 dealloc=1 10 296382 unp_detach:entry pid=6790 10 296065 sofreelastref:entry free last ref to socket fffffe1d39c5af48 pid=6790 type=1 dealloc=0 10 296065 sofreelastref:entry free last ref to socket fffffe1d39c5af48 pid=6790 type=1 dealloc=1 ``` flags=0x144 is SB_UNIX | SB_KNOTE | SB_RECV flags=0x104 is SB_UNIX | SB_RECV HEY why is sb_waiters == 0????! OH, no, it's blocking vs knote!! That makes sense, that's fine. So it's supposed to be delivered as a knote. HEY WAIT why is sowakeup not getting called on the close sometimes???! ``` 1 296218 sowakeup:entry pid=7002 sb=0xfffffe1d39c5afb8 so_flags=0x00008000 sb_flags=0x00000104 sb_waiters= 1 kernel.release.vmapple`uipc_send+0x8b8 kernel.release.vmapple`soo_write+0xd8 kernel.release.vmapple`fp_writev+0x90 kernel.release.vmapple`writev_internal+0x360 kernel.release.vmapple`write+0x134 kernel.release.vmapple`dtrace_systrace_syscall+0x120 kernel.release.vmapple`unix_syscall+0x304 kernel.release.vmapple`sleh_synchronous+0x3e0 kernel.release.vmapple`fleh_synchronous+0x28 kernel.release.vmapple`fleh_dispatch64+0x19c 0 296363 unp_disconnect:entry pid=7000 name=nix struct unpcb { struct unp_link = { struct unpcb *le_next = 0xfffffe1d2c88f9c0 struct unpcb **le_prev = 0xfffffe1d2c88fa88 } struct socket *unp_socket = 0xfffffe1d39c5af48 struct vnode *unp_vnode = 0 ino_t unp_ino = 0 struct unpcb *unp_conn = 0xfffffe1d2c88fa88 struct unp_head unp_refs = { struct unpcb *lh_first = 0 } struct unp_reflink = { struct unpcb *le_next = 0 struct unpcb **le_prev = 0 } struct sockaddr_un *unp_addr = 0 unp_gen_t unp_gencnt = 0x1f84 int unp_cc = 0 int unp_mbcnt = 0 uint32_t unp_flags = 0x1 uint32_t rw_thrcount = 0 struct xucred unp_peercred = { u_int cr_version = 0 uid_t cr_uid = 0 short cr_ngroups = 0x10 gid_t [16] cr_groups = [ 0, 0x1, 0x2, 0x3, 0x4, 0x5, 0x8, 0x9, 0xc, 0x14, 0x1d, 0x3d, 0x50, 0x2bd, 0x21, 0x62 ] } lck_mtx_t unp_mtx = { uintptr_t [2] opaque = [ 0xf322000000, 0x548b7 ] } }struct unpcb { struct unp_link = { struct unpcb *le_next = 0xfffffe1d3941b830 struct unpcb **le_prev = 0xfffffe1d2c88fb50 } struct socket *unp_socket = 0xfffffe1d39c5bea8 struct vnode *unp_vnode = 0 ino_t unp_ino = 0 struct unpcb *unp_conn = 0xfffffe1d3941b830 struct unp_head unp_refs = { struct unpcb *lh_first = 0 } struct unp_reflink = { struct unpcb *le_next = 0 struct unpcb **le_prev = 0 } struct sockaddr_un *unp_addr = 0xfffffe1873379580 unp_gen_t unp_gencnt = 0x1f85 int unp_cc = 0 int unp_mbcnt = 0 uint32_t unp_flags = 0x1 uint32_t rw_thrcount = 0 struct xucred unp_peercred = { u_int cr_version = 0 uid_t cr_uid = 0x1f5 short cr_ngroups = 0x10 gid_t [16] cr_groups = [ 0x14, 0xc, 0x3d, 0x4f, 0x50, 0x51, 0x2bd, 0x21, 0x62, 0x64, 0xcc, 0xfa, 0x18b, 0x18e, 0x18f, 0x190 ] } lck_mtx_t unp_mtx = { uintptr_t [2] opaque = [ 0xf322000000, 0 ] } } 0 296363 unp_disconnect:entry struct sockaddr_un { unsigned char sun_len = 0x6a sa_family_t sun_family = 0x1 char [104] sun_path = [ "/nix/var/nix/daemon-socket/socket" ] } 0 296363 unp_disconnect:entry kernel.release.vmapple`uipc_disconnect+0x18 kernel.release.vmapple`soclose+0xd4 kernel.release.vmapple`fg_drop+0xfc kernel.release.vmapple`fp_close_and_unlock+0x8fc kernel.release.vmapple`dtrace_systrace_syscall+0x120 kernel.release.vmapple`unix_syscall+0x304 kernel.release.vmapple`sleh_synchronous+0x3e0 kernel.release.vmapple`fleh_synchronous+0x28 kernel.release.vmapple`fleh_dispatch64+0x19c 0 296382 unp_detach:entry pid=7000 0 296065 sofreelastref:entry free last ref to socket fffffe1d39c5af48 pid=7000 type=1 dealloc=0 0 296065 sofreelastref:entry free last ref to socket fffffe1d39c5af48 pid=7000 type=1 dealloc=1 ``` That's: ```d fbt:mach_kernel:soo_kqfilter:entry /curpsinfo->pr_fname == "nix"/ { self->go = 1; self->kn = (struct knote *)arg1; self->fp = (struct fileproc *)arg0; // this->sock = (struct socket *)arg0; // printf("pid=%d name=%s state=0x%08x filt=%p\n", pid, curpsinfo->pr_fname, this->sock->so_state, this->sock->so_filt); // print(*this->sock); } fbt:mach_kernel:soo_kqfilter:return /self->go/ { self->go = 0; print(*self->fp); } fbt:mach_kernel:unp_disconnect:entry /curpsinfo->pr_fname == "nix"/ { printf("pid=%d name=%s\n", pid, curpsinfo->pr_fname); self->unpcb = (struct unpcb *)arg0; print(*self->unpcb); print(*self->unpcb->unp_conn); self->go = 1; } fbt:mach_kernel:unp_disconnect:entry /self->go && self->unpcb->unp_conn->unp_addr/ { print(*self->unpcb->unp_conn->unp_addr); } fbt:mach_kernel:unp_disconnect:return /self->go/ { self->go = 0; } fbt:mach_kernel:unp_disconnect:entry /curpsinfo->pr_fname == "nix"/ { stack(); } fbt:mach_kernel:unp_detach:entry /curpsinfo->pr_fname == "nix"/ { printf("pid=%d", pid); } fbt:mach_kernel:knote:entry /curpsinfo->pr_fname == "nix"/ { printf("pid=%d hint=0x%08x", pid, arg1); } fbt:mach_kernel:sofreelastref:entry /curpsinfo->pr_fname == "nix"/ { self->sock = (struct socket *)arg0; printf("free last ref to socket %p pid=%d type=%d dealloc=%d", self->sock, pid, self->sock->so_type, arg1); } fbt:mach_kernel:sowakeup:entry /curpsinfo->pr_fname == "nix"/ { self->sb = (struct sockbuf *)arg1; self->so = (struct socket *)arg0; printf("pid=%d sb=0x%08p so_flags=0x%08x sb_flags=0x%08x sb_waiters=%d", pid, self->sb, self->so->so_flags, self->sb->sb_flags, self->sb->sb_waiters); stack(); } ``` I think this is just sketchy stack traces... It doesn't make any sense that the stacks diverge at soclose_locked. Probably is calling into sodisconnectlocked and then pru_disconnect is uipc_disconnect? Either way, we have our smoke here. sowakeup isn't getting called when the bad behaviour happens. Now, where's the fire...
Owner

I have an update: namely, that Apple's poll implementation is really busted lmao. What it's doing is that it's receiving a kevent of some kind, which is one-shot, which is NOT the EOF, and then it is unregistering from notifying about the EOF. It is not even delivering even a spurious notification to the nix daemon (which, if it DID do that, it would be fine and would retry!!). Instead it is
not returning from poll. Hilarious!!

This hot mess is what I've been using to diagnose this:

fbt:mach_kernel:soo_kqfilter:entry /curpsinfo->pr_fname == "nix"/ {
    self->go_kqf = 1;
    self->kn = (struct knote *)arg1;
    self->fp = (struct fileproc *)arg0;
	self->sock = (struct socket *)strip((void *)self->fp->fp_glob->fg_data, ptrauth_key_asda);
	printf("pid=%d so=%p so_type=%d so_flags=0x%08x sb_recv_flags=0x%08x", pid, self->sock, self->sock->so_type, self->sock->so_flags,
		self->sock->so_rcv.sb_flags);
	stack();
	ustack();
}

fbt:mach_kernel:soo_kqfilter:return /self->go_kqf/ {
    self->go_kqf = 0;
	printf("pid=%d so=%p so_type=%d so_flags=0x%08x sb_recv_flags=0x%08x", pid, self->sock, self->sock->so_type, self->sock->so_flags,
		self->sock->so_rcv.sb_flags);
    print(*self->fp);
}

fbt:mach_kernel:poll_nocancel:entry /curpsinfo->pr_fname == "nix"/ {
	printf("pid=%d", pid);
	self->go_poll = 1;
	ustack();
}
fbt:mach_kernel:poll_nocancel:return /self->go_poll/ {
	printf("pid=%d", pid);
	self->go_poll = 0;
}

fbt:mach_kernel:filt_sordetach:entry /curpsinfo->pr_fname == "nix"/ {
    self->kn = (struct knote *)arg0;
	self->so = (struct socket *)strip((void *)self->kn->kn_fp->fp_glob->fg_data, ptrauth_key_asda);
	printf("pid=%d kn=%p so=%p rcv_sb_flags=0x%08x", pid, self->kn, self->so, self->so->so_rcv.sb_flags);
	print(*self->kn);
	stack();
	ustack();
}

fbt:mach_kernel:knote_drop:entry /curpsinfo->pr_fname == "nix"/ {
	printf("pid=%d kq=%p kn=%p", pid, arg0, arg1);
	stack();
}

fbt:mach_kernel:unp_disconnect:entry /curpsinfo->pr_fname == "nix"/ {
    printf("pid=%d name=%s\n", pid, curpsinfo->pr_fname);
    self->unpcb = (struct unpcb *)arg0;
    print(*self->unpcb);
    print(*self->unpcb->unp_conn);
    self->go = 1;
}

fbt:mach_kernel:unp_disconnect:entry /self->go && self->unpcb->unp_conn->unp_addr/ {
    print(*self->unpcb->unp_conn->unp_addr);
}

fbt:mach_kernel:unp_disconnect:return /self->go/ {
    self->go = 0;
}

fbt:mach_kernel:unp_disconnect:entry /curpsinfo->pr_fname == "nix"/ {
	this->unp = (struct uncb *)arg0;
    printf("pid=%d", pid);
    stack();
}
fbt:mach_kernel:unp_detach:entry /curpsinfo->pr_fname == "nix"/ {
    printf("pid=%d", pid);
    stack();
}
fbt:mach_kernel:knote:entry /curpsinfo->pr_fname == "nix"/ {
    printf("pid=%d hint=0x%08x", pid, arg1);
}
fbt:mach_kernel:sofreelastref:entry /curpsinfo->pr_fname == "nix"/ {
    self->sock = (struct socket *)arg0;
    printf("free last ref to socket %p pid=%d type=%d dealloc=%d", self->sock, pid, self->sock->so_type, arg1);
}
fbt:mach_kernel:sowakeup:entry /curpsinfo->pr_fname == "nix"/ {
    self->sb = (struct sockbuf *)arg1;
    self->so = (struct socket *)arg0;
    printf("pid=%d sb=0x%08p so_flags=0x%08x so_state=0x%08x sb_flags=0x%08x sb_waiters=%d", pid, self->sb, self->so->so_flags, self->so->so_state, self->sb->sb_flags, self->sb->sb_waiters);
    stack(4);
}

fbt:mach_kernel:soisdisconnected:entry /curpsinfo->pr_fname == "nix"/ {
	self->so = (struct socket *)arg0;
	printf("pid=%d sock=%p snd_waiters=%d snd_sb_flags=0x%08x rcv_waiters=%d rcv_sb_flags=0x%08x",
		pid,
		arg0,
		self->so->so_snd.sb_waiters,
		self->so->so_snd.sb_flags,
		self->so->so_rcv.sb_waiters,
		self->so->so_rcv.sb_flags
	);
}

fbt:mach_kernel:soclose_locked:entry /curpsinfo->pr_fname == "nix"/ {
    self->go2 = 1;
    self->sock = (struct socket*)arg0;
}
fbt:mach_kernel:sofreelastref:entry /self->go2/ {
    self->go2 = 0;
    printf("sock=%p state=0x%08x flags=0x%08x pid=%d retaincnt=%d usecount=%d", self->sock, self->sock->so_state, self->sock->so_flags, pid, self->sock->so_retaincnt, self->sock->so_usecount);
}

Things that are the case by disconnect time in the fault case:

  3 356920             unp_disconnect:entry pid=7550 name=nix
struct unpcb {
    struct unp_link = {
        struct unpcb *le_next = 0xfffffe1e4659a560
        struct unpcb **le_prev = 0xfffffe1e4659adf8
    }
    struct socket *unp_socket = 0xfffffe1d43bc95b8
    struct vnode *unp_vnode = 0
    ino_t unp_ino = 0
    struct unpcb *unp_conn = 0xfffffe1e4659adf8
    struct unp_head unp_refs = {
        struct unpcb *lh_first = 0
    }
    struct unp_reflink = {
        struct unpcb *le_next = 0
        struct unpcb **le_prev = 0
    }
    struct sockaddr_un *unp_addr = 0
    unp_gen_t unp_gencnt = 0x2aee
    int unp_cc = 0
    int unp_mbcnt = 0
    uint32_t unp_flags = 0x1
    uint32_t rw_thrcount = 0
    struct xucred unp_peercred = {
        u_int cr_version = 0
        uid_t cr_uid = 0
        short cr_ngroups = 0x10
        gid_t [16] cr_groups = [ 0, 0x1, 0x2, 0x3, 0x4, 0x5, 0x8, 0x9, 0xc, 0x14, 0x1d, 0x3d, 0x50, 0x2bd, 0x21, 0x62 ]
    }
    lck_mtx_t unp_mtx = {
        uintptr_t [2] opaque = [ 0xf322000000, 0xa0fa8 ]
    }
}struct unpcb {
    struct unp_link = {
        struct unpcb *le_next = 0xfffffe1e4659ad30
        struct unpcb **le_prev = 0xfffffe0020e829d0
    }
    struct socket *unp_socket = 0xfffffe1d43ba1990
    struct vnode *unp_vnode = 0
    ino_t unp_ino = 0
    struct unpcb *unp_conn = 0xfffffe1e4659ad30
    struct unp_head unp_refs = {
        struct unpcb *lh_first = 0
    }
    struct unp_reflink = {
        struct unpcb *le_next = 0
        struct unpcb **le_prev = 0
    }
    struct sockaddr_un *unp_addr = 0xfffffe230fcc1300
    unp_gen_t unp_gencnt = 0x2aef
    int unp_cc = 0
    int unp_mbcnt = 0
    uint32_t unp_flags = 0x1
    uint32_t rw_thrcount = 0
    struct xucred unp_peercred = {
        u_int cr_version = 0
        uid_t cr_uid = 0x1f5
        short cr_ngroups = 0x10
        gid_t [16] cr_groups = [ 0x14, 0xc, 0x3d, 0x4f, 0x50, 0x51, 0x2bd, 0x21, 0x62, 0x64, 0xcc, 0xfa, 0x18b, 0x18e, 0x18f, 0x190 ]
    }
    lck_mtx_t unp_mtx = {
        uintptr_t [2] opaque = [ 0xf322000000, 0 ]
    }
}
  3 356920             unp_disconnect:entry struct sockaddr_un {
    unsigned char sun_len = 0x6a
    sa_family_t sun_family = 0x1
    char [104] sun_path = [ "/nix/var/nix/daemon-socket/socket" ]
}
  3 356920             unp_disconnect:entry pid=7550
              kernel.release.vmapple`uipc_disconnect+0x18
              kernel.release.vmapple`soclose+0xd4
              kernel.release.vmapple`fg_drop+0xfc
              kernel.release.vmapple`fp_close_and_unlock+0x8fc
              kernel.release.vmapple`dtrace_systrace_syscall+0x120
              kernel.release.vmapple`unix_syscall+0x304
              kernel.release.vmapple`sleh_synchronous+0x3e0
              kernel.release.vmapple`fleh_synchronous+0x28
              kernel.release.vmapple`fleh_dispatch64+0x19c

  3 356781           soisdisconnected:entry pid=7550 sock=fffffe1d43ba1990 snd_waiters=0 snd_sb_flags=0x00000100 rcv_waiters=0 rcv_sb_flags=0x00000104
  3 356781           soisdisconnected:entry pid=7550 sock=fffffe1d43bc95b8 snd_waiters=0 snd_sb_flags=0x00000100 rcv_waiters=0 rcv_sb_flags=0x00000104
  3 356939                 unp_detach:entry pid=7550
              kernel.release.vmapple`uipc_detach+0x18
              kernel.release.vmapple`soclose+0xd4
              kernel.release.vmapple`fg_drop+0xfc
              kernel.release.vmapple`fp_close_and_unlock+0x8fc
              kernel.release.vmapple`dtrace_systrace_syscall+0x120
              kernel.release.vmapple`unix_syscall+0x304
              kernel.release.vmapple`sleh_synchronous+0x3e0
              kernel.release.vmapple`fleh_synchronous+0x28
              kernel.release.vmapple`fleh_dispatch64+0x19c

  3 356781           soisdisconnected:entry pid=7550 sock=fffffe1d43bc95b8 snd_waiters=0 snd_sb_flags=0x00000100 rcv_waiters=0 rcv_sb_flags=0x00000104
  3 356622              sofreelastref:entry free last ref to socket fffffe1d43bc95b8 pid=7550 type=1 dealloc=0
  3 356622              sofreelastref:entry sock=fffffe1d43bc95b8 state=0x00002031 flags=0x00008004 pid=7550 retaincnt=0 usecount=1
  3 356622              sofreelastref:entry free last ref to socket fffffe1d43bc95b8 pid=7550 type=1 dealloc=1

On the client side socket:

  • so_state = 0x2031: SS_ISDISCONNECTED | SS_CANTSENDMORE | SS_CANTRECVMORE | SS_NOFDREF
  • so_flags = 0x8004: SOF_PCBCLEARING | SOF_NODEFUNCT
  • so_snd.sb_flags = 0x100: SB_UNIX
  • so_rcv.sb_flags = 0x104: SB_UNIX | SB_RECV

WELL THERES YOUR PROBLEM! There's not a knote attached to it. How did that happen??

On the good client side socket:

  • so_rcv.sb_flags = 0x144: SB_UNIX | SB_RECV | SB_KNOTE => deliver a knote for "receive" events on the socket

To recap the problem: uipc_disconnect's call tree isn't calling sowakeup from soisdisconnected of the client's socket because daemon_side_sock.so_rcv.sb_flags does not have SB_KNOTE set on it.

Contextual note: uipc_disconnect calls soisdisconnected on the facing socket first: that is, the daemon's socket.

Sidebar on defunct

I think that defunct isn't relevant; it seems to be some kind of idle kill thing?

BUT WHY

Catch a load of this: you call poll() on the daemon and then it unregisters the bloody event without returning anything to userland! lmfao!!!!!

knote->kn_flags = 0x1011: EV_ADD | EV_ONESHOT | EV_FLAG0

Call stack:

poll_nocancel
kqueue_scan
kqueue_process
knote_process

And then it gets dropped here:

8d741a5de7/bsd/kern/kern_event.c (L4448-L4457)

	if (kev.flags & EV_ONESHOT) {
		if ((kn->kn_flags & EV_DISPATCH2) == EV_DISPATCH2 &&
		    (kn->kn_status & KN_DEFERDELETE) == 0) {
			/* defer dropping non-delete oneshot dispatch2 events */
			kn->kn_status |= KN_DEFERDELETE | KN_DISABLED;
		} else {
			drop = true;
		}
	} else if (kn->kn_flags & EV_DISPATCH) {
		/* disable all dispatch knotes */
		kn->kn_status |= KN_DISABLED;
	} else if ((kn->kn_flags & EV_CLEAR) == 0) {
		/* re-activate in case there are more events */
		knote_activate(kq, kn, FILTER_ACTIVE);
	}

	/*
	 * callback to handle each event as we find it.
	 * If we have to detach and drop the knote, do
	 * it while we have the kq unlocked.
	 */
	if (drop) {
		knote_drop(kq, kn, &knlc);
	} else {
		knote_unlock(kq, kn, &knlc, KNOTE_KQ_UNLOCK);
	}

Trace:

CPU     ID                    FUNCTION:NAME
  1 354446                 knote_drop:entry pid=17682 kq=fffffe1d3b00ea58 kn=fffffe1d3b01c600
              kernel.release.vmapple`kevent_register+0xea8
              com.apple.kec.pthread`workq_handle_stack_events+0xa0
              kernel.release.vmapple`workq_kernreturn+0x3d4
              kernel.release.vmapple`unix_syscall+0x304
              kernel.release.vmapple`sleh_synchronous+0x3e0
              kernel.release.vmapple`fleh_synchronous+0x28
              kernel.release.vmapple`fleh_dispatch64+0x19c

  4 354446                 knote_drop:entry pid=17682 kq=fffffe1d3b00cc70 kn=fffffe1d3b061f80
              kernel.release.vmapple`kevent_register+0xea8
              com.apple.kec.pthread`workq_handle_stack_events+0xa0
              kernel.release.vmapple`workq_kernreturn+0x3d4
              kernel.release.vmapple`unix_syscall+0x304
              kernel.release.vmapple`sleh_synchronous+0x3e0
              kernel.release.vmapple`fleh_synchronous+0x28
              kernel.release.vmapple`fleh_dispatch64+0x19c

  0 356775                   sowakeup:entry pid=17684 sb=0xfffffe1d43bb8e78 so_flags=0x00008000 so_state=0x00000002 sb_flags=0x00000104 sb_waiters=1
              kernel.release.vmapple`uipc_send+0x8b8
              kernel.release.vmapple`soo_write+0xd8
              kernel.release.vmapple`fp_writev+0x90
              kernel.release.vmapple`writev_internal+0x360

  1 356073              poll_nocancel:entry pid=17684
              libsystem_kernel.dylib`poll+0x8
              liblixutil.dylib`void* std::__1::__thread_proxy[abi:v160006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default
_delete<std::__1::__thread_struct>>, nix::MonitorFdHup::MonitorFdHup(int, std::__1::function<void ()>)::$_0>>(void*)+0x78
              libsystem_pthread.dylib`_pthread_start+0x88
              libsystem_pthread.dylib`thread_start+0x8

  1 356758               soo_kqfilter:entry pid=17684 so=fffffe1d43ba3850 so_type=1 so_flags=0x00048000 sb_recv_flags=0x00000104
              kernel.release.vmapple`kevent_register+0xa98
              kernel.release.vmapple`unix_syscall+0x304
              kernel.release.vmapple`sleh_synchronous+0x3e0
              kernel.release.vmapple`fleh_synchronous+0x28
              kernel.release.vmapple`fleh_dispatch64+0x19c

              libsystem_kernel.dylib`poll+0x8
              liblixutil.dylib`void* std::__1::__thread_proxy[abi:v160006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default
_delete<std::__1::__thread_struct>>, nix::MonitorFdHup::MonitorFdHup(int, std::__1::function<void ()>)::$_0>>(void*)+0x78
              libsystem_pthread.dylib`_pthread_start+0x88
              libsystem_pthread.dylib`thread_start+0x8

  0 356775                   sowakeup:entry pid=17682 sb=0xfffffe1d43ba38c0 so_flags=0x00048000 so_state=0x00000082 sb_flags=0x00000104 sb_waiters=1
              kernel.release.vmapple`uipc_send+0x8b8
              kernel.release.vmapple`soo_write+0xd8
              kernel.release.vmapple`fp_writev+0x90
              kernel.release.vmapple`writev_internal+0x360

  1 356759              soo_kqfilter:return pid=17684 so=fffffe1d43ba3850 so_type=1 so_flags=0x00048000 sb_recv_flags=0x00000144struct fileproc {
    os_refcnt_t fp_iocount = {
        os_ref_atomic_t ref_count = 0x2
    }
    fileproc_vflags_t fp_vflags = FPV_NONE
    fileproc_flags_t fp_flags = FP_NONE
    uint16_t fp_guard_attrs = 0
    struct fileglob *ptrauth fp_glob = 0x2cd47e2ea6677190
    union  {
        struct select_set *fp_wset = 0
        struct fileproc_guard *ptrauth fp_guard = 0
    }
}
  0 356775                   sowakeup:entry pid=17684 sb=0xfffffe1d43bb8e78 so_flags=0x00008000 so_state=0x00000002 sb_flags=0x00000104 sb_waiters=1
              kernel.release.vmapple`uipc_send+0x8b8
              kernel.release.vmapple`soo_write+0xd8
              kernel.release.vmapple`fp_writev+0x90
              kernel.release.vmapple`writev_internal+0x360

  0 356775                   sowakeup:entry pid=17684 sb=0xfffffe1d43bb8e78 so_flags=0x00008000 so_state=0x00000002 sb_flags=0x00000104 sb_waiters=1
              kernel.release.vmapple`uipc_send+0x8b8
              kernel.release.vmapple`soo_write+0xd8
              kernel.release.vmapple`fp_writev+0x90
              kernel.release.vmapple`writev_internal+0x360

  0 356775                   sowakeup:entry pid=17682 sb=0xfffffe1d43ba38c0 so_flags=0x00048000 so_state=0x00000082 sb_flags=0x00000144 sb_waiters=1
              kernel.release.vmapple`uipc_send+0x8b8
              kernel.release.vmapple`soo_write+0xd8
              kernel.release.vmapple`fp_writev+0x90
              kernel.release.vmapple`writev_internal+0x360

  0 356775                   sowakeup:entry pid=17684 sb=0xfffffe1d43bb8e78 so_flags=0x00008000 so_state=0x00000002 sb_flags=0x00000104 sb_waiters=1
              kernel.release.vmapple`uipc_send+0x8b8
              kernel.release.vmapple`soo_write+0xd8
              kernel.release.vmapple`fp_writev+0x90
              kernel.release.vmapple`writev_internal+0x360

  0 356775                   sowakeup:entry pid=17682 sb=0xfffffe1d43ba38c0 so_flags=0x00048000 so_state=0x00000082 sb_flags=0x00000144 sb_waiters=1
              kernel.release.vmapple`uipc_send+0x8b8
              kernel.release.vmapple`soo_write+0xd8
              kernel.release.vmapple`fp_writev+0x90
              kernel.release.vmapple`writev_internal+0x360

  0 356775                   sowakeup:entry pid=17682 sb=0xfffffe1d43ba38c0 so_flags=0x00048000 so_state=0x00000082 sb_flags=0x00000144 sb_waiters=0
              kernel.release.vmapple`uipc_send+0x8b8
              kernel.release.vmapple`soo_write+0xd8
              kernel.release.vmapple`fp_writev+0x90
              kernel.release.vmapple`writev_internal+0x360

  0 354446                 knote_drop:entry pid=17684 kq=fffffe187a4df880 kn=fffffe1d3b122200
              kernel.release.vmapple`kqueue_scan+0x3c0
              kernel.release.vmapple`unix_syscall+0x304
              kernel.release.vmapple`sleh_synchronous+0x3e0
              kernel.release.vmapple`fleh_synchronous+0x28
              kernel.release.vmapple`fleh_dispatch64+0x19c

  0 356634             filt_sordetach:entry pid=17684 kn=fffffe1d3b122200 so=fffffe1d43ba3850 rcv_sb_flags=0x00000144struct knote {
    struct kn_tqe = {
        struct knote *tqe_next = 0
        struct knote **tqe_prev = 0xfffffe187a4df8b8
    }
    struct kn_link = {
        struct knote *sle_next = 0
    }
    struct kn_selnext = {
        struct knote *sle_next = 0
    }
    kn_status_t kn_status = 24
    unsigned long kn_qos_index :4 = 0
    unsigned long kn_qos_override :3 = 0
    unsigned long kn_is_fd :1 = 0x1
    unsigned long kn_vnode_kqok :1 = 0
    unsigned long kn_vnode_use_ofst :1 = 0
    unsigned long kn_kq_packed :42 = 0x2187a4df880
    union  {
        uintptr_t kn_hook = 0
        uint32_t kn_hook32 = 0
    }
    union  {
        struct fileproc *ptrauth kn_fp = 0xf7cd7e29e071a660
        struct proc *ptrauth kn_proc = 0xf7cd7e29e071a660
        struct ipc_port *ptrauth kn_ipc_port = 0xf7cd7e29e071a660
        struct ipc_pset *ptrauth kn_ipc_pset = 0xf7cd7e29e071a660
        struct thread_call *ptrauth kn_thcall = 0xf7cd7e29e071a660
        struct thread *ptrauth kn_thread = 0xf7cd7e29e071a660
    }
    struct kevent_internal_s kn_kevent = {
        uint64_t kei_ident = 0xc
        int8_t kei_filter = '\37777777777'
        uint8_t kei_filtid = 0x17
        uint16_t kei_flags = 0x1011
        int32_t kei_qos = 0
        uint64_t kei_udata = 0
        uint32_t kei_fflags = 0
        uint32_t kei_sfflags = 0
        int64_t kei_sdata = 0
        uint64_t [4] kei_ext = [ 0, 0, 0, 0 ]
    }
}
              kernel.release.vmapple`knote_drop+0x1ec
              kernel.release.vmapple`poll_nocancel+0x2d4
              kernel.release.vmapple`unix_syscall+0x304
              kernel.release.vmapple`sleh_synchronous+0x3e0
              kernel.release.vmapple`fleh_synchronous+0x28
              kernel.release.vmapple`fleh_dispatch64+0x19c

              libsystem_kernel.dylib`poll+0x8
              liblixutil.dylib`void* std::__1::__thread_proxy[abi:v160006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default
_delete<std::__1::__thread_struct>>, nix::MonitorFdHup::MonitorFdHup(int, std::__1::function<void ()>)::$_0>>(void*)+0x78
              libsystem_pthread.dylib`_pthread_start+0x88
              libsystem_pthread.dylib`thread_start+0x8

  2 356775                   sowakeup:entry pid=17684 sb=0xfffffe1d43bb8e78 so_flags=0x00008000 so_state=0x00000002 sb_flags=0x00000104 sb_waiters=1
              kernel.release.vmapple`uipc_send+0x8b8
              kernel.release.vmapple`soo_write+0xd8
              kernel.release.vmapple`fp_writev+0x90
              kernel.release.vmapple`writev_internal+0x360

  1 356775                   sowakeup:entry pid=17682 sb=0xfffffe1d43ba38c0 so_flags=0x00048000 so_state=0x00000082 sb_flags=0x00000104 sb_waiters=1
              kernel.release.vmapple`uipc_send+0x8b8
              kernel.release.vmapple`soo_write+0xd8
              kernel.release.vmapple`fp_writev+0x90
              kernel.release.vmapple`writev_internal+0x360

  1 356775                   sowakeup:entry pid=17682 sb=0xfffffe1d43ba38c0 so_flags=0x00048000 so_state=0x00000082 sb_flags=0x00000104 sb_waiters=1
              kernel.release.vmapple`uipc_send+0x8b8
              kernel.release.vmapple`soo_write+0xd8
              kernel.release.vmapple`fp_writev+0x90
              kernel.release.vmapple`writev_internal+0x360

  2 356775                   sowakeup:entry pid=17684 sb=0xfffffe1d43bb8e78 so_flags=0x00008000 so_state=0x00000002 sb_flags=0x00000104 sb_waiters=1
              kernel.release.vmapple`uipc_send+0x8b8
              kernel.release.vmapple`soo_write+0xd8
              kernel.release.vmapple`fp_writev+0x90
              kernel.release.vmapple`writev_internal+0x360

  2 356775                   sowakeup:entry pid=17682 sb=0xfffffe1d43ba38c0 so_flags=0x00048000 so_state=0x00000082 sb_flags=0x00000104 sb_waiters=1
              kernel.release.vmapple`uipc_send+0x8b8
              kernel.release.vmapple`soo_write+0xd8
              kernel.release.vmapple`fp_writev+0x90
              kernel.release.vmapple`writev_internal+0x360

  2 356775                   sowakeup:entry pid=17684 sb=0xfffffe1d43bb8e78 so_flags=0x00008000 so_state=0x00000002 sb_flags=0x00000104 sb_waiters=1
              kernel.release.vmapple`uipc_send+0x8b8
              kernel.release.vmapple`soo_write+0xd8
              kernel.release.vmapple`fp_writev+0x90
              kernel.release.vmapple`writev_internal+0x360

  2 356775                   sowakeup:entry pid=17684 sb=0xfffffe1d43bb8e78 so_flags=0x00008000 so_state=0x00000002 sb_flags=0x00000104 sb_waiters=1
              kernel.release.vmapple`uipc_send+0x8b8
              kernel.release.vmapple`soo_write+0xd8
              kernel.release.vmapple`fp_writev+0x90
              kernel.release.vmapple`writev_internal+0x360

  2 356775                   sowakeup:entry pid=17684 sb=0xfffffe1d43bb8e78 so_flags=0x00008000 so_state=0x00000002 sb_flags=0x00000104 sb_waiters=1
              kernel.release.vmapple`uipc_send+0x8b8
              kernel.release.vmapple`soo_write+0xd8
              kernel.release.vmapple`fp_writev+0x90
              kernel.release.vmapple`writev_internal+0x360

  2 356775                   sowakeup:entry pid=17684 sb=0xfffffe1d43bb8e78 so_flags=0x00008000 so_state=0x00000002 sb_flags=0x00000104 sb_waiters=1
              kernel.release.vmapple`uipc_send+0x8b8
              kernel.release.vmapple`soo_write+0xd8
              kernel.release.vmapple`fp_writev+0x90
              kernel.release.vmapple`writev_internal+0x360

  2 356775                   sowakeup:entry pid=17684 sb=0xfffffe1d43bb8e78 so_flags=0x00008000 so_state=0x00000002 sb_flags=0x00000104 sb_waiters=1
              kernel.release.vmapple`uipc_send+0x8b8
              kernel.release.vmapple`soo_write+0xd8
              kernel.release.vmapple`fp_writev+0x90
              kernel.release.vmapple`writev_internal+0x360

  2 356775                   sowakeup:entry pid=17684 sb=0xfffffe1d43bb8e78 so_flags=0x00008000 so_state=0x00000002 sb_flags=0x00000104 sb_waiters=1
              kernel.release.vmapple`uipc_send+0x8b8
              kernel.release.vmapple`soo_write+0xd8
              kernel.release.vmapple`fp_writev+0x90
              kernel.release.vmapple`writev_internal+0x360

  2 356775                   sowakeup:entry pid=17684 sb=0xfffffe1d43bb8e78 so_flags=0x00008000 so_state=0x00000002 sb_flags=0x00000104 sb_waiters=1
              kernel.release.vmapple`uipc_send+0x8b8
              kernel.release.vmapple`soo_write+0xd8
              kernel.release.vmapple`fp_writev+0x90
              kernel.release.vmapple`writev_internal+0x360

  2 356775                   sowakeup:entry pid=17684 sb=0xfffffe1d43bb8e78 so_flags=0x00008000 so_state=0x00000002 sb_flags=0x00000104 sb_waiters=1
              kernel.release.vmapple`uipc_send+0x8b8
              kernel.release.vmapple`soo_write+0xd8
              kernel.release.vmapple`fp_writev+0x90
              kernel.release.vmapple`writev_internal+0x360

  1 356775                   sowakeup:entry pid=17684 sb=0xfffffe1d43bb8e78 so_flags=0x00008000 so_state=0x00000002 sb_flags=0x00000104 sb_waiters=1
              kernel.release.vmapple`uipc_send+0x8b8
              kernel.release.vmapple`soo_write+0xd8
              kernel.release.vmapple`fp_writev+0x90
              kernel.release.vmapple`writev_internal+0x360

  1 356775                   sowakeup:entry pid=17684 sb=0xfffffe1d43bb8e78 so_flags=0x00008000 so_state=0x00000002 sb_flags=0x00000104 sb_waiters=1
              kernel.release.vmapple`uipc_send+0x8b8
              kernel.release.vmapple`soo_write+0xd8
              kernel.release.vmapple`fp_writev+0x90
              kernel.release.vmapple`writev_internal+0x360

  1 356775                   sowakeup:entry pid=17684 sb=0xfffffe1d43bb8e78 so_flags=0x00008000 so_state=0x00000002 sb_flags=0x00000104 sb_waiters=1
              kernel.release.vmapple`uipc_send+0x8b8
              kernel.release.vmapple`soo_write+0xd8
              kernel.release.vmapple`fp_writev+0x90
              kernel.release.vmapple`writev_internal+0x360

  1 356775                   sowakeup:entry pid=17684 sb=0xfffffe1d43bb8e78 so_flags=0x00008000 so_state=0x00000002 sb_flags=0x00000104 sb_waiters=1
              kernel.release.vmapple`uipc_send+0x8b8
              kernel.release.vmapple`soo_write+0xd8
              kernel.release.vmapple`fp_writev+0x90
              kernel.release.vmapple`writev_internal+0x360

  0 356920             unp_disconnect:entry pid=17682 name=nix
struct unpcb {
    struct unp_link = {
        struct unpcb *le_next = 0xfffffe1e46599e58
        struct unpcb **le_prev = 0xfffffe1e4659b370
    }
    struct socket *unp_socket = 0xfffffe1d43bb8e08
    struct vnode *unp_vnode = 0
    ino_t unp_ino = 0
    struct unpcb *unp_conn = 0xfffffe1e4659b370
    struct unp_head unp_refs = {
        struct unpcb *lh_first = 0
    }
    struct unp_reflink = {
        struct unpcb *le_next = 0
        struct unpcb **le_prev = 0
    }
    struct sockaddr_un *unp_addr = 0
    unp_gen_t unp_gencnt = 0x4aa1
    int unp_cc = 0
    int unp_mbcnt = 0
    uint32_t unp_flags = 0x1
    uint32_t rw_thrcount = 0
    struct xucred unp_peercred = {
        u_int cr_version = 0
        uid_t cr_uid = 0
        short cr_ngroups = 0x10
        gid_t [16] cr_groups = [ 0, 0x1, 0x2, 0x3, 0x4, 0x5, 0x8, 0x9, 0xc, 0x14, 0x1d, 0x3d, 0x50, 0x2bd, 0x21, 0x62 ]
    }
    lck_mtx_t unp_mtx = {
        uintptr_t [2] opaque = [ 0xf322000000, 0xa0d43 ]
    }
}struct unpcb {
    struct unp_link = {
        struct unpcb *le_next = 0xfffffe1e4659b2a8
        struct unpcb **le_prev = 0xfffffe0020e829d0
    }
    struct socket *unp_socket = 0xfffffe1d43ba3850
    struct vnode *unp_vnode = 0
    ino_t unp_ino = 0
    struct unpcb *unp_conn = 0xfffffe1e4659b2a8
    struct unp_head unp_refs = {
        struct unpcb *lh_first = 0
    }
    struct unp_reflink = {
        struct unpcb *le_next = 0
        struct unpcb **le_prev = 0
    }
    struct sockaddr_un *unp_addr = 0xfffffe230fcc2d80
    unp_gen_t unp_gencnt = 0x4aa2
    int unp_cc = 0
    int unp_mbcnt = 0
    uint32_t unp_flags = 0x1
    uint32_t rw_thrcount = 0
    struct xucred unp_peercred = {
        u_int cr_version = 0
        uid_t cr_uid = 0x1f5
        short cr_ngroups = 0x10
        gid_t [16] cr_groups = [ 0x14, 0xc, 0x3d, 0x4f, 0x50, 0x51, 0x2bd, 0x21, 0x62, 0x64, 0xcc, 0xfa, 0x18b, 0x18e, 0x18f, 0x190 ]
    }
    lck_mtx_t unp_mtx = {
        uintptr_t [2] opaque = [ 0xf322000000, 0 ]
    }
}
  0 356920             unp_disconnect:entry struct sockaddr_un {
    unsigned char sun_len = 0x6a
    sa_family_t sun_family = 0x1
    char [104] sun_path = [ "/nix/var/nix/daemon-socket/socket" ]
}
  0 356920             unp_disconnect:entry pid=17682
              kernel.release.vmapple`uipc_disconnect+0x18
              kernel.release.vmapple`soclose+0xd4
              kernel.release.vmapple`fg_drop+0xfc
              kernel.release.vmapple`fp_close_and_unlock+0x8fc
              kernel.release.vmapple`unix_syscall+0x304
              kernel.release.vmapple`sleh_synchronous+0x3e0
              kernel.release.vmapple`fleh_synchronous+0x28
              kernel.release.vmapple`fleh_dispatch64+0x19c

  0 356781           soisdisconnected:entry pid=17682 sock=fffffe1d43ba3850 snd_waiters=0 snd_sb_flags=0x00000100 rcv_waiters=0 rcv_sb_flags=0x00000104
  0 356781           soisdisconnected:entry pid=17682 sock=fffffe1d43bb8e08 snd_waiters=0 snd_sb_flags=0x00000100 rcv_waiters=0 rcv_sb_flags=0x00000104
  0 356939                 unp_detach:entry pid=17682
              kernel.release.vmapple`uipc_detach+0x18
              kernel.release.vmapple`soclose+0xd4
              kernel.release.vmapple`fg_drop+0xfc
              kernel.release.vmapple`fp_close_and_unlock+0x8fc
              kernel.release.vmapple`unix_syscall+0x304
              kernel.release.vmapple`sleh_synchronous+0x3e0
              kernel.release.vmapple`fleh_synchronous+0x28
              kernel.release.vmapple`fleh_dispatch64+0x19c

  0 356781           soisdisconnected:entry pid=17682 sock=fffffe1d43bb8e08 snd_waiters=0 snd_sb_flags=0x00000100 rcv_waiters=0 rcv_sb_flags=0x00000104
  0 356622              sofreelastref:entry free last ref to socket fffffe1d43bb8e08 pid=17682 type=1 dealloc=0
  0 356622              sofreelastref:entry sock=fffffe1d43bb8e08 state=0x00002031 flags=0x00008004 pid=17682 retaincnt=0 usecount=1
  0 356622              sofreelastref:entry free last ref to socket fffffe1d43bb8e08 pid=17682 type=1 dealloc=1
  0 354446                 knote_drop:entry pid=17682 kq=fffffe187a4df900 kn=fffffe1d3aff7800
              kernel.release.vmapple`kqueue_dealloc+0x2f0
              kernel.release.vmapple`fg_drop+0xfc
              kernel.release.vmapple`fp_close_and_unlock+0x8fc
              kernel.release.vmapple`unix_syscall+0x304
              kernel.release.vmapple`sleh_synchronous+0x3e0
              kernel.release.vmapple`fleh_synchronous+0x28
              kernel.release.vmapple`fleh_dispatch64+0x19c
I have an update: namely, that Apple's poll implementation is really busted lmao. What it's doing is that it's receiving a kevent of some kind, which is one-shot, which is NOT the EOF, and then it is unregistering from notifying about the EOF. It is not even delivering even a spurious notification to the nix daemon (which, if it DID do that, it would be fine and would retry!!). Instead it is not returning from poll. Hilarious!! This hot mess is what I've been using to diagnose this: ```d fbt:mach_kernel:soo_kqfilter:entry /curpsinfo->pr_fname == "nix"/ { self->go_kqf = 1; self->kn = (struct knote *)arg1; self->fp = (struct fileproc *)arg0; self->sock = (struct socket *)strip((void *)self->fp->fp_glob->fg_data, ptrauth_key_asda); printf("pid=%d so=%p so_type=%d so_flags=0x%08x sb_recv_flags=0x%08x", pid, self->sock, self->sock->so_type, self->sock->so_flags, self->sock->so_rcv.sb_flags); stack(); ustack(); } fbt:mach_kernel:soo_kqfilter:return /self->go_kqf/ { self->go_kqf = 0; printf("pid=%d so=%p so_type=%d so_flags=0x%08x sb_recv_flags=0x%08x", pid, self->sock, self->sock->so_type, self->sock->so_flags, self->sock->so_rcv.sb_flags); print(*self->fp); } fbt:mach_kernel:poll_nocancel:entry /curpsinfo->pr_fname == "nix"/ { printf("pid=%d", pid); self->go_poll = 1; ustack(); } fbt:mach_kernel:poll_nocancel:return /self->go_poll/ { printf("pid=%d", pid); self->go_poll = 0; } fbt:mach_kernel:filt_sordetach:entry /curpsinfo->pr_fname == "nix"/ { self->kn = (struct knote *)arg0; self->so = (struct socket *)strip((void *)self->kn->kn_fp->fp_glob->fg_data, ptrauth_key_asda); printf("pid=%d kn=%p so=%p rcv_sb_flags=0x%08x", pid, self->kn, self->so, self->so->so_rcv.sb_flags); print(*self->kn); stack(); ustack(); } fbt:mach_kernel:knote_drop:entry /curpsinfo->pr_fname == "nix"/ { printf("pid=%d kq=%p kn=%p", pid, arg0, arg1); stack(); } fbt:mach_kernel:unp_disconnect:entry /curpsinfo->pr_fname == "nix"/ { printf("pid=%d name=%s\n", pid, curpsinfo->pr_fname); self->unpcb = (struct unpcb *)arg0; print(*self->unpcb); print(*self->unpcb->unp_conn); self->go = 1; } fbt:mach_kernel:unp_disconnect:entry /self->go && self->unpcb->unp_conn->unp_addr/ { print(*self->unpcb->unp_conn->unp_addr); } fbt:mach_kernel:unp_disconnect:return /self->go/ { self->go = 0; } fbt:mach_kernel:unp_disconnect:entry /curpsinfo->pr_fname == "nix"/ { this->unp = (struct uncb *)arg0; printf("pid=%d", pid); stack(); } fbt:mach_kernel:unp_detach:entry /curpsinfo->pr_fname == "nix"/ { printf("pid=%d", pid); stack(); } fbt:mach_kernel:knote:entry /curpsinfo->pr_fname == "nix"/ { printf("pid=%d hint=0x%08x", pid, arg1); } fbt:mach_kernel:sofreelastref:entry /curpsinfo->pr_fname == "nix"/ { self->sock = (struct socket *)arg0; printf("free last ref to socket %p pid=%d type=%d dealloc=%d", self->sock, pid, self->sock->so_type, arg1); } fbt:mach_kernel:sowakeup:entry /curpsinfo->pr_fname == "nix"/ { self->sb = (struct sockbuf *)arg1; self->so = (struct socket *)arg0; printf("pid=%d sb=0x%08p so_flags=0x%08x so_state=0x%08x sb_flags=0x%08x sb_waiters=%d", pid, self->sb, self->so->so_flags, self->so->so_state, self->sb->sb_flags, self->sb->sb_waiters); stack(4); } fbt:mach_kernel:soisdisconnected:entry /curpsinfo->pr_fname == "nix"/ { self->so = (struct socket *)arg0; printf("pid=%d sock=%p snd_waiters=%d snd_sb_flags=0x%08x rcv_waiters=%d rcv_sb_flags=0x%08x", pid, arg0, self->so->so_snd.sb_waiters, self->so->so_snd.sb_flags, self->so->so_rcv.sb_waiters, self->so->so_rcv.sb_flags ); } fbt:mach_kernel:soclose_locked:entry /curpsinfo->pr_fname == "nix"/ { self->go2 = 1; self->sock = (struct socket*)arg0; } fbt:mach_kernel:sofreelastref:entry /self->go2/ { self->go2 = 0; printf("sock=%p state=0x%08x flags=0x%08x pid=%d retaincnt=%d usecount=%d", self->sock, self->sock->so_state, self->sock->so_flags, pid, self->sock->so_retaincnt, self->sock->so_usecount); } ``` ## Things that are the case by disconnect time in the fault case: ``` 3 356920 unp_disconnect:entry pid=7550 name=nix struct unpcb { struct unp_link = { struct unpcb *le_next = 0xfffffe1e4659a560 struct unpcb **le_prev = 0xfffffe1e4659adf8 } struct socket *unp_socket = 0xfffffe1d43bc95b8 struct vnode *unp_vnode = 0 ino_t unp_ino = 0 struct unpcb *unp_conn = 0xfffffe1e4659adf8 struct unp_head unp_refs = { struct unpcb *lh_first = 0 } struct unp_reflink = { struct unpcb *le_next = 0 struct unpcb **le_prev = 0 } struct sockaddr_un *unp_addr = 0 unp_gen_t unp_gencnt = 0x2aee int unp_cc = 0 int unp_mbcnt = 0 uint32_t unp_flags = 0x1 uint32_t rw_thrcount = 0 struct xucred unp_peercred = { u_int cr_version = 0 uid_t cr_uid = 0 short cr_ngroups = 0x10 gid_t [16] cr_groups = [ 0, 0x1, 0x2, 0x3, 0x4, 0x5, 0x8, 0x9, 0xc, 0x14, 0x1d, 0x3d, 0x50, 0x2bd, 0x21, 0x62 ] } lck_mtx_t unp_mtx = { uintptr_t [2] opaque = [ 0xf322000000, 0xa0fa8 ] } }struct unpcb { struct unp_link = { struct unpcb *le_next = 0xfffffe1e4659ad30 struct unpcb **le_prev = 0xfffffe0020e829d0 } struct socket *unp_socket = 0xfffffe1d43ba1990 struct vnode *unp_vnode = 0 ino_t unp_ino = 0 struct unpcb *unp_conn = 0xfffffe1e4659ad30 struct unp_head unp_refs = { struct unpcb *lh_first = 0 } struct unp_reflink = { struct unpcb *le_next = 0 struct unpcb **le_prev = 0 } struct sockaddr_un *unp_addr = 0xfffffe230fcc1300 unp_gen_t unp_gencnt = 0x2aef int unp_cc = 0 int unp_mbcnt = 0 uint32_t unp_flags = 0x1 uint32_t rw_thrcount = 0 struct xucred unp_peercred = { u_int cr_version = 0 uid_t cr_uid = 0x1f5 short cr_ngroups = 0x10 gid_t [16] cr_groups = [ 0x14, 0xc, 0x3d, 0x4f, 0x50, 0x51, 0x2bd, 0x21, 0x62, 0x64, 0xcc, 0xfa, 0x18b, 0x18e, 0x18f, 0x190 ] } lck_mtx_t unp_mtx = { uintptr_t [2] opaque = [ 0xf322000000, 0 ] } } 3 356920 unp_disconnect:entry struct sockaddr_un { unsigned char sun_len = 0x6a sa_family_t sun_family = 0x1 char [104] sun_path = [ "/nix/var/nix/daemon-socket/socket" ] } 3 356920 unp_disconnect:entry pid=7550 kernel.release.vmapple`uipc_disconnect+0x18 kernel.release.vmapple`soclose+0xd4 kernel.release.vmapple`fg_drop+0xfc kernel.release.vmapple`fp_close_and_unlock+0x8fc kernel.release.vmapple`dtrace_systrace_syscall+0x120 kernel.release.vmapple`unix_syscall+0x304 kernel.release.vmapple`sleh_synchronous+0x3e0 kernel.release.vmapple`fleh_synchronous+0x28 kernel.release.vmapple`fleh_dispatch64+0x19c 3 356781 soisdisconnected:entry pid=7550 sock=fffffe1d43ba1990 snd_waiters=0 snd_sb_flags=0x00000100 rcv_waiters=0 rcv_sb_flags=0x00000104 3 356781 soisdisconnected:entry pid=7550 sock=fffffe1d43bc95b8 snd_waiters=0 snd_sb_flags=0x00000100 rcv_waiters=0 rcv_sb_flags=0x00000104 3 356939 unp_detach:entry pid=7550 kernel.release.vmapple`uipc_detach+0x18 kernel.release.vmapple`soclose+0xd4 kernel.release.vmapple`fg_drop+0xfc kernel.release.vmapple`fp_close_and_unlock+0x8fc kernel.release.vmapple`dtrace_systrace_syscall+0x120 kernel.release.vmapple`unix_syscall+0x304 kernel.release.vmapple`sleh_synchronous+0x3e0 kernel.release.vmapple`fleh_synchronous+0x28 kernel.release.vmapple`fleh_dispatch64+0x19c 3 356781 soisdisconnected:entry pid=7550 sock=fffffe1d43bc95b8 snd_waiters=0 snd_sb_flags=0x00000100 rcv_waiters=0 rcv_sb_flags=0x00000104 3 356622 sofreelastref:entry free last ref to socket fffffe1d43bc95b8 pid=7550 type=1 dealloc=0 3 356622 sofreelastref:entry sock=fffffe1d43bc95b8 state=0x00002031 flags=0x00008004 pid=7550 retaincnt=0 usecount=1 3 356622 sofreelastref:entry free last ref to socket fffffe1d43bc95b8 pid=7550 type=1 dealloc=1 ``` On the client side socket: - so_state = 0x2031: SS_ISDISCONNECTED | SS_CANTSENDMORE | SS_CANTRECVMORE | SS_NOFDREF - so_flags = 0x8004: SOF_PCBCLEARING | SOF_NODEFUNCT - so_snd.sb_flags = 0x100: SB_UNIX - so_rcv.sb_flags = 0x104: SB_UNIX | SB_RECV WELL THERES YOUR PROBLEM! There's not a knote attached to it. How did that happen?? On the good client side socket: - so_rcv.sb_flags = 0x144: SB_UNIX | SB_RECV | SB_KNOTE => deliver a knote for "receive" events on the socket To recap the problem: `uipc_disconnect`'s call tree isn't calling `sowakeup` from `soisdisconnected` of the *client*'s socket because daemon_side_sock.so_rcv.sb_flags does not have `SB_KNOTE` set on it. Contextual note: uipc_disconnect calls soisdisconnected on the *facing* socket first: that is, the daemon's socket. ### Sidebar on defunct I think that defunct isn't relevant; it seems to be some kind of idle kill thing? ## BUT WHY Catch a load of this: you call poll() on the daemon and then it unregisters the bloody event without returning anything to userland! lmfao!!!!! `knote->kn_flags = 0x1011: EV_ADD | EV_ONESHOT | EV_FLAG0` Call stack: ``` poll_nocancel kqueue_scan kqueue_process knote_process ``` And then it gets dropped here: https://github.com/apple-oss-distributions/xnu/blob/8d741a5de7ff4191bf97d57b9f54c2f6d4a15585/bsd/kern/kern_event.c#L4448-L4457 ```c if (kev.flags & EV_ONESHOT) { if ((kn->kn_flags & EV_DISPATCH2) == EV_DISPATCH2 && (kn->kn_status & KN_DEFERDELETE) == 0) { /* defer dropping non-delete oneshot dispatch2 events */ kn->kn_status |= KN_DEFERDELETE | KN_DISABLED; } else { drop = true; } } else if (kn->kn_flags & EV_DISPATCH) { /* disable all dispatch knotes */ kn->kn_status |= KN_DISABLED; } else if ((kn->kn_flags & EV_CLEAR) == 0) { /* re-activate in case there are more events */ knote_activate(kq, kn, FILTER_ACTIVE); } /* * callback to handle each event as we find it. * If we have to detach and drop the knote, do * it while we have the kq unlocked. */ if (drop) { knote_drop(kq, kn, &knlc); } else { knote_unlock(kq, kn, &knlc, KNOTE_KQ_UNLOCK); } ``` Trace: ``` CPU ID FUNCTION:NAME 1 354446 knote_drop:entry pid=17682 kq=fffffe1d3b00ea58 kn=fffffe1d3b01c600 kernel.release.vmapple`kevent_register+0xea8 com.apple.kec.pthread`workq_handle_stack_events+0xa0 kernel.release.vmapple`workq_kernreturn+0x3d4 kernel.release.vmapple`unix_syscall+0x304 kernel.release.vmapple`sleh_synchronous+0x3e0 kernel.release.vmapple`fleh_synchronous+0x28 kernel.release.vmapple`fleh_dispatch64+0x19c 4 354446 knote_drop:entry pid=17682 kq=fffffe1d3b00cc70 kn=fffffe1d3b061f80 kernel.release.vmapple`kevent_register+0xea8 com.apple.kec.pthread`workq_handle_stack_events+0xa0 kernel.release.vmapple`workq_kernreturn+0x3d4 kernel.release.vmapple`unix_syscall+0x304 kernel.release.vmapple`sleh_synchronous+0x3e0 kernel.release.vmapple`fleh_synchronous+0x28 kernel.release.vmapple`fleh_dispatch64+0x19c 0 356775 sowakeup:entry pid=17684 sb=0xfffffe1d43bb8e78 so_flags=0x00008000 so_state=0x00000002 sb_flags=0x00000104 sb_waiters=1 kernel.release.vmapple`uipc_send+0x8b8 kernel.release.vmapple`soo_write+0xd8 kernel.release.vmapple`fp_writev+0x90 kernel.release.vmapple`writev_internal+0x360 1 356073 poll_nocancel:entry pid=17684 libsystem_kernel.dylib`poll+0x8 liblixutil.dylib`void* std::__1::__thread_proxy[abi:v160006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default _delete<std::__1::__thread_struct>>, nix::MonitorFdHup::MonitorFdHup(int, std::__1::function<void ()>)::$_0>>(void*)+0x78 libsystem_pthread.dylib`_pthread_start+0x88 libsystem_pthread.dylib`thread_start+0x8 1 356758 soo_kqfilter:entry pid=17684 so=fffffe1d43ba3850 so_type=1 so_flags=0x00048000 sb_recv_flags=0x00000104 kernel.release.vmapple`kevent_register+0xa98 kernel.release.vmapple`unix_syscall+0x304 kernel.release.vmapple`sleh_synchronous+0x3e0 kernel.release.vmapple`fleh_synchronous+0x28 kernel.release.vmapple`fleh_dispatch64+0x19c libsystem_kernel.dylib`poll+0x8 liblixutil.dylib`void* std::__1::__thread_proxy[abi:v160006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default _delete<std::__1::__thread_struct>>, nix::MonitorFdHup::MonitorFdHup(int, std::__1::function<void ()>)::$_0>>(void*)+0x78 libsystem_pthread.dylib`_pthread_start+0x88 libsystem_pthread.dylib`thread_start+0x8 0 356775 sowakeup:entry pid=17682 sb=0xfffffe1d43ba38c0 so_flags=0x00048000 so_state=0x00000082 sb_flags=0x00000104 sb_waiters=1 kernel.release.vmapple`uipc_send+0x8b8 kernel.release.vmapple`soo_write+0xd8 kernel.release.vmapple`fp_writev+0x90 kernel.release.vmapple`writev_internal+0x360 1 356759 soo_kqfilter:return pid=17684 so=fffffe1d43ba3850 so_type=1 so_flags=0x00048000 sb_recv_flags=0x00000144struct fileproc { os_refcnt_t fp_iocount = { os_ref_atomic_t ref_count = 0x2 } fileproc_vflags_t fp_vflags = FPV_NONE fileproc_flags_t fp_flags = FP_NONE uint16_t fp_guard_attrs = 0 struct fileglob *ptrauth fp_glob = 0x2cd47e2ea6677190 union { struct select_set *fp_wset = 0 struct fileproc_guard *ptrauth fp_guard = 0 } } 0 356775 sowakeup:entry pid=17684 sb=0xfffffe1d43bb8e78 so_flags=0x00008000 so_state=0x00000002 sb_flags=0x00000104 sb_waiters=1 kernel.release.vmapple`uipc_send+0x8b8 kernel.release.vmapple`soo_write+0xd8 kernel.release.vmapple`fp_writev+0x90 kernel.release.vmapple`writev_internal+0x360 0 356775 sowakeup:entry pid=17684 sb=0xfffffe1d43bb8e78 so_flags=0x00008000 so_state=0x00000002 sb_flags=0x00000104 sb_waiters=1 kernel.release.vmapple`uipc_send+0x8b8 kernel.release.vmapple`soo_write+0xd8 kernel.release.vmapple`fp_writev+0x90 kernel.release.vmapple`writev_internal+0x360 0 356775 sowakeup:entry pid=17682 sb=0xfffffe1d43ba38c0 so_flags=0x00048000 so_state=0x00000082 sb_flags=0x00000144 sb_waiters=1 kernel.release.vmapple`uipc_send+0x8b8 kernel.release.vmapple`soo_write+0xd8 kernel.release.vmapple`fp_writev+0x90 kernel.release.vmapple`writev_internal+0x360 0 356775 sowakeup:entry pid=17684 sb=0xfffffe1d43bb8e78 so_flags=0x00008000 so_state=0x00000002 sb_flags=0x00000104 sb_waiters=1 kernel.release.vmapple`uipc_send+0x8b8 kernel.release.vmapple`soo_write+0xd8 kernel.release.vmapple`fp_writev+0x90 kernel.release.vmapple`writev_internal+0x360 0 356775 sowakeup:entry pid=17682 sb=0xfffffe1d43ba38c0 so_flags=0x00048000 so_state=0x00000082 sb_flags=0x00000144 sb_waiters=1 kernel.release.vmapple`uipc_send+0x8b8 kernel.release.vmapple`soo_write+0xd8 kernel.release.vmapple`fp_writev+0x90 kernel.release.vmapple`writev_internal+0x360 0 356775 sowakeup:entry pid=17682 sb=0xfffffe1d43ba38c0 so_flags=0x00048000 so_state=0x00000082 sb_flags=0x00000144 sb_waiters=0 kernel.release.vmapple`uipc_send+0x8b8 kernel.release.vmapple`soo_write+0xd8 kernel.release.vmapple`fp_writev+0x90 kernel.release.vmapple`writev_internal+0x360 0 354446 knote_drop:entry pid=17684 kq=fffffe187a4df880 kn=fffffe1d3b122200 kernel.release.vmapple`kqueue_scan+0x3c0 kernel.release.vmapple`unix_syscall+0x304 kernel.release.vmapple`sleh_synchronous+0x3e0 kernel.release.vmapple`fleh_synchronous+0x28 kernel.release.vmapple`fleh_dispatch64+0x19c 0 356634 filt_sordetach:entry pid=17684 kn=fffffe1d3b122200 so=fffffe1d43ba3850 rcv_sb_flags=0x00000144struct knote { struct kn_tqe = { struct knote *tqe_next = 0 struct knote **tqe_prev = 0xfffffe187a4df8b8 } struct kn_link = { struct knote *sle_next = 0 } struct kn_selnext = { struct knote *sle_next = 0 } kn_status_t kn_status = 24 unsigned long kn_qos_index :4 = 0 unsigned long kn_qos_override :3 = 0 unsigned long kn_is_fd :1 = 0x1 unsigned long kn_vnode_kqok :1 = 0 unsigned long kn_vnode_use_ofst :1 = 0 unsigned long kn_kq_packed :42 = 0x2187a4df880 union { uintptr_t kn_hook = 0 uint32_t kn_hook32 = 0 } union { struct fileproc *ptrauth kn_fp = 0xf7cd7e29e071a660 struct proc *ptrauth kn_proc = 0xf7cd7e29e071a660 struct ipc_port *ptrauth kn_ipc_port = 0xf7cd7e29e071a660 struct ipc_pset *ptrauth kn_ipc_pset = 0xf7cd7e29e071a660 struct thread_call *ptrauth kn_thcall = 0xf7cd7e29e071a660 struct thread *ptrauth kn_thread = 0xf7cd7e29e071a660 } struct kevent_internal_s kn_kevent = { uint64_t kei_ident = 0xc int8_t kei_filter = '\37777777777' uint8_t kei_filtid = 0x17 uint16_t kei_flags = 0x1011 int32_t kei_qos = 0 uint64_t kei_udata = 0 uint32_t kei_fflags = 0 uint32_t kei_sfflags = 0 int64_t kei_sdata = 0 uint64_t [4] kei_ext = [ 0, 0, 0, 0 ] } } kernel.release.vmapple`knote_drop+0x1ec kernel.release.vmapple`poll_nocancel+0x2d4 kernel.release.vmapple`unix_syscall+0x304 kernel.release.vmapple`sleh_synchronous+0x3e0 kernel.release.vmapple`fleh_synchronous+0x28 kernel.release.vmapple`fleh_dispatch64+0x19c libsystem_kernel.dylib`poll+0x8 liblixutil.dylib`void* std::__1::__thread_proxy[abi:v160006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default _delete<std::__1::__thread_struct>>, nix::MonitorFdHup::MonitorFdHup(int, std::__1::function<void ()>)::$_0>>(void*)+0x78 libsystem_pthread.dylib`_pthread_start+0x88 libsystem_pthread.dylib`thread_start+0x8 2 356775 sowakeup:entry pid=17684 sb=0xfffffe1d43bb8e78 so_flags=0x00008000 so_state=0x00000002 sb_flags=0x00000104 sb_waiters=1 kernel.release.vmapple`uipc_send+0x8b8 kernel.release.vmapple`soo_write+0xd8 kernel.release.vmapple`fp_writev+0x90 kernel.release.vmapple`writev_internal+0x360 1 356775 sowakeup:entry pid=17682 sb=0xfffffe1d43ba38c0 so_flags=0x00048000 so_state=0x00000082 sb_flags=0x00000104 sb_waiters=1 kernel.release.vmapple`uipc_send+0x8b8 kernel.release.vmapple`soo_write+0xd8 kernel.release.vmapple`fp_writev+0x90 kernel.release.vmapple`writev_internal+0x360 1 356775 sowakeup:entry pid=17682 sb=0xfffffe1d43ba38c0 so_flags=0x00048000 so_state=0x00000082 sb_flags=0x00000104 sb_waiters=1 kernel.release.vmapple`uipc_send+0x8b8 kernel.release.vmapple`soo_write+0xd8 kernel.release.vmapple`fp_writev+0x90 kernel.release.vmapple`writev_internal+0x360 2 356775 sowakeup:entry pid=17684 sb=0xfffffe1d43bb8e78 so_flags=0x00008000 so_state=0x00000002 sb_flags=0x00000104 sb_waiters=1 kernel.release.vmapple`uipc_send+0x8b8 kernel.release.vmapple`soo_write+0xd8 kernel.release.vmapple`fp_writev+0x90 kernel.release.vmapple`writev_internal+0x360 2 356775 sowakeup:entry pid=17682 sb=0xfffffe1d43ba38c0 so_flags=0x00048000 so_state=0x00000082 sb_flags=0x00000104 sb_waiters=1 kernel.release.vmapple`uipc_send+0x8b8 kernel.release.vmapple`soo_write+0xd8 kernel.release.vmapple`fp_writev+0x90 kernel.release.vmapple`writev_internal+0x360 2 356775 sowakeup:entry pid=17684 sb=0xfffffe1d43bb8e78 so_flags=0x00008000 so_state=0x00000002 sb_flags=0x00000104 sb_waiters=1 kernel.release.vmapple`uipc_send+0x8b8 kernel.release.vmapple`soo_write+0xd8 kernel.release.vmapple`fp_writev+0x90 kernel.release.vmapple`writev_internal+0x360 2 356775 sowakeup:entry pid=17684 sb=0xfffffe1d43bb8e78 so_flags=0x00008000 so_state=0x00000002 sb_flags=0x00000104 sb_waiters=1 kernel.release.vmapple`uipc_send+0x8b8 kernel.release.vmapple`soo_write+0xd8 kernel.release.vmapple`fp_writev+0x90 kernel.release.vmapple`writev_internal+0x360 2 356775 sowakeup:entry pid=17684 sb=0xfffffe1d43bb8e78 so_flags=0x00008000 so_state=0x00000002 sb_flags=0x00000104 sb_waiters=1 kernel.release.vmapple`uipc_send+0x8b8 kernel.release.vmapple`soo_write+0xd8 kernel.release.vmapple`fp_writev+0x90 kernel.release.vmapple`writev_internal+0x360 2 356775 sowakeup:entry pid=17684 sb=0xfffffe1d43bb8e78 so_flags=0x00008000 so_state=0x00000002 sb_flags=0x00000104 sb_waiters=1 kernel.release.vmapple`uipc_send+0x8b8 kernel.release.vmapple`soo_write+0xd8 kernel.release.vmapple`fp_writev+0x90 kernel.release.vmapple`writev_internal+0x360 2 356775 sowakeup:entry pid=17684 sb=0xfffffe1d43bb8e78 so_flags=0x00008000 so_state=0x00000002 sb_flags=0x00000104 sb_waiters=1 kernel.release.vmapple`uipc_send+0x8b8 kernel.release.vmapple`soo_write+0xd8 kernel.release.vmapple`fp_writev+0x90 kernel.release.vmapple`writev_internal+0x360 2 356775 sowakeup:entry pid=17684 sb=0xfffffe1d43bb8e78 so_flags=0x00008000 so_state=0x00000002 sb_flags=0x00000104 sb_waiters=1 kernel.release.vmapple`uipc_send+0x8b8 kernel.release.vmapple`soo_write+0xd8 kernel.release.vmapple`fp_writev+0x90 kernel.release.vmapple`writev_internal+0x360 2 356775 sowakeup:entry pid=17684 sb=0xfffffe1d43bb8e78 so_flags=0x00008000 so_state=0x00000002 sb_flags=0x00000104 sb_waiters=1 kernel.release.vmapple`uipc_send+0x8b8 kernel.release.vmapple`soo_write+0xd8 kernel.release.vmapple`fp_writev+0x90 kernel.release.vmapple`writev_internal+0x360 2 356775 sowakeup:entry pid=17684 sb=0xfffffe1d43bb8e78 so_flags=0x00008000 so_state=0x00000002 sb_flags=0x00000104 sb_waiters=1 kernel.release.vmapple`uipc_send+0x8b8 kernel.release.vmapple`soo_write+0xd8 kernel.release.vmapple`fp_writev+0x90 kernel.release.vmapple`writev_internal+0x360 1 356775 sowakeup:entry pid=17684 sb=0xfffffe1d43bb8e78 so_flags=0x00008000 so_state=0x00000002 sb_flags=0x00000104 sb_waiters=1 kernel.release.vmapple`uipc_send+0x8b8 kernel.release.vmapple`soo_write+0xd8 kernel.release.vmapple`fp_writev+0x90 kernel.release.vmapple`writev_internal+0x360 1 356775 sowakeup:entry pid=17684 sb=0xfffffe1d43bb8e78 so_flags=0x00008000 so_state=0x00000002 sb_flags=0x00000104 sb_waiters=1 kernel.release.vmapple`uipc_send+0x8b8 kernel.release.vmapple`soo_write+0xd8 kernel.release.vmapple`fp_writev+0x90 kernel.release.vmapple`writev_internal+0x360 1 356775 sowakeup:entry pid=17684 sb=0xfffffe1d43bb8e78 so_flags=0x00008000 so_state=0x00000002 sb_flags=0x00000104 sb_waiters=1 kernel.release.vmapple`uipc_send+0x8b8 kernel.release.vmapple`soo_write+0xd8 kernel.release.vmapple`fp_writev+0x90 kernel.release.vmapple`writev_internal+0x360 1 356775 sowakeup:entry pid=17684 sb=0xfffffe1d43bb8e78 so_flags=0x00008000 so_state=0x00000002 sb_flags=0x00000104 sb_waiters=1 kernel.release.vmapple`uipc_send+0x8b8 kernel.release.vmapple`soo_write+0xd8 kernel.release.vmapple`fp_writev+0x90 kernel.release.vmapple`writev_internal+0x360 0 356920 unp_disconnect:entry pid=17682 name=nix struct unpcb { struct unp_link = { struct unpcb *le_next = 0xfffffe1e46599e58 struct unpcb **le_prev = 0xfffffe1e4659b370 } struct socket *unp_socket = 0xfffffe1d43bb8e08 struct vnode *unp_vnode = 0 ino_t unp_ino = 0 struct unpcb *unp_conn = 0xfffffe1e4659b370 struct unp_head unp_refs = { struct unpcb *lh_first = 0 } struct unp_reflink = { struct unpcb *le_next = 0 struct unpcb **le_prev = 0 } struct sockaddr_un *unp_addr = 0 unp_gen_t unp_gencnt = 0x4aa1 int unp_cc = 0 int unp_mbcnt = 0 uint32_t unp_flags = 0x1 uint32_t rw_thrcount = 0 struct xucred unp_peercred = { u_int cr_version = 0 uid_t cr_uid = 0 short cr_ngroups = 0x10 gid_t [16] cr_groups = [ 0, 0x1, 0x2, 0x3, 0x4, 0x5, 0x8, 0x9, 0xc, 0x14, 0x1d, 0x3d, 0x50, 0x2bd, 0x21, 0x62 ] } lck_mtx_t unp_mtx = { uintptr_t [2] opaque = [ 0xf322000000, 0xa0d43 ] } }struct unpcb { struct unp_link = { struct unpcb *le_next = 0xfffffe1e4659b2a8 struct unpcb **le_prev = 0xfffffe0020e829d0 } struct socket *unp_socket = 0xfffffe1d43ba3850 struct vnode *unp_vnode = 0 ino_t unp_ino = 0 struct unpcb *unp_conn = 0xfffffe1e4659b2a8 struct unp_head unp_refs = { struct unpcb *lh_first = 0 } struct unp_reflink = { struct unpcb *le_next = 0 struct unpcb **le_prev = 0 } struct sockaddr_un *unp_addr = 0xfffffe230fcc2d80 unp_gen_t unp_gencnt = 0x4aa2 int unp_cc = 0 int unp_mbcnt = 0 uint32_t unp_flags = 0x1 uint32_t rw_thrcount = 0 struct xucred unp_peercred = { u_int cr_version = 0 uid_t cr_uid = 0x1f5 short cr_ngroups = 0x10 gid_t [16] cr_groups = [ 0x14, 0xc, 0x3d, 0x4f, 0x50, 0x51, 0x2bd, 0x21, 0x62, 0x64, 0xcc, 0xfa, 0x18b, 0x18e, 0x18f, 0x190 ] } lck_mtx_t unp_mtx = { uintptr_t [2] opaque = [ 0xf322000000, 0 ] } } 0 356920 unp_disconnect:entry struct sockaddr_un { unsigned char sun_len = 0x6a sa_family_t sun_family = 0x1 char [104] sun_path = [ "/nix/var/nix/daemon-socket/socket" ] } 0 356920 unp_disconnect:entry pid=17682 kernel.release.vmapple`uipc_disconnect+0x18 kernel.release.vmapple`soclose+0xd4 kernel.release.vmapple`fg_drop+0xfc kernel.release.vmapple`fp_close_and_unlock+0x8fc kernel.release.vmapple`unix_syscall+0x304 kernel.release.vmapple`sleh_synchronous+0x3e0 kernel.release.vmapple`fleh_synchronous+0x28 kernel.release.vmapple`fleh_dispatch64+0x19c 0 356781 soisdisconnected:entry pid=17682 sock=fffffe1d43ba3850 snd_waiters=0 snd_sb_flags=0x00000100 rcv_waiters=0 rcv_sb_flags=0x00000104 0 356781 soisdisconnected:entry pid=17682 sock=fffffe1d43bb8e08 snd_waiters=0 snd_sb_flags=0x00000100 rcv_waiters=0 rcv_sb_flags=0x00000104 0 356939 unp_detach:entry pid=17682 kernel.release.vmapple`uipc_detach+0x18 kernel.release.vmapple`soclose+0xd4 kernel.release.vmapple`fg_drop+0xfc kernel.release.vmapple`fp_close_and_unlock+0x8fc kernel.release.vmapple`unix_syscall+0x304 kernel.release.vmapple`sleh_synchronous+0x3e0 kernel.release.vmapple`fleh_synchronous+0x28 kernel.release.vmapple`fleh_dispatch64+0x19c 0 356781 soisdisconnected:entry pid=17682 sock=fffffe1d43bb8e08 snd_waiters=0 snd_sb_flags=0x00000100 rcv_waiters=0 rcv_sb_flags=0x00000104 0 356622 sofreelastref:entry free last ref to socket fffffe1d43bb8e08 pid=17682 type=1 dealloc=0 0 356622 sofreelastref:entry sock=fffffe1d43bb8e08 state=0x00002031 flags=0x00008004 pid=17682 retaincnt=0 usecount=1 0 356622 sofreelastref:entry free last ref to socket fffffe1d43bb8e08 pid=17682 type=1 dealloc=1 0 354446 knote_drop:entry pid=17682 kq=fffffe187a4df900 kn=fffffe1d3aff7800 kernel.release.vmapple`kqueue_dealloc+0x2f0 kernel.release.vmapple`fg_drop+0xfc kernel.release.vmapple`fp_close_and_unlock+0x8fc kernel.release.vmapple`unix_syscall+0x304 kernel.release.vmapple`sleh_synchronous+0x3e0 kernel.release.vmapple`fleh_synchronous+0x28 kernel.release.vmapple`fleh_dispatch64+0x19c ```
Owner

I think that my advice for us is that we write a native kqueue based implementation of MonitorFdHup because Apple apparently did not write a poll implementation that doesn't deadlock, and it appears that the problem is in poll receiving spurious events internally and then turning itself off. Fortunately writing such a kqueue usage isn't exactly hard.

The other option is that we add enough spurious things to the poll event description that it delivers the events anyway.

While writing this comment I realized what the actual problem is: if the daemon isn't blockingly waiting on the socket, the kqueue of poll gets notified about it becoming readable:

sb_waiters == 0 (ie no blocking waiting) always precedes an instance of non-delivered EOF event.

  0 356775                   sowakeup:entry pid=17682 sb=0xfffffe1d43ba38c0 so_flags=0x00048000 so_state=0x00000082 sb_flags=0x00000144 sb_waiters=0
              kernel.release.vmapple`uipc_send+0x8b8
              kernel.release.vmapple`soo_write+0xd8
              kernel.release.vmapple`fp_writev+0x90
              kernel.release.vmapple`writev_internal+0x360

  0 354446                 knote_drop:entry pid=17684 kq=fffffe187a4df880 kn=fffffe1d3b122200
              kernel.release.vmapple`kqueue_scan+0x3c0
              kernel.release.vmapple`unix_syscall+0x304
              kernel.release.vmapple`sleh_synchronous+0x3e0
              kernel.release.vmapple`fleh_synchronous+0x28
              kernel.release.vmapple`fleh_dispatch64+0x19c

Indeed poll(2) is using EVFILT_READ:

		/* Handle input events */
		if (events & (POLLIN | POLLRDNORM | POLLPRI | POLLRDBAND | POLLHUP)) {
			kev.filter = EVFILT_READ;
			if (events & (POLLPRI | POLLRDBAND)) {
				kev.flags |= EV_OOBAND;
			}
			rc = kevent_register(kq, &kev, NULL);
			assert((rc & FILTER_REGISTER_WAIT) == 0);
		}

However, it seems like kqueue_scan is not returning inside of poll either, which is .. more baffling. Clearly it got the event and deleted it but it still didn't return?

Diagnosing this bug was fun.

I think that my advice for us is that we write a native kqueue based implementation of MonitorFdHup because Apple apparently did not write a poll implementation that doesn't deadlock, and it appears that the problem is in poll receiving spurious events internally and then turning itself off. Fortunately writing such a kqueue usage isn't exactly hard. The other option is that we add enough spurious things to the poll event description that it delivers the events anyway. While writing this comment I realized what the *actual* problem is: if the daemon isn't blockingly waiting on the socket, the kqueue of `poll` gets notified about it becoming readable: sb_waiters == 0 (ie no blocking waiting) always precedes an instance of non-delivered EOF event. ``` 0 356775 sowakeup:entry pid=17682 sb=0xfffffe1d43ba38c0 so_flags=0x00048000 so_state=0x00000082 sb_flags=0x00000144 sb_waiters=0 kernel.release.vmapple`uipc_send+0x8b8 kernel.release.vmapple`soo_write+0xd8 kernel.release.vmapple`fp_writev+0x90 kernel.release.vmapple`writev_internal+0x360 0 354446 knote_drop:entry pid=17684 kq=fffffe187a4df880 kn=fffffe1d3b122200 kernel.release.vmapple`kqueue_scan+0x3c0 kernel.release.vmapple`unix_syscall+0x304 kernel.release.vmapple`sleh_synchronous+0x3e0 kernel.release.vmapple`fleh_synchronous+0x28 kernel.release.vmapple`fleh_dispatch64+0x19c ``` Indeed poll(2) is [using EVFILT_READ](https://github.com/apple-oss-distributions/xnu/blob/8d741a5de7ff4191bf97d57b9f54c2f6d4a15585/bsd/kern/sys_generic.c#L1749-L1757): ```c /* Handle input events */ if (events & (POLLIN | POLLRDNORM | POLLPRI | POLLRDBAND | POLLHUP)) { kev.filter = EVFILT_READ; if (events & (POLLPRI | POLLRDBAND)) { kev.flags |= EV_OOBAND; } rc = kevent_register(kq, &kev, NULL); assert((rc & FILTER_REGISTER_WAIT) == 0); } ``` However, it seems like kqueue_scan is not returning inside of poll either, which is .. more baffling. Clearly it got the event and deleted it but it still didn't return? Diagnosing this bug was fun.
Owner

Aha! kec_process_noutputs is not getting incremented because revents remains zero due to the lack of matching events being requested and thus kqueue_process is not returning early as intended, I believe!

8d741a5de7/bsd/kern/kern_event.c (L7870-L7873)

That is:

  • poll is asking the kernel event subsystem for any read-relevant events
  • the kernel event subsystem is telling poll "hey ur socket is readable bestie"
  • poll is consulting the things it was supposed to notify about in its callback and finding that it doesn't need to tell anyone about the socket being readable, tells the kernel event subsystem "thanks but no thanks bestie" and doesn't mark the fd as having received an event
  • the kernel event subsystem then proceeds to deadlock the poll by deleting the only event we care about while not re-adding the oneshot event it just deleted since it could happen again. This is a kernel bug in macOS.

The poll_callback is called here:

8d741a5de7/bsd/kern/kern_event.c (L4465-L4467)

Okay, so, yeah. The Apple poll implementation is broken. We could maybe bisect why it wasn't happening on Lix 2.91; I have two hypotheses:

  • It used to be sketchy reliability wise and we didn't notice
  • Maybe adding the second FD to poll made it think that it wasn't deadlocked and thus didn't force a return and re-run through the loop?

The other alternative to writing a correct implementation based on kqueue directly is that we set a poll timeout of the maximum time that we tolerate stale daemons lying around, since I think that the timeout will still probably be respected, and poll should be fine being level-triggered.

Aha! `kec_process_noutputs` is not getting incremented because `revents` remains zero due to the lack of matching `events` being requested and thus `kqueue_process` is not returning early as intended, I believe! https://github.com/apple-oss-distributions/xnu/blob/8d741a5de7ff4191bf97d57b9f54c2f6d4a15585/bsd/kern/kern_event.c#L7870-L7873 That is: - poll is asking the kernel event subsystem for any read-relevant events - the kernel event subsystem is telling poll "hey ur socket is readable bestie" - poll is consulting the things it was supposed to notify about in its callback and finding that it doesn't need to tell anyone about the socket being readable, tells the kernel event subsystem "thanks but no thanks bestie" and doesn't mark the fd as having received an event - the kernel event subsystem then proceeds to deadlock the poll by deleting the only event we care about while not re-adding the oneshot event it just deleted since it could happen again. ***This is a kernel bug in macOS***. The `poll_callback` is called here: https://github.com/apple-oss-distributions/xnu/blob/8d741a5de7ff4191bf97d57b9f54c2f6d4a15585/bsd/kern/kern_event.c#L4465-L4467 Okay, so, yeah. The Apple poll implementation is broken. We could maybe bisect why it wasn't happening on Lix 2.91; I have two hypotheses: - It used to be sketchy reliability wise and we didn't notice - Maybe adding the second FD to poll made it think that it *wasn't* deadlocked and thus didn't force a return and re-run through the loop? The other alternative to writing a correct implementation based on kqueue directly is that we set a poll timeout of the maximum time that we tolerate stale daemons lying around, since I think that the timeout will still probably be respected, and poll *should* be *fine* being level-triggered.
Owner

https://openradar.appspot.com/FB17447257

Copy pasted report

poll(2) fails to deliver HUP events when socket becomes readable before HUP, with the event filter POLLHUP

First, let's start out with a POSIX violation: you are not supposed to require POLLHUP to deliver HUP events, but meh, XNU does a POSIX violation, sure, that's rdar://37537852.
Not this bug.

Our upstream bug URL with lots of kernel debugging details: #729 (comment)

This bug was found in Lix, which has a thread listening for HUP events on client connections and a main thread performing blocking IO on the client connection.
The HUP events thread calls poll(2) on two FDs, the client connection one with events = POLLHUP, and a terminate wakeup pipe with events = POLLIN.

The terminate wakeup pipe is used to knock the HUP events thread out of sleep when the main thread wants to terminate it.

We observed that if the following situation happens, the kernel loses the HUP event:

  • poll(2) is called with the following:
    • One socket with events = POLLHUP
    • One pipe with events = POLLIN
  • The socket receives data which is not immediately received by a blocking read.
  • The socket hangs up.

Then the call to poll(2) does not terminate.

Expected behaviour:
poll(2) returns when an FD with events=POLLHUP receives a HUP event.

Actual behaviour:
poll(2) does not return.

Kernel bug diagnosis:

This is because the XNU kernel translates poll(2) with events = POLLHUP to kqueue with a filter of EVFILT_READ and using the flag EV_ONESHOT: 8d741a5de7/bsd/kern/sys_generic.c (L1749-L1757)
The kernel kevent poll_callback receives the event, changes no flags since it's not an EOF so it doesn't match the poll filter, then doesn't increment kec_process_noutputs.

Since it's EV_ONESHOT, the event subscription is deleted so no further events arrive on that fd: 8d741a5de7/bsd/kern/kern_event.c (L7870-L7873)

Incrementing kec_process_noutputs is required to cause the call to terminate: 8d741a5de7/bsd/kern/kern_event.c (L7870-L7873)
This also only gets stuck if you have two fds going into poll, since I am guessing that listening on no kevents causes waiting for kevents to terminate.

That is:

  • poll is asking the kernel event subsystem for any read-relevant events
  • The kernel event subsystem is telling poll "hey ur socket is readable bestie"
  • poll is consulting the things it was supposed to notify about in its callback and finding that it doesn't need to tell anyone about the socket being readable, tells the kernel event subsystem "thanks but no thanks bestie" and doesn't mark the fd as having received an event
  • The kernel event subsystem then proceeds to deadlock the poll by deleting the only event we care about while not re-adding the oneshot event it just deleted since it could happen again.
    This is a bug.

I have attached the janky dtrace script I used to diagnose this (n.b. requires process name == nix) as well as a reproducer that will terminate when built with Clang on Linux, but will block forever on macOS.

Build the reproducer with clang++ test.cc -o test.

linux$ ./test
close
poll loop
pollfd {
  int fd = 3
  short events = 16
  short revents = 16
}
pollfd {
  int fd = 5
  short events = 1
  short revents = 0
}
POLLHUP
finish poll
macos$ ./test
close
<hangs forever>
^C

This is the symptom as observed by the dtrace script wip.d that led me to realize what the problem was:

sb_waiters == 0 (ie no blocking waiting) always precedes an instance of non-delivered EOF event.

  0 356775                   sowakeup:entry pid=17682 sb=0xfffffe1d43ba38c0 so_flags=0x00048000 so_state=0x00000082 sb_flags=0x00000144 sb_waiters=0
              kernel.release.vmapple`uipc_send+0x8b8
              kernel.release.vmapple`soo_write+0xd8
              kernel.release.vmapple`fp_writev+0x90
              kernel.release.vmapple`writev_internal+0x360

  0 354446                 knote_drop:entry pid=17684 kq=fffffe187a4df880 kn=fffffe1d3b122200
              kernel.release.vmapple`kqueue_scan+0x3c0
              kernel.release.vmapple`unix_syscall+0x304
              kernel.release.vmapple`sleh_synchronous+0x3e0
              kernel.release.vmapple`fleh_synchronous+0x28
              kernel.release.vmapple`fleh_dispatch64+0x19c

test.cc

#include <iostream>
#include <sys/socket.h>
#include <thread>
#include <sys/poll.h>
#include <sys/types.h>
#include <sys/time.h>
#include <unistd.h>

void thr2_go(int fd, int fd2)
{
    struct pollfd fds[2];
    // If this is set to POLLIN, the bug does not happen, since the kernel
    // doesn't eat the event.
    fds[0].events = POLLHUP;
    fds[0].fd = fd;
    fds[1].events = POLLIN;
    fds[1].fd = fd2;
    while (true) {
        int count = poll(fds, 2, -1);
        if (count == -1) {
            perror("poll");
            return;
        }
        std::cout << "poll loop\n";
        __builtin_dump_struct(&fds[0], fprintf, stderr);
        __builtin_dump_struct(&fds[1], fprintf, stderr);

        // if (fds[0].revents & POLLIN) {
        //     char trash[10];
        //     read(fds[1].fd, trash, 10);
        // }
        if (fds[0].revents & POLLHUP) {
            std::cout << "POLLHUP\n";
            break;
        }
    }
    std::cout << "finish poll\n";
}

int main(void)
{
    int sock_fds[2];
    int pipe_fds[2];
    int rv = socketpair(AF_UNIX, SOCK_STREAM, 0, sock_fds);
    if (rv < 0) {
        perror("socketpair");
        return 1;
    }
    rv = pipe(pipe_fds);
    if (rv < 0) {
        perror("pipe");
        return 1;
    }
    std::thread thr2{[&sock_fds, &pipe_fds]() { thr2_go(sock_fds[0], pipe_fds[0]); }};

    sleep(1);
    if (write(sock_fds[1], "a", 1) < 0) {
        perror("write");
    }
    sleep(1);
    std::cout << "close\n";
    close(sock_fds[1]);
    thr2.join();
}

wip.d

fbt:mach_kernel:soo_kqfilter:entry /curpsinfo->pr_fname == "nix"/ {
    self->go_kqf = 1;
    self->kn = (struct knote *)arg1;
    self->fp = (struct fileproc *)arg0;
	self->sock = (struct socket *)strip((void *)self->fp->fp_glob->fg_data, ptrauth_key_asda);
	printf("pid=%d so=%p so_type=%d so_flags=0x%08x sb_recv_flags=0x%08x", pid, self->sock, self->sock->so_type, self->sock->so_flags,
		self->sock->so_rcv.sb_flags);
	stack();
	ustack();
}

fbt:mach_kernel:soo_kqfilter:return /self->go_kqf/ {
    self->go_kqf = 0;
	printf("pid=%d so=%p so_type=%d so_flags=0x%08x sb_recv_flags=0x%08x", pid, self->sock, self->sock->so_type, self->sock->so_flags,
		self->sock->so_rcv.sb_flags);
    print(*self->fp);
}

fbt:mach_kernel:poll_nocancel:entry /curpsinfo->pr_fname == "nix"/ {
	printf("pid=%d", pid);
    self->go_poll = 1;
	ustack();
}
fbt:mach_kernel:poll_nocancel:return /curpsinfo->pr_fname == "nix"/ {
	printf("pid=%d", pid);
}

fbt:mach_kernel:kqueue_scan:entry /self->go_poll/ {
	printf("pid=%d", pid);
    self->go_poll = 0;
	ustack();
}
fbt:mach_kernel:kqueue_scan:return /curpsinfo->pr_fname == "nix"/ {
	printf("pid=%d", pid);
}

fbt:mach_kernel:filt_sordetach:entry /curpsinfo->pr_fname == "nix"/ {
    self->kn = (struct knote *)arg0;
	self->so = (struct socket *)strip((void *)self->kn->kn_fp->fp_glob->fg_data, ptrauth_key_asda);
	printf("pid=%d kn=%p so=%p rcv_sb_flags=0x%08x", pid, self->kn, self->so, self->so->so_rcv.sb_flags);
	print(*self->kn);
	stack();
	ustack();
}

fbt:mach_kernel:knote_drop:entry /curpsinfo->pr_fname == "nix"/ {
	printf("pid=%d kq=%p kn=%p", pid, arg0, arg1);
	stack();
}

fbt:mach_kernel:unp_disconnect:entry /curpsinfo->pr_fname == "nix"/ {
    printf("pid=%d name=%s\n", pid, curpsinfo->pr_fname);
    self->unpcb = (struct unpcb *)arg0;
    print(*self->unpcb);
    print(*self->unpcb->unp_conn);
    self->go = 1;
}

fbt:mach_kernel:unp_disconnect:entry /self->go && self->unpcb->unp_conn->unp_addr/ {
    print(*self->unpcb->unp_conn->unp_addr);
}

fbt:mach_kernel:unp_disconnect:return /self->go/ {
    self->go = 0;
}

fbt:mach_kernel:unp_disconnect:entry /curpsinfo->pr_fname == "nix"/ {
	this->unp = (struct uncb *)arg0;
    printf("pid=%d", pid);
    stack();
}
fbt:mach_kernel:unp_detach:entry /curpsinfo->pr_fname == "nix"/ {
    printf("pid=%d", pid);
    stack();
}
fbt:mach_kernel:knote:entry /curpsinfo->pr_fname == "nix"/ {
    printf("pid=%d hint=0x%08x", pid, arg1);
}
fbt:mach_kernel:sofreelastref:entry /curpsinfo->pr_fname == "nix"/ {
    self->sock = (struct socket *)arg0;
    printf("free last ref to socket %p pid=%d type=%d dealloc=%d", self->sock, pid, self->sock->so_type, arg1);
}
fbt:mach_kernel:sowakeup:entry /curpsinfo->pr_fname == "nix"/ {
    self->sb = (struct sockbuf *)arg1;
    self->so = (struct socket *)arg0;
    printf("pid=%d sb=0x%08p so_flags=0x%08x so_state=0x%08x sb_flags=0x%08x sb_waiters=%d", pid, self->sb, self->so->so_flags, self->so->so_state, self->sb->sb_flags, self->sb->sb_waiters);
    stack(4);
}

fbt:mach_kernel:soisdisconnected:entry /curpsinfo->pr_fname == "nix"/ {
	self->so = (struct socket *)arg0;
	printf("pid=%d sock=%p snd_waiters=%d snd_sb_flags=0x%08x rcv_waiters=%d rcv_sb_flags=0x%08x",
		pid,
		arg0,
		self->so->so_snd.sb_waiters,
		self->so->so_snd.sb_flags,
		self->so->so_rcv.sb_waiters,
		self->so->so_rcv.sb_flags
	);
}

fbt:mach_kernel:soclose_locked:entry /curpsinfo->pr_fname == "nix"/ {
    self->go2 = 1;
    self->sock = (struct socket*)arg0;
}
fbt:mach_kernel:sofreelastref:entry /self->go2/ {
    self->go2 = 0;
    printf("sock=%p state=0x%08x flags=0x%08x pid=%d retaincnt=%d usecount=%d", self->sock, self->sock->so_state, self->sock->so_flags, pid, self->sock->so_retaincnt, self->sock->so_usecount);
}
https://openradar.appspot.com/FB17447257 <details> <summary>Copy pasted report</summary> ## poll(2) fails to deliver HUP events when socket becomes readable before HUP, with the event filter POLLHUP First, let's start out with a POSIX violation: you are not supposed to require POLLHUP to deliver HUP events, but meh, XNU does a POSIX violation, sure, that's rdar://37537852. Not this bug. Our upstream bug URL with lots of kernel debugging details: https://git.lix.systems/lix-project/lix/issues/729#issuecomment-10453 This bug was found in Lix, which has a thread listening for HUP events on client connections and a main thread performing blocking IO on the client connection. The HUP events thread calls poll(2) on two FDs, the client connection one with events = POLLHUP, and a terminate wakeup pipe with events = POLLIN. The terminate wakeup pipe is used to knock the HUP events thread out of sleep when the main thread wants to terminate it. We observed that if the following situation happens, the kernel loses the HUP event: - poll(2) is called with the following: - One socket with events = POLLHUP - One pipe with events = POLLIN - The socket receives data which is not immediately received by a blocking read. - The socket hangs up. Then the call to poll(2) does not terminate. Expected behaviour: poll(2) returns when an FD with events=POLLHUP receives a HUP event. Actual behaviour: poll(2) does not return. Kernel bug diagnosis: This is because the XNU kernel translates poll(2) with events = POLLHUP to kqueue with a filter of EVFILT_READ and using the flag EV_ONESHOT: https://github.com/apple-oss-distributions/xnu/blob/8d741a5de7ff4191bf97d57b9f54c2f6d4a15585/bsd/kern/sys_generic.c#L1749-L1757 The kernel kevent `poll_callback` receives the event, changes no flags since it's not an EOF so it doesn't match the poll filter, then doesn't increment kec_process_noutputs. Since it's EV_ONESHOT, the event subscription is deleted so no further events arrive on that fd: https://github.com/apple-oss-distributions/xnu/blob/8d741a5de7ff4191bf97d57b9f54c2f6d4a15585/bsd/kern/kern_event.c#L7870-L7873 Incrementing kec_process_noutputs is required to cause the call to terminate: https://github.com/apple-oss-distributions/xnu/blob/8d741a5de7ff4191bf97d57b9f54c2f6d4a15585/bsd/kern/kern_event.c#L7870-L7873 This also only gets stuck if you have two fds going into poll, since I am guessing that listening on no kevents causes waiting for kevents to terminate. That is: * poll is asking the kernel event subsystem for any read-relevant events * The kernel event subsystem is telling poll "hey ur socket is readable bestie" * poll is consulting the things it was supposed to notify about in its callback and finding that it doesn't need to tell anyone about the socket being readable, tells the kernel event subsystem "thanks but no thanks bestie" and doesn't mark the fd as having received an event * The kernel event subsystem then proceeds to deadlock the poll by deleting the only event we care about while not re-adding the oneshot event it just deleted since it could happen again. This is a bug. I have attached the janky dtrace script I used to diagnose this (n.b. requires process name == `nix`) as well as a reproducer that will terminate when built with Clang on Linux, but will block forever on macOS. Build the reproducer with `clang++ test.cc -o test`. ``` linux$ ./test close poll loop pollfd { int fd = 3 short events = 16 short revents = 16 } pollfd { int fd = 5 short events = 1 short revents = 0 } POLLHUP finish poll ``` ``` macos$ ./test close <hangs forever> ^C ``` --- This is the symptom as observed by the dtrace script `wip.d` that led me to realize what the problem was: sb_waiters == 0 (ie no blocking waiting) always precedes an instance of non-delivered EOF event. ``` 0 356775 sowakeup:entry pid=17682 sb=0xfffffe1d43ba38c0 so_flags=0x00048000 so_state=0x00000082 sb_flags=0x00000144 sb_waiters=0 kernel.release.vmapple`uipc_send+0x8b8 kernel.release.vmapple`soo_write+0xd8 kernel.release.vmapple`fp_writev+0x90 kernel.release.vmapple`writev_internal+0x360 0 354446 knote_drop:entry pid=17684 kq=fffffe187a4df880 kn=fffffe1d3b122200 kernel.release.vmapple`kqueue_scan+0x3c0 kernel.release.vmapple`unix_syscall+0x304 kernel.release.vmapple`sleh_synchronous+0x3e0 kernel.release.vmapple`fleh_synchronous+0x28 kernel.release.vmapple`fleh_dispatch64+0x19c ``` ## test.cc ```cpp #include <iostream> #include <sys/socket.h> #include <thread> #include <sys/poll.h> #include <sys/types.h> #include <sys/time.h> #include <unistd.h> void thr2_go(int fd, int fd2) { struct pollfd fds[2]; // If this is set to POLLIN, the bug does not happen, since the kernel // doesn't eat the event. fds[0].events = POLLHUP; fds[0].fd = fd; fds[1].events = POLLIN; fds[1].fd = fd2; while (true) { int count = poll(fds, 2, -1); if (count == -1) { perror("poll"); return; } std::cout << "poll loop\n"; __builtin_dump_struct(&fds[0], fprintf, stderr); __builtin_dump_struct(&fds[1], fprintf, stderr); // if (fds[0].revents & POLLIN) { // char trash[10]; // read(fds[1].fd, trash, 10); // } if (fds[0].revents & POLLHUP) { std::cout << "POLLHUP\n"; break; } } std::cout << "finish poll\n"; } int main(void) { int sock_fds[2]; int pipe_fds[2]; int rv = socketpair(AF_UNIX, SOCK_STREAM, 0, sock_fds); if (rv < 0) { perror("socketpair"); return 1; } rv = pipe(pipe_fds); if (rv < 0) { perror("pipe"); return 1; } std::thread thr2{[&sock_fds, &pipe_fds]() { thr2_go(sock_fds[0], pipe_fds[0]); }}; sleep(1); if (write(sock_fds[1], "a", 1) < 0) { perror("write"); } sleep(1); std::cout << "close\n"; close(sock_fds[1]); thr2.join(); } ``` ## wip.d ```d fbt:mach_kernel:soo_kqfilter:entry /curpsinfo->pr_fname == "nix"/ { self->go_kqf = 1; self->kn = (struct knote *)arg1; self->fp = (struct fileproc *)arg0; self->sock = (struct socket *)strip((void *)self->fp->fp_glob->fg_data, ptrauth_key_asda); printf("pid=%d so=%p so_type=%d so_flags=0x%08x sb_recv_flags=0x%08x", pid, self->sock, self->sock->so_type, self->sock->so_flags, self->sock->so_rcv.sb_flags); stack(); ustack(); } fbt:mach_kernel:soo_kqfilter:return /self->go_kqf/ { self->go_kqf = 0; printf("pid=%d so=%p so_type=%d so_flags=0x%08x sb_recv_flags=0x%08x", pid, self->sock, self->sock->so_type, self->sock->so_flags, self->sock->so_rcv.sb_flags); print(*self->fp); } fbt:mach_kernel:poll_nocancel:entry /curpsinfo->pr_fname == "nix"/ { printf("pid=%d", pid); self->go_poll = 1; ustack(); } fbt:mach_kernel:poll_nocancel:return /curpsinfo->pr_fname == "nix"/ { printf("pid=%d", pid); } fbt:mach_kernel:kqueue_scan:entry /self->go_poll/ { printf("pid=%d", pid); self->go_poll = 0; ustack(); } fbt:mach_kernel:kqueue_scan:return /curpsinfo->pr_fname == "nix"/ { printf("pid=%d", pid); } fbt:mach_kernel:filt_sordetach:entry /curpsinfo->pr_fname == "nix"/ { self->kn = (struct knote *)arg0; self->so = (struct socket *)strip((void *)self->kn->kn_fp->fp_glob->fg_data, ptrauth_key_asda); printf("pid=%d kn=%p so=%p rcv_sb_flags=0x%08x", pid, self->kn, self->so, self->so->so_rcv.sb_flags); print(*self->kn); stack(); ustack(); } fbt:mach_kernel:knote_drop:entry /curpsinfo->pr_fname == "nix"/ { printf("pid=%d kq=%p kn=%p", pid, arg0, arg1); stack(); } fbt:mach_kernel:unp_disconnect:entry /curpsinfo->pr_fname == "nix"/ { printf("pid=%d name=%s\n", pid, curpsinfo->pr_fname); self->unpcb = (struct unpcb *)arg0; print(*self->unpcb); print(*self->unpcb->unp_conn); self->go = 1; } fbt:mach_kernel:unp_disconnect:entry /self->go && self->unpcb->unp_conn->unp_addr/ { print(*self->unpcb->unp_conn->unp_addr); } fbt:mach_kernel:unp_disconnect:return /self->go/ { self->go = 0; } fbt:mach_kernel:unp_disconnect:entry /curpsinfo->pr_fname == "nix"/ { this->unp = (struct uncb *)arg0; printf("pid=%d", pid); stack(); } fbt:mach_kernel:unp_detach:entry /curpsinfo->pr_fname == "nix"/ { printf("pid=%d", pid); stack(); } fbt:mach_kernel:knote:entry /curpsinfo->pr_fname == "nix"/ { printf("pid=%d hint=0x%08x", pid, arg1); } fbt:mach_kernel:sofreelastref:entry /curpsinfo->pr_fname == "nix"/ { self->sock = (struct socket *)arg0; printf("free last ref to socket %p pid=%d type=%d dealloc=%d", self->sock, pid, self->sock->so_type, arg1); } fbt:mach_kernel:sowakeup:entry /curpsinfo->pr_fname == "nix"/ { self->sb = (struct sockbuf *)arg1; self->so = (struct socket *)arg0; printf("pid=%d sb=0x%08p so_flags=0x%08x so_state=0x%08x sb_flags=0x%08x sb_waiters=%d", pid, self->sb, self->so->so_flags, self->so->so_state, self->sb->sb_flags, self->sb->sb_waiters); stack(4); } fbt:mach_kernel:soisdisconnected:entry /curpsinfo->pr_fname == "nix"/ { self->so = (struct socket *)arg0; printf("pid=%d sock=%p snd_waiters=%d snd_sb_flags=0x%08x rcv_waiters=%d rcv_sb_flags=0x%08x", pid, arg0, self->so->so_snd.sb_waiters, self->so->so_snd.sb_flags, self->so->so_rcv.sb_waiters, self->so->so_rcv.sb_flags ); } fbt:mach_kernel:soclose_locked:entry /curpsinfo->pr_fname == "nix"/ { self->go2 = 1; self->sock = (struct socket*)arg0; } fbt:mach_kernel:sofreelastref:entry /self->go2/ { self->go2 = 0; printf("sock=%p state=0x%08x flags=0x%08x pid=%d retaincnt=%d usecount=%d", self->sock, self->sock->so_state, self->sock->so_flags, pid, self->sock->so_retaincnt, self->sock->so_usecount); } ``` </details>
Sign in to join this conversation.
No milestone
No project
No assignees
4 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: lix-project/lix#729
No description provided.