Handle SSH connection drops with remote builders more gracefully #185

Open
opened 2024-03-25 18:37:10 +00:00 by mweinelt · 0 comments

Describe the bug

A remote build suddenly errored out with

error: builder for '/nix/store/58jxr2x94kz4zq5x5gqffjsyz7kpxhgc-cargo-c-0.9.29.drv' failed with exit code 1
error: Nix daemon disconnected unexpectedly (maybe it crashed?)

On the remote builder (nix 2.18.2) I can find

Mar 25 18:20:04 build2 nix-daemon[1917]: accepted connection from pid 880823, user hexa (trusted)
Mar 25 18:21:43 build2 nix-daemon[880825]: terminate called after throwing an instance of 'nix::Interrupted'
Mar 25 18:21:43 build2 nix-daemon[880825]:   what():  error: interrupted by the user

A coredump was created due to SIGABRT

Call trace
#0  0x00007fb74c3e107c in __pthread_kill_implementation () from /nix/store/ksk3rnb0ljx8gngzk19jlmbjyvac4hw6-glibc-2.38-44/lib/libc.so.6
No symbol table info available.
#1  0x00007fb74c391e06 in raise () from /nix/store/ksk3rnb0ljx8gngzk19jlmbjyvac4hw6-glibc-2.38-44/lib/libc.so.6
No symbol table info available.
#2  0x00007fb74c37a8f5 in abort () from /nix/store/ksk3rnb0ljx8gngzk19jlmbjyvac4hw6-glibc-2.38-44/lib/libc.so.6
No symbol table info available.
#3  0x00007fb74c6f2c0b in __gnu_cxx::__verbose_terminate_handler() [clone .cold] () from /nix/store/pp0jsd045xvfsz60kpbkfxbs9pbpk8z5-gcc-13.2.0-lib/lib/libstdc++.so.6
No symbol table info available.
#4  0x00007fb74c70221a in __cxxabiv1::__terminate(void (*)()) () from /nix/store/pp0jsd045xvfsz60kpbkfxbs9pbpk8z5-gcc-13.2.0-lib/lib/libstdc++.so.6
No symbol table info available.
#5  0x00007fb74c701299 in __cxa_call_terminate () from /nix/store/pp0jsd045xvfsz60kpbkfxbs9pbpk8z5-gcc-13.2.0-lib/lib/libstdc++.so.6
No symbol table info available.
#6  0x00007fb74c7019a6 in __gxx_personality_v0 () from /nix/store/pp0jsd045xvfsz60kpbkfxbs9pbpk8z5-gcc-13.2.0-lib/lib/libstdc++.so.6
No symbol table info available.
#7  0x00007fb74c5579f9 in _Unwind_RaiseException_Phase2 () from /nix/store/pp0jsd045xvfsz60kpbkfxbs9pbpk8z5-gcc-13.2.0-lib/lib/libgcc_s.so.1
No symbol table info available.
#8  0x00007fb74c5584ed in _Unwind_Resume () from /nix/store/pp0jsd045xvfsz60kpbkfxbs9pbpk8z5-gcc-13.2.0-lib/lib/libgcc_s.so.1
No symbol table info available.
#9  0x00007fb74cac3d66 in nix::curlFileTransfer::enqueueFileTransfer(nix::FileTransferRequest const&, nix::Callback<nix::FileTransferResult>) [clone .cold] ()
   from /nix/store/cs41wvvf98zsgw7vbpfabj5f8d2y0ihz-nix-2.18.2/lib/libnixstore.so
No symbol table info available.
#10 0x00007fb74cbae21d in nix::HttpBinaryCacheStore::getFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, nix::Callback<std::optional<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >) () from /nix/store/cs41wvvf98zsgw7vbpfabj5f8d2y0ihz-nix-2.18.2/lib/libnixstore.so
No symbol table info available.
#11 0x00007fb74caf990c in nix::BinaryCacheStore::queryPathInfoUncached(nix::StorePath const&, nix::Callback<std::shared_ptr<nix::ValidPathInfo const> >) ()
   from /nix/store/cs41wvvf98zsgw7vbpfabj5f8d2y0ihz-nix-2.18.2/lib/libnixstore.so
No symbol table info available.
#12 0x00007fb74cc13a5a in nix::Store::queryPathInfo(nix::StorePath const&, nix::Callback<nix::ref<nix::ValidPathInfo const> >) ()
   from /nix/store/cs41wvvf98zsgw7vbpfabj5f8d2y0ihz-nix-2.18.2/lib/libnixstore.so
No symbol table info available.
#13 0x00007fb74cc13eed in nix::Store::queryPathInfo(nix::StorePath const&) () from /nix/store/cs41wvvf98zsgw7vbpfabj5f8d2y0ihz-nix-2.18.2/lib/libnixstore.so
No symbol table info available.
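#14 0x00007fb74cc14580 in nix::Store::querySubstitutablePathInfos(std::map<nix::StorePath, std::optional<nix::ContentAddress>, std::less<nix::StorePath>, std::allocator<std::pair<nix::StorePath const, std::optional<nix::ContentAddress> > > > const&, std::map<nix::StorePath, nix::SubstitutablePathInfo, std::less<nix::StorePath>, std::allocator<std::pair<nix::StorePath const, nix::SubstitutablePathInfo> > >&) () from /nix/store/cs41wvvf98zsgw7vbpfabj5f8d2y0ihz-nix-2.18.2/lib/libnixstore.so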
No symbol table info available.
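#15 0x00007fb74cbcbca0 in nix::Store::queryMissing(std::vector<nix::DerivedPath, std::allocator<nix::DerivedPath> > const&, std::set<nix::StorePath, std::less<nix::StorePath>, std::allocator<nix::StorePath> >&, std::set<nix::StorePath, std::less<nix::StorePath>, std::allocator<nix::StorePath> >&, std::set<nix::StorePath, std::less<nix::StorePath>, std::allocator<nix::StorePath> >&, unsigned long&, unsigned long&)::{lambda(nix::StorePath const&, nix::ref<nix::Derivation>, nix::StorePath const&, nix::ref<nix::Sync<nix::Store::queryMissing(std::vector<nix::DerivedPath, std::allocator<nix::DerivedPath> > const&, std::set<nix::StorePath, std::less<nix::StorePath>, std::allocator<nix::StorePath> >&, std::set<nix::StorePath, std::less<nix::StorePath>, std::allocator<nix::StorePath> >&, std::set<nix::StorePath, std::less<nix::StorePath>, std::allocator<nix::StorePath> >&, unsigned long&, unsigned long&)::DrvState, std::mutex> >)#1}::operator()(nix::StorePath const&, nix::ref<nix::Derivation>, nix::StorePath const&, nix::ref<nix::Sync<nix::Store::queryMissing(std::vector<nix::DerivedPath, std::allocator<nix::DerivedPath> > const&, std::set<nix::StorePath, std::less<nix::StorePath>, std::allocator<nix::StorePath> >&, std::set<nix::StorePath, std::less<nix::StorePath>, std::allocator<nix::StorePath> >&, std::set<nix::StorePath, std::less<nix::StorePath>, std::allocator<nix::StorePath> >&, unsigned long&, unsigned long&)::DrvState, std::mutex> >) const [clone .lto_priv.0] ()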
   from /nix/store/cs41wvvf98zsgw7vbpfabj5f8d2y0ihz-nix-2.18.2/lib/libnixstore.so
No symbol table info available.
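#16 0x00007fb74cbd4322 in std::_Function_handler<void (), std::_Bind<nix::Store::queryMissing(std::vector<nix::DerivedPath, std::allocator<nix::DerivedPath> > const&, std::set<nix::StorePath, std::less<nix::StorePath>, std::allocator<nix::StorePath> >&, std::set<nix::StorePath, std::less<nix::StorePath>, std::allocator<nix::StorePath> >&, std::set<nix::StorePath, std::less<nix::StorePath>, std::allocator<nix::StorePath> >&, unsigned long&, unsigned long&)::{lambda(nix::StorePath const&, nix::ref<nix::Derivation>, nix::StorePath const&, nix::ref<nix::Sync<nix::Store::queryMissing(std::vector<nix::DerivedPath, std::allocator<nix::DerivedPath> > const&, std::set<nix::StorePath, std::less<nix::StorePath>, std::allocator<nix::StorePath> >&, std::set<nix::StorePath, std::less<nix::StorePath>, std::allocator<nix::StorePath> >&, std::set<nix::StorePath, std::less<nix::StorePath>, std::allocator<nix::StorePath> >&, unsigned long&, unsigned long&)::DrvState, std::mutex> >)#1} (nix::StorePath, nix::ref<nix::Derivation>, nix::StorePath, nix::ref<nix::Sync<nix::Store::queryMissing(std::vector<nix::DerivedPath, std::allocator<nix::DerivedPath> > const&, std::set<nix::StorePath, std::less<nix::StorePath>, std::allocator<nix::StorePath> >&, std::set<nix::StorePath, std::less<nix::StorePath>, std::allocator<nix::StorePath> >&, std::set<nix::StorePath, std::less<nix::StorePath>, std::allocator<nix::StorePath> >&, unsigned long&, unsigned long&)::DrvState, std::mutex> >)> >::_M_invoke(std::_Any_data const&) () from /nix/store/cs41wvvf98zsgw7vbpfabj5f8d2y0ihz-nix-2.18.2/lib/libnixstore.so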
No symbol table info available.
#17 0x00007fb74c9db1e1 in nix::ThreadPool::doWork(bool) () from /nix/store/cs41wvvf98zsgw7vbpfabj5f8d2y0ihz-nix-2.18.2/lib/libnixutil.so
No symbol table info available.
#18 0x00007fb74c9db503 in nix::ThreadPool::process() () from /nix/store/cs41wvvf98zsgw7vbpfabj5f8d2y0ihz-nix-2.18.2/lib/libnixutil.so
No symbol table info available.
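#19 0x00007fb74cbd2ed8 in nix::Store::queryMissing(std::vector<nix::DerivedPath, std::allocator<nix::DerivedPath> > const&, std::set<nix::StorePath, std::less<nix::StorePath>, std::allocator<nix::StorePath> >&, std::set<nix::StorePath, std::less<nix::StorePath>, std::allocator<nix::StorePath> >&, std::set<nix::StorePath, std::less<nix::StorePath>, std::allocator<nix::StorePath> >&, unsigned long&, unsigned long&) () from /nix/store/cs41wvvf98zsgw7vbpfabj5f8d2y0ihz-nix-2.18.2/lib/libnixstore.so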
No symbol table info available.
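#20 0x000056028a40e9dd in main_nix_build(int, char**)::{lambda(std::vector<nix::DerivedPath, std::allocator<nix::DerivedPath> > const&)#1}::operator()(std::vector<nix::DerivedPath, std::allocator<nix::DerivedPath> > const&) const [clone .lto_priv.0] ()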
No symbol table info available.
#21 0x000056028a4157c5 in main_nix_build(int, char**) [clone .lto_priv.0] ()
No symbol table info available.
#22 0x000056028a4a009e in nix::mainWrapped(int, char**) ()
No symbol table info available.
#23 0x00007fb74cdaa5f7 in nix::handleExceptions(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()>) ()
   from /nix/store/cs41wvvf98zsgw7vbpfabj5f8d2y0ihz-nix-2.18.2/lib/libnixmain.so
No symbol table info available.
#24 0x000056028a3fe61c in main ()
No symbol table info available.
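
A note on frames #2 to #8, for readers unfamiliar with the pattern: when a C++ exception reaches a boundary it is not allowed to cross (for example a `noexcept` function or a destructor), the runtime calls `std::terminate`, whose default handler calls `abort()` and raises SIGABRT. The trace does not show which boundary was hit here. The sketch below is a generic illustration and not Nix or Lix code: `Interrupted`, `interruptibleWork`, `runGuarded`, and `describe` are hypothetical names. It shows one common way to stop an exception at such a boundary and carry it across as a `std::exception_ptr` instead.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an interruption exception; NOT the real nix::Interrupted.
struct Interrupted : std::runtime_error {
    Interrupted() : std::runtime_error("interrupted by the user") {}
};

// Hypothetical work item that throws when it is asked to stop.
inline void interruptibleWork(bool interrupt) {
    if (interrupt) throw Interrupted();
}

// A noexcept boundary. If an exception escaped this function, the runtime
// would call std::terminate (by default abort(), i.e. SIGABRT). Catching
// everything here and returning the exception as a value avoids that.
inline std::exception_ptr runGuarded(bool interrupt) noexcept {
    try {
        interruptibleWork(interrupt);
        return nullptr;
    } catch (...) {
        return std::current_exception();
    }
}

// Turns a captured exception into a message on the caller's side.
inline std::string describe(std::exception_ptr ep) {
    if (!ep) return "ok";
    try {
        std::rethrow_exception(ep);
    } catch (const Interrupted & e) {
        return std::string("interrupted: ") + e.what();
    } catch (const std::exception & e) {
        return std::string("error: ") + e.what();
    } catch (...) {
        return "unknown error";
    }
}
```

Calling `describe(runGuarded(true))` yields a readable message rather than a core dump. Whether this matches the actual failure path in the daemon is not established by the trace.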

Steps To Reproduce

  1. Use the remote building functionality
  2. ???

Expected behavior

My assumption is that the SSH connection died. If so, the error message could have been clearer.

The error was presumably transient, so the connection could have been retried.
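
To make the retry idea concrete, here is a minimal sketch of retrying an operation with exponential backoff when it fails with a transient error. It is an illustration only, not how Lix handles remote builders: `TransientError` and `retryTransient` are hypothetical names, and the sketch assumes the operation is safe to repeat, which this report does not establish for a build that is already running.

```cpp
#include <chrono>
#include <functional>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical marker for failures worth retrying (e.g. a dropped SSH
// connection). Not a Nix or Lix type.
struct TransientError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Runs `attempt` up to `maxAttempts` times. Only TransientError is retried;
// any other exception propagates immediately. The delay doubles after each
// failed attempt. Once the attempts are exhausted, a runtime_error naming the
// likely cause is thrown instead of a generic "daemon disconnected" message.
template<typename T>
T retryTransient(const std::function<T()> & attempt,
                 unsigned maxAttempts,
                 std::chrono::milliseconds initialDelay)
{
    if (maxAttempts == 0)
        throw std::invalid_argument("maxAttempts must be at least 1");

    auto delay = initialDelay;
    for (unsigned i = 1;; ++i) {
        try {
            return attempt();
        } catch (const TransientError & e) {
            if (i >= maxAttempts)
                throw std::runtime_error(
                    "connection to remote builder lost after "
                    + std::to_string(maxAttempts) + " attempts: " + e.what());
            std::this_thread::sleep_for(delay);
            delay *= 2;
        }
    }
}
```

A caller would wrap the connection setup, for example `retryTransient<int>(connectAndBuild, 3, std::chrono::milliseconds(500))`, where `connectAndBuild` is a placeholder for whatever opens the connection and starts the build.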

nix-env --version output

nix-env (Nix) 2.90.0-lix

Additional context

n/a

mweinelt added the bug label 2024-03-25 18:37:10 +00:00
jade added the Area/remote-builds label 2024-05-23 00:51:02 +00:00
jade added the crash 💥 label 2024-11-10 02:26:54 +00:00
Reference: lix-project/lix#185