From a0fb52c0af7b237e455c08689495e7c893a65ce8 Mon Sep 17 00:00:00 2001 From: Jade Lovelace Date: Wed, 9 Oct 2024 15:38:40 -0700 Subject: [PATCH] Fix std::terminate call in thread pool So we received a report that the thread pool crashed due to an Interrupted exception. Relevant log tail: copying path '/nix/store/0kal2k73inviikxv9f1ciaj39lkl9a87-etc-os-release' to 'ssh://192.168.0.27'... Lix crashed. This is a bug. We would appreciate if you report it along with what caused it at https://git.lix.systems/lix-project/lix/issues with the following information included: error (ignored): error: interrupted by the user Exception: nix::Interrupted: error: interrupted by the user Relevant stack trace: 4# __cxa_rethrow in /nix/store/22nxhmsfcv2q2rpkmfvzwg2w5z1l231z-gcc-13.3.0-lib/lib/libstdc++.so.6 5# nix::ignoreExceptionExceptInterrupt(nix::Verbosity) in /nix/store/ghxr2ykqc3rrfcy8rzdys0rzx9ah5fqj-lix-2.92.0-dev-pre20241005-ed9b7f4/lib/liblixutil.so 6# nix::ThreadPool::doWork(bool) in /nix/store/ghxr2ykqc3rrfcy8rzdys0rzx9ah5fqj-lix-2.92.0-dev-pre20241005-ed9b7f4/lib/liblixutil.so 7# 0x00007FA7A00E86D3 in /nix/store/22nxhmsfcv2q2rpkmfvzwg2w5z1l231z-gcc-13.3.0-lib/lib/libstdc++.so.6 8# 0x00007FA79FE99A42 in /nix/store/3dyw8dzj9ab4m8hv5dpyx7zii8d0w6fi-glibc-2.39-52/lib/libc.so.6 9# 0x00007FA79FF1905C in /nix/store/3dyw8dzj9ab4m8hv5dpyx7zii8d0w6fi-glibc-2.39-52/lib/libc.so.6 Notably, this is *not* in the main thread, so this implies that the thread didn't get joined properly before their destructors got called. That, in turn, should have only possibly happened because join() threw on a previous iteration of the loop joining threads, I think. Or if it threw while in the ThreadPool destructor. Either way we had better stop letting Interrupted fall out of our child threads! If: - Interrupted was thrown inside the action in the main thread: it would have fallen out of doWork if state->exception was already set and got caught by ThreadPool::process, calling shutdown() and the join loop which would crash the process entirely. - Interrupted was thrown inside the action on a secondary thread: it would have been caught and put into the exception field and then possibly rethrown to fall out of the thread (since it was previously ignoreExceptionExceptInterrupt). The one possible hole in this hypothesis is that there is an "error (ignored)" line in there implying that at least one Interrupted got eaten by an ignoreExceptionInDestructor. It's also unclear whether this got reordered because of stderr buffering. Fixes: https://git.lix.systems/lix-project/lix/issues/542 Change-Id: I322cf050da660af78f5cb0e08ec6e6d27d09ac76 --- src/libutil/thread-pool.cc | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/src/libutil/thread-pool.cc b/src/libutil/thread-pool.cc index cd380b608..1c4488373 100644 --- a/src/libutil/thread-pool.cc +++ b/src/libutil/thread-pool.cc @@ -109,8 +109,21 @@ void ThreadPool::doWork(bool mainThread) try { std::rethrow_exception(exc); } catch (std::exception & e) { - if (!dynamic_cast(&e)) - ignoreExceptionExceptInterrupt(); + if (!dynamic_cast(&e)) { + // Yes, this is not a destructor, but we cannot + // safely propagate an exception out of here. + // + // What happens is that if we do, shutdown() + // will have join() throw an exception if we + // are on a worker thread, preventing us from + // joining the rest of the threads. Although we + // could make the joining eat exceptions too, + // we could just as well not let Interrupted + // fall out to begin with, since the thread + // will immediately cleanly quit because of + // quit == true anyway. + ignoreExceptionInDestructor(); + } } catch (...) { } }