[Nix#8946] Crash in Nix 2.16.1 #117

Open
opened 2024-03-16 06:44:59 +00:00 by lix-bot · 0 comments
Member

Upstream-Issue: NixOS/nix#8946

Describe the bug

This is on a relatively simple server running Hydra + Harmonia + Nginx, deployed via colmena. It was recently updated from Nix 2.13 to 2.16.1, and we started getting HTTP timeouts to the Hydra API (which we control remotely from some Jenkins jobs). I haven't yet been able to get symbols loaded for the binaries, but the unsymbolized backtrace looks like this:

                #0  0x00007f103ffa4adc __pthread_kill_implementation (libc.so.6 + 0x87adc)
                #1  0x00007f103ff55cb6 raise (libc.so.6 + 0x38cb6)
                #2  0x00007f103ff3f8ba abort (libc.so.6 + 0x228ba)
                #3  0x00007f103ff405f5 __libc_message.cold (libc.so.6 + 0x235f5)
                #4  0x00007f103ff98709 __libc_fatal (libc.so.6 + 0x7b709)
                #5  0x00007f103ffab214 unwind_cleanup (libc.so.6 + 0x8e214)
                #6  0x00007f1040567e38 _ZN3nix16triggerInterruptEv.cold (libnixutil.so + 0x67e38)
                #7  0x00000000004e5805 _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN3nix12MonitorFdHupC4EiEUlvE_EEEEE6_M_runEv (nix + 0xe5805)
                #8  0x00007f10402e6613 execute_native_thread_routine (libstdc++.so.6 + 0xe0613)
                #9  0x00007f103ffa2e24 start_thread (libc.so.6 + 0x85e24)
                #10 0x00007f10400249b0 __clone3 (libc.so.6 + 0x1079b0)

                Stack trace of thread 3797015:
                #0  0x00007f103ff9fa36 __futex_abstimed_wait_common (libc.so.6 + 0x82a36)
                #1  0x00007f103ffa4883 __pthread_clockjoin_ex (libc.so.6 + 0x87883)
                #2  0x00007f10402e6687 _ZNSt6thread4joinEv (libstdc++.so.6 + 0xe0687)
                #3  0x00007f1040750cb8 _ZNKSt14default_deleteIN3nix12MonitorFdHupEEclEPS1_.part.0 (libnixstore.so + 0xf7cb8)
                #4  0x00007f1040752292 _ZN3nix6daemon17processConnectionENS_3refINS_5StoreEEERNS_8FdSourceERNS_6FdSinkENS_11TrustedFlagENS0_13RecursiveFlagE.cold (libnixstore.so + 0xf9292)
                #5  0x000000000050d68d _ZNSt17_Function_handlerIFvvEZL10daemonLoopSt8optionalIN3nix11TrustedFlagEEEUlvE_E9_M_invokeERKSt9_Any_data (nix + 0x10d68d)
                #6  0x00007f1040602cdf _ZNSt17_Function_handlerIFvvEZN3nix12startProcessESt8functionIS0_ERKNS1_14ProcessOptionsEEUlvE_E9_M_invokeERKSt9_Any_data (libnixutil.so + 0x102cdf)
                #7  0x00007f10405fecc1 _ZN3nixL6doForkEbSt8functionIFvvEE (libnixutil.so + 0xfecc1)
                #8  0x00007f1040603bd0 _ZN3nix12startProcessESt8functionIFvvEERKNS_14ProcessOptionsE (libnixutil.so + 0x103bd0)
                #9  0x000000000050df50 _ZL10daemonLoopSt8optionalIN3nix11TrustedFlagEE (nix + 0x10df50)
                #10 0x000000000050f0a3 _ZL15main_nix_daemoniPPc (nix + 0x10f0a3)
                #11 0x000000000058cea4 _ZN3nix11mainWrappedEiPPc (nix + 0x18cea4)
                #12 0x00007f1040b6c46a _ZN3nix16handleExceptionsERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt8functionIFvvEE (libnixmain.so + 0x3346a)
                #13 0x00000000004632e7 main (nix + 0x632e7)
                #14 0x00007f103ff40ace __libc_start_call_main (libc.so.6 + 0x23ace)
                #15 0x00007f103ff40b89 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x23b89)
                #16 0x0000000000467d35 _start (nix + 0x67d35)

                Stack trace of thread 3924129:
                #0  0x00007f104001777f __poll (libc.so.6 + 0xfa77f)
                #1  0x00007f103f8f3b00 Curl_poll (libcurl.so.4 + 0x5ab00)
                #2  0x00007f103f8ea426 multi_wait.part.0 (libcurl.so.4 + 0x51426)
                #3  0x00007f103f8ea642 curl_multi_wait (libcurl.so.4 + 0x51642)
                #4  0x00007f104086abeb _ZN3nix16curlFileTransfer16workerThreadMainEv (libnixstore.so + 0x211beb)
                #5  0x00007f104086bd5c _ZN3nix16curlFileTransfer17workerThreadEntryEv (libnixstore.so + 0x212d5c)
                #6  0x00007f10402e6613 execute_native_thread_routine (libstdc++.so.6 + 0xe0613)
                #7  0x00007f103ffa2e24 start_thread (libc.so.6 + 0x85e24)
                #8  0x00007f10400249b0 __clone3 (libc.so.6 + 0x1079b0)
                ELF object binary architecture: AMD x86-64
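
Since symbols could not be loaded, the `module + offset` pair on each frame is the part of the trace that can later be handed to a symbolizer (for example `addr2line`) once matching debug info is available. The following is a minimal sketch, not part of the original report, that extracts those pairs from the trace text and derives each module's load base as address minus offset. The helper names `parse_frames` and `offsets_by_module` are hypothetical, and `SAMPLE` is just three frames copied from the trace above.

```python
import re
from collections import defaultdict

# Matches one frame line of the trace, for example:
#   #6  0x00007f1040567e38 _ZN3nix16triggerInterruptEv.cold (libnixutil.so + 0x67e38)
FRAME_RE = re.compile(
    r"#(?P<index>\d+)\s+0x(?P<addr>[0-9a-fA-F]+)\s+(?P<symbol>\S+)\s+"
    r"\((?P<module>\S+) \+ 0x(?P<offset>[0-9a-fA-F]+)\)"
)


def parse_frames(trace):
    """Return one dict per frame line found in the trace text (hypothetical helper)."""
    frames = []
    for line in trace.splitlines():
        m = FRAME_RE.search(line)
        if m is None:
            continue  # skip thread headers, blank lines and anything else
        addr = int(m["addr"], 16)
        offset = int(m["offset"], 16)
        frames.append({
            "index": int(m["index"]),
            "symbol": m["symbol"],
            "module": m["module"],
            "offset": offset,
            # Runtime address minus module-relative offset gives the load base.
            "load_base": addr - offset,
        })
    return frames


def offsets_by_module(frames):
    """Group module-relative offsets (as hex strings) by module name."""
    grouped = defaultdict(list)
    for frame in frames:
        grouped[frame["module"]].append(hex(frame["offset"]))
    return dict(grouped)


# Three frames copied verbatim from the trace above.
SAMPLE = """\
#5  0x00007f103ffab214 unwind_cleanup (libc.so.6 + 0x8e214)
#6  0x00007f1040567e38 _ZN3nix16triggerInterruptEv.cold (libnixutil.so + 0x67e38)
#13 0x00000000004632e7 main (nix + 0x632e7)
"""

if __name__ == "__main__":
    for module, offsets in offsets_by_module(parse_frames(SAMPLE)).items():
        print(module, " ".join(offsets))
```

Frames from the same module should all yield the same load base, which is a quick sanity check that the trace was copied intact.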

Steps To Reproduce

No clear repro, but it only happens when the Hydra instance is "doing work"; when it's idle, it remains up.

Expected behavior

Nix should not crash, ever.

nix-env --version output

# nix --version
nix (Nix) 2.16.1

# ls -la /run/current-system/sw/bin/nix
lrwxrwxrwx 1 root root 62 Jan  1  1970 /run/current-system/sw/bin/nix -> /nix/store/lihqijbf96az03rchl9fp7c6ym7cmmyp-nix-2.16.1/bin/nix

Additional context

Next steps on our side will be downgrading back to Nix 2.13, though we may need to do some backporting in Harmonia, as we require fixes there that are currently Nix 2.16+ only. FYI @Mic92, @zimbatm

lix-bot added the bug, imported labels 2024-03-16 06:44:59 +00:00
Reference: lix-project/lix#117