Random curl error: "A value or data field grew larger than allowed" on substitution can cause substitution failure #662

Closed
opened 2025-02-09 21:01:55 +00:00 by jade · 31 comments
Owner

I have seen these errors randomly and thought nothing of them, since they always resolved themselves for me, but it seems like under certain conditions they recur consistently and actually cause substitution failures.

The errors in question look like:

```
warning: error: unable to download 'https://cache.nixos.org/nar/0dzs2i76q485mylcdiv5q7vzvhima5sz6z9mn5v0y6xk5yil6gg5.nar.xz': HTTP error 206 () (curl error: A value or data field grew larger than allowed); retrying from offset 348241920 in 2554 ms
```

@lilyinstarlight managed to isolate it in a GitHub Actions run with daemon 2.92.0: https://github.com/lilyinstarlight/foosteros/actions/runs/13229359681/job/36924342389#step:6:11193

This is believed not to affect 2.91.x, only 2.92.x and newer; it was introduced in 4ae6fb5a8f0d456b8d2ba2aaca3712b4e49057fc.

Owner

can we have that reproducing case run with `CURL_DEBUG=all`? i have the feeling that this is http2-related and caused by curl transfer buffers filling up due to our gratuitous use of multi handles for stuff that shouldn't live in multi handles

Member

This issue was mentioned on Gerrit on the following CLs:

  • commit message in [cl/2528](https://gerrit.lix.systems/c/lix/+/2528) ("filetransfer: report errbuf for failing http codes")
  • commit message in [cl/2780](https://gerrit.lix.systems/c/lix/+/2780) ("libstore: don't use curl decompression support")
  • commit message in [cl/2781](https://gerrit.lix.systems/c/lix/+/2781) ("libstore: don't use curl decompression support")
  • commit message in [cl/2845](https://gerrit.lix.systems/c/lix/+/2845) ("Revert "libstore: don't use curl decompression support"")
Author
Owner

i strongly suspect it to be this dynbuf code: https://github.com/curl/curl/blob/8289ac1be645d3aa7d5be668528b049e14b73793/lib/cw-out.c#L332-L334

https://github.com/curl/curl/pull/5966 also looks suspicious

Owner

yep. that should only trigger on http2 transfers since http1 doesn't have multiple substreams in an ordered bytestream transport (which is a completely fucked up concept to begin with, but here we are)

Author
Owner

Reproducer:

```c++
auto ft = makeFileTransfer();

// two downloads on the same FileTransfer to the same host, so they share
// one connection (and, over http2, one multiplexed transport)
auto [res1, src1] = ft->download("https://jade.fyi/zeros.txt");
auto [res2, src2] = ft->download("https://jade.fyi/zeros.txt");

// read a little from each so both transfers are started
std::vector<char> buf(1024 * 1024 * 128);
src1->read(buf.data(), 100);
src2->read(buf.data(), 100);

auto sink = NullSink{};

// drain the first transfer completely while the second sits paused
src1->drainInto(sink);
src2->drainInto(sink);
```

Thanks to horrors for telling me what would probably cause it. It happens when you have two transfers on the same FileTransfer to the same host (so the same connection) and one of them is paused. It's made worse by the server compressing 128MB of zeros.

Author
Owner

Submitted as a curl bug: https://github.com/curl/curl/issues/16280. I am not sure how we can fix this properly in the interim, since we do need flow control.

Author
Owner

One thing that will work around this successfully, if it is actively being a problem, is `http2 = false` in the configuration, but this will almost certainly make querying paths slower.
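
For reference, a minimal sketch of that workaround in nix.conf (the `http2` setting is real; the file path varies by setup and is an assumption here):

```
# /etc/nix/nix.conf (path assumed; also settable per-user)
# disable HTTP/2 for binary-cache downloads; avoids the multiplexed-stream
# bug at the cost of slower path queries
http2 = false
```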

Author
Owner

I just had the silliest patch idea for this while in the process of nearly falling asleep: why don't we just completely disable transfer compression for downloads? This isn't such a ridiculous idea because what does Lix actually download?

  • nars? compressed at rest unless you're foolish, no transfer compression necessary
  • narinfos? well, they can be compressed at rest, idk if they actually are in practice on c-n-o. this might suck a bit but idk how bad it is in reality. probably not that bad, they're full of high entropy hashes.
  • .ls files? barely used anyway

If we don't completely disable transfer compression for downloads, we could at least reimplement lix-side download decompression, which would be a lot more thinking and effort for something that curl will plausibly fix within a year. But basically the fix here is that curl needs to not be doing content decoding for the time being, since it has the broken interactions we see here.
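
For context on the knob involved: a hedged sketch of curl's documented `CURLOPT_ACCEPT_ENCODING` modes (plain libcurl, not Lix's actual wrapper code):

```c++
#include <curl/curl.h>

int main() {
    CURL * req = curl_easy_init();
    if (!req) return 1;

    // NULL: curl sends no Accept-Encoding header and performs no content
    // decoding; the body arrives exactly as the server transmits it.
    curl_easy_setopt(req, CURLOPT_ACCEPT_ENCODING, NULL);

    // "": curl advertises every encoding it was built with and decodes the
    // response transparently -- the code path implicated in this bug.
    // curl_easy_setopt(req, CURLOPT_ACCEPT_ENCODING, "");

    curl_easy_cleanup(req);
    return 0;
}
```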

Owner

we could indeed do that. there are still corner cases where it'll break, but on average it seems preferable to crashing completely :/

Author
Owner

@lilyinstarlight try this overlay (applied after your lix overlay); it applies the fix from https://github.com/curl/curl/pull/16296:

```nix
final: prev: {
  curl-nix = prev.curl.overrideAttrs (old: {
    # https://github.com/curl/curl/pull/16296
    src = final.fetchFromGitHub {
      owner = "icing";
      repo = "curl";
      rev = "6cd8b69479c9d36597499a5b0e71126acf8f38af";
      sha256 = "sha256-JL3o6SlOzayZ60cG+Vj1GpPfx/3oT7pVY1urwZYou5c=";
    };
    patches = [ ];
    postPatch = null;
    nativeBuildInputs = old.nativeBuildInputs or [ ] ++ [
      final.buildPackages.autoreconfHook
      final.buildPackages.updateAutotoolsGnuConfigScriptsHook
    ];
    preConfigure = ''
      sed -e 's|/usr/bin|/no-such-path|g' -i.bak configure
      patchShebangs scripts/
    '';
  });

  lix = prev.lix.override {
    aws-sdk-cpp = prev.aws-sdk-cpp.override {
      curl = final.curl-nix;
    };
    curl = final.curl-nix;
  };
}
```

If this works then we can temporarily ship it in the lix overlay or in nixpkgs or something. idk how we can ship this exactly, because really we would like to wait for a merge in curl. But that's probably not going to take that long.

Author
Owner

Okay so this has been "fixed" upstream but our testing shows the fix causes corrupt transfers which ummmmm. concerning. i think we want their compression code turned off for the time being.

Owner

oh fire and wind we're not putting back the decompression sources, especially not the libarchive decompression source >,<

Author
Owner

yeah no, we can just have a very limited set of supported ones or just disable accepting transfer compression completely

Author
Owner

FYI @raito I looked at https://everything.curl.dev/internals/content-encoding.html#the-libcurl-interface

From some examination I found that we can have `Content-Encoding: zstd` narinfos, so we do actually have to replace the curl decompressor with something. horrors says there is something they have lying around that might be helpful.

I can try to look into this on Monday.
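
A minimal sketch of what replacing the curl decompressor could look like, mirroring the lazy `makeDecompressionSource` wiring that appears in the diff later in this thread (the struct and member names here are illustrative, not the actual Lix code):

```c++
// sketch against Lix's internal Source / makeDecompressionSource APIs
struct DecompressingSource : Source
{
    Source & wrapped;       // raw bytes from curl, with content decoding off
    std::string encoding;   // from the response's Content-Encoding header
    std::unique_ptr<Source> decompressor;

    size_t read(char * data, size_t len) override
    {
        // pick the decompressor lazily, once the headers are known
        if (!decompressor)
            decompressor = makeDecompressionSource(encoding, wrapped);
        return decompressor->read(data, len);
    }
};
```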

jade self-assigned this 2025-03-08 01:14:25 +00:00
Author
Owner
Fix: https://gerrit.lix.systems/c/lix/+/2780
jade removed their assignment 2025-03-09 08:40:00 +00:00
jade closed this issue 2025-03-09 21:14:54 +00:00
Owner

we still see this on 423a34393759d75683ed84b470239a8d8f9a84ed

pennae reopened this issue 2025-03-10 13:31:05 +00:00
Author
Owner

ummmmmmm. how?! does it also overrun the buffer if it's not compressed, just with lower likelihood? misbehaving server?

Owner

no clue. curl isn't decompressing anything (otherwise the test cases we added would fail), but it's still failing :(

Owner

we've confirmed that curl is indeed misbehaving by shoving a 1GiB fragment of `/dev/urandom` onto a webserver that delivers it as `content-encoding: zstd`. the encoded file is larger than its contents (zstd can't shrink random data), and curl is explicitly instructed not to decode anything (by setting `CURLOPT_ACCEPT_ENCODING = NULL`, as per the documentation), yet the error still occurs. we haven't checked the patched curl yet

Owner

we've tested curl HEAD, and it's even worse. the specific errors you saw originally are gone with lix HEAD, but curl can hang itself if http2 streams have wildly different receive capacities. for example this test will crash with an internal error when pointed at an http2 server that serves 1GiB of junk:

```c++
TEST(FileTransfer, what)
{
    auto ft = makeFileTransfer(0);
    auto a = ft->download("https://localhost:9999/foo").second;
    auto b = ft->download("https://localhost:9999/foo").second;

    // fast reader: drains its stream as quickly as it can
    auto af = std::async(std::launch::async, [&] {
        char c[1024];
        for (;;) {
            (*a)(c, sizeof(c));
        }
    });
    // slow reader: ~1 MB/s, so its receive side backs up
    auto bf = std::async(std::launch::async, [&] {
        char c[1024];
        for (;;) {
            (*b)(c, sizeof(c));
            usleep(1000);
        }
    });
}
```
Author
Owner

Tried the following setup:

Caddyfile:

```
{
    local_certs
    skip_install_trust
    auto_https disable_redirects
    debug
}
:9999 {
    bind 127.0.0.1
    tls internal {
        on_demand
    }

    root * ./trash
    file_server

    encode {
        zstd
        match {
            header Content-Type *
        }
    }
}
```

Web root:

```
mkdir trash
dd if=/dev/urandom of=trash/foo bs=1M count=1024
```

Build curl with this Nix expression in a curl checkout, at revision 8a45c2851aeb6f3ec18ad5c39c4042ab516891dd:

```nix
let
  pkgs = import <nixpkgs> { };
in
pkgs.curl.overrideAttrs (old: {
  src = builtins.fetchGit ./.;
  patches = [ ];
  postPatch = null;
  nativeBuildInputs = old.nativeBuildInputs or [ ] ++ [
    pkgs.buildPackages.autoreconfHook
    pkgs.buildPackages.updateAutotoolsGnuConfigScriptsHook
  ];
  configureFlags = old.configureFlags or [] ++ [
    "--enable-debug"
  ];
  env.CFLAGS = "-O2 -g";
  preConfigure = ''
    sed -e 's|/usr/bin|/no-such-path|g' -i.bak configure
    patchShebangs scripts/
  '';
})
```

Lix version: 5a7e9e17464008904403af4297823443281bef12

Lix setup:

```diff
diff --git a/lix/libstore/filetransfer.cc b/lix/libstore/filetransfer.cc
index cfc340874..23de64223 100644
--- a/lix/libstore/filetransfer.cc
+++ b/lix/libstore/filetransfer.cc
@@ -138,8 +138,8 @@ struct curlFileTransfer : public FileTransfer
                 // an accept-encoding header of our own and decompress manually :(
                 // we don't support deflate because libarchive also doesn't either
                 // cf https://git.lix.systems/lix-project/lix/issues/662 for infos
-                // curl_easy_setopt(req.get(), CURLOPT_ACCEPT_ENCODING, ""); // all of them!
-                appendCurlHeader("Accept-Encoding", "gzip, br, zstd");
+                curl_easy_setopt(req.get(), CURLOPT_ACCEPT_ENCODING, nullptr); // None of them!
+                appendCurlHeader("Accept-Encoding", "zstd");
             }
             curl_easy_setopt(req.get(), CURLOPT_MAXREDIRS, 10);
             curl_easy_setopt(req.get(), CURLOPT_NOSIGNAL, 1);
@@ -976,7 +976,7 @@ struct curlFileTransfer : public FileTransfer
         size_t read(char * data, size_t len) override
         {
             if (!decompressor) {
-                decompressor = makeDecompressionSource(wrapped->metadata.encoding, *wrapped);
+                decompressor = makeDecompressionSource("none", *wrapped);
             }
             return decompressor->read(data, len);
         }
diff --git a/lix/nix/nar.cc b/lix/nix/nar.cc
index 4730b10b1..c0b3c5020 100644
--- a/lix/nix/nar.cc
+++ b/lix/nix/nar.cc
@@ -1,8 +1,11 @@
 #include "lix/libcmd/command.hh"
+#include "lix/libstore/filetransfer.hh"
 #include "nar.hh"
 
 namespace nix {
 
+void go();
+
 struct CmdNar : MultiCommand
 {
     CmdNar() : MultiCommand(CommandRegistry::getCommandsFor({"nar"}))
@@ -24,9 +27,7 @@ struct CmdNar : MultiCommand
 
     void run() override
     {
-        if (!command)
-            throw UsageError("'nix nar' requires a sub-command.");
-        command->second->run();
+        go();
     }
 };
 
@@ -35,4 +36,25 @@ void registerNixNar()
     registerCommand<CmdNar>("nar");
 }
 
+void go()
+{
+    auto ft = makeFileTransfer(0);
+    auto a = ft->download("https://localhost:9999/foo").second;
+    auto b = ft->download("https://localhost:9999/foo").second;
+
+    auto af = std::async(std::launch::async, [&] {
+        char c[1024];
+        for (;;) {
+            (*a)(c, sizeof(c));
+        }
+    });
+    auto bf = std::async(std::launch::async, [&] {
+        char c[1024];
+        for (;;) {
+            (*b)(c, sizeof(c));
+            usleep(100);
+        }
+    });
+}
+
 }
diff --git a/package.nix b/package.nix
index 5a2f0d166..edf97bf87 100644
--- a/package.nix
+++ b/package.nix
@@ -519,6 +519,9 @@ stdenv.mkDerivation (finalAttrs: {
         rustfmt,
         rustPlatform,
 
+        # TODO
+        caddy,
+
         # debuggers
         gdb,
         rr,
@@ -591,6 +594,7 @@ stdenv.mkDerivation (finalAttrs: {
               doxygen
               # Load-bearing order. Must come before clang-unwrapped below, but after clang_tools above.
               stdenv.cc
+              caddy
             ]
             ++ [
              rust-analyzer
```

Run with:

```
$ ninja -C build install --quiet && LD_PRELOAD=$HOME/co/curl/result/lib/libcurl.so NIX_SSL_CERT_FILE=$HOME/.local/share/caddy/pki/authorities/local/root.crt nix nar --log-format raw --debug --verbose

downloading 'https://localhost:9999/foo'...
starting download of https://localhost:9999/foo
curl: !!! WARNING !!!
curl: This is a debug build of libcurl, do not use in production.
curl: Couldn't find host localhost in the /etc/nix/netrc file; using defaults
curl: Host localhost:9999 was resolved.
curl: IPv6: ::1
curl: IPv4: 127.0.0.1
curl:   Trying [::1]:9999...
curl: connect to ::1 port 9999 from ::1 port 34044 failed: Connection refused
curl:   Trying 127.0.0.1:9999...
curl: ALPN: curl offers h2,http/1.1
curl: TLSv1.3 (OUT), TLS handshake, Client hello (1):
curl:  CAfile: /home/jade/.local/share/caddy/pki/authorities/local/root.crt
curl:  CApath: none
curl: TLSv1.3 (IN), TLS handshake, Server hello (2):
curl: TLSv1.3 (IN), TLS change cipher, Change cipher spec (1):
curl: TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
curl: TLSv1.3 (IN), TLS handshake, Certificate (11):
curl: TLSv1.3 (IN), TLS handshake, CERT verify (15):
curl: TLSv1.3 (IN), TLS handshake, Finished (20):
curl: TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
curl: TLSv1.3 (OUT), TLS handshake, Finished (20):
curl: SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256 / x25519 / id-ecPublicKey
curl: ALPN: server accepted h2
curl: Server certificate:
curl:  subject:
curl:  start date: Apr  2 17:55:45 2025 GMT
curl:  expire date: Apr  3 05:55:45 2025 GMT
curl:  subjectAltName: host "localhost" matched cert's "localhost"
curl:  issuer: CN=Caddy Local Authority - ECC Intermediate
curl:  SSL certificate verify ok.
curl:   Certificate level 0: Public key type EC/prime256v1 (256/128 Bits/secBits), signed using ecdsa-with-SHA256
curl:   Certificate level 1: Public key type EC/prime256v1 (256/128 Bits/secBits), signed using ecdsa-with-SHA256
curl:   Certificate level 2: Public key type EC/prime256v1 (256/128 Bits/secBits), signed using ecdsa-with-SHA256
curl: Connected to localhost (127.0.0.1) port 9999
curl: using HTTP/2
curl: sending last upload chunk of 117 bytes
curl: [HTTP/2] [1] OPENED stream for https://localhost:9999/foo
curl: [HTTP/2] [1] [:method: GET]
curl: [HTTP/2] [1] [:scheme: https]
curl: [HTTP/2] [1] [:authority: localhost:9999]
curl: [HTTP/2] [1] [:path: /foo]
curl: [HTTP/2] [1] [user-agent: curl/8.11.0 Lix/2.93.0-dev]
curl: [HTTP/2] [1] [accept: */*]
curl: [HTTP/2] [1] [accept-encoding: zstd]
curl: Curl_xfer_send(len=117, eos=1) -> 0, 117
curl: TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
curl: Request completely sent off
curl: HTTP/2 found, allow multiplexing
got header for 'https://localhost:9999/foo': HTTP/2 200
got header for 'https://localhost:9999/foo': alt-svc: h3=":9999"; ma=2592000
got header for 'https://localhost:9999/foo': content-encoding: zstd
got header for 'https://localhost:9999/foo': content-type: application/octet-stream
got header for 'https://localhost:9999/foo': etag: "d8wc5qbdskh6hra0hs-zstd"
got header for 'https://localhost:9999/foo': last-modified: Wed, 02 Apr 2025 17:40:38 GMT
got header for 'https://localhost:9999/foo': server: Caddy
got header for 'https://localhost:9999/foo': vary: Accept-Encoding
got header for 'https://localhost:9999/foo': date: Wed, 02 Apr 2025 18:32:52 GMT
got header for 'https://localhost:9999/foo': 
downloading 'https://localhost:9999/foo'...
starting download of https://localhost:9999/foo
curl: Couldn't find host localhost in the /etc/nix/netrc file; using defaults
curl: Multiplexed connection found
curl: Re-using existing https: connection with host localhost
curl: sending last upload chunk of 117 bytes
curl: [HTTP/2] [3] OPENED stream for https://localhost:9999/foo
curl: [HTTP/2] [3] [:method: GET]
curl: [HTTP/2] [3] [:scheme: https]
curl: [HTTP/2] [3] [:authority: localhost:9999]
curl: [HTTP/2] [3] [:path: /foo]
curl: [HTTP/2] [3] [user-agent: curl/8.11.0 Lix/2.93.0-dev]
curl: [HTTP/2] [3] [accept: */*]
curl: [HTTP/2] [3] [accept-encoding: zstd]
curl: Curl_xfer_send(len=117, eos=1) -> 0, 117
curl: Request completely sent off
curl: HTTP/2 found, allow multiplexing
got header for 'https://localhost:9999/foo': HTTP/2 200
got header for 'https://localhost:9999/foo': alt-svc: h3=":9999"; ma=2592000
got header for 'https://localhost:9999/foo': content-encoding: zstd
got header for 'https://localhost:9999/foo': content-type: application/octet-stream
got header for 'https://localhost:9999/foo': etag: "d8wc5qbdskh6hra0hs-zstd"
got header for 'https://localhost:9999/foo': last-modified: Wed, 02 Apr 2025 17:40:38 GMT
got header for 'https://localhost:9999/foo': server: Caddy
got header for 'https://localhost:9999/foo': vary: Accept-Encoding
got header for 'https://localhost:9999/foo': date: Wed, 02 Apr 2025 18:32:52 GMT
got header for 'https://localhost:9999/foo': 
curl: sendrecv_dl: we are done
curl: nread == 0, stream closed, bailing
finished download of 'https://localhost:9999/foo'; curl status = 0, HTTP status = 200, body = 1073766413 bytes
curl: received GOAWAY, error=0, last_stream=3
^Ccurl: Callback aborted
curl: Curl_sendrecv() -> 42
curl: closing connection #0
finished download of 'https://localhost:9999/foo'; curl status = 42, HTTP status = 200, body = 10485760 bytes
download thread shutting down
```

Also tried with curl 438dd08b546a5690126c7355ae67a967a8475eae (from the same day you posted), and seemingly the same behaviour was observed. I am not sure what I am doing wrong here and why I cannot repro it.

I tried changing the usleep of the other thread to 100us, so it reads at 10MB/s instead of 1MB/s. I may, however, have observed the hang. Let me see what I can get out of that line of questioning; maybe I can actually observe it?

Author
Owner

okay yes, I have this problem observable with `CURL_DEBUG=all`: the second connection seems to be stalled altogether:

```
curl: [1-0] [MULTI] [PERFORMING] multi_perform(running=1)
curl: [1-0] [TCP] recv(len=5) -> -1, err=81
curl: [1-0] [SSL] ossl_bio_cf_in_read(len=5) -> -1, err=81
curl: [1-0] [SSL] cf_recv(len=16384) -> -1, 81
curl: [1-0] [HTTP/2] [0] progress ingress: done
curl: [1-0] [HTTP/2] [3] cf_recv(len=16384) -> -1 81, window=10485760/10485760, connection 1012923353/1048576000
curl: [1-0] [MULTI] [PERFORMING] multi_wait pollset[fd=6 IN], timeouts=1
```
Author
Owner

It looks like it could be either a deadlock in Lix itself, or curl eating unpauses.

Author
Owner

I think it's curl eating unpauses.

Owner

afawr it's not so much that curl is eating unpauses but that it's internally pausing the wrong thing. when we looked it seemed like it was pausing the entire http2 connection once its buffers are full, or something like that?

Author
Owner

The observed symptom I have is that it's getting into a state where it's forgotten to send a window update for the second stream, and then sits there like a catgirl waiting to be pet, expecting the server to send it more bytes, which it will never receive because it never requested any. So it's a protocol-level deadlock: the second stream is not paused at the API level, but curl didn't send any window updates when the internal pause buffer got emptied.
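
To illustrate the deadlock, a toy model of per-stream HTTP/2 flow control (purely illustrative, not curl's code): the sender may only transmit while the receiver's advertised window is open, so draining a buffer without sending WINDOW_UPDATE starves the stream forever.

```c++
#include <cstdint>
#include <cstdio>

// Toy model of one HTTP/2 stream's receive window (cf. RFC 9113 §5.2).
struct Stream {
    int64_t recvWindow; // bytes the peer is still allowed to send us

    void onData(int64_t n) { recvWindow -= n; } // peer sent n bytes

    // Application consumed n buffered bytes. A correct receiver re-opens
    // the window; the buggy path drains the buffer and stays silent.
    void consume(int64_t n, bool sendWindowUpdate) {
        if (sendWindowUpdate)
            recvWindow += n; // i.e. emit WINDOW_UPDATE(n) to the peer
    }
};

int main() {
    Stream s{65535};
    s.onData(65535);         // peer fills the window, then must stop sending
    s.consume(65535, false); // buffer drained, but no WINDOW_UPDATE sent
    // window is still 0: peer may not send, we wait for data -> deadlock
    std::printf("window=%lld: %s\n", (long long) s.recvWindow,
                s.recvWindow > 0 ? "peer may send" : "deadlocked");
    return 0;
}
```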

Owner

oh, excellent. why does lix expose so many curl bugs in a part of the library that should be stable 🫠

Author
Owner
FILED: https://github.com/curl/curl/issues/16955
Author
Owner

okay a fix is submitted for review. now i need to figure out who to poke to make sure it gets into nixpkgs and doesn't cause a widespread regression.

Author
Owner

@lilyinstarlight would you be able to test the fixed Curl on your setup and see if it still has the issue?

Author
Owner

https://github.com/NixOS/nixpkgs/pull/396200#issuecomment-2795944006

Requested that nixpkgs backport the patch for this. I have been running it on my machines for the last week and have not noticed any issues.

I think this bug is gone.

jade closed this issue 2025-04-11 06:34:03 +00:00