memcpy called with nullptr source in filetransfer.ff #492

Closed
opened 2024-08-29 22:25:13 +00:00 by pennae · 13 comments
Owner

seen in a test failure:

lix> error: NAR for '/build/nix-test/binary-cache/store/iw36yfdjsrcwq733slli8018x04mic93-dependencies-input-0' fetched from 'file:///build/nix-test/binary-cache/binary-cache' is incomplete
lix> copying path '/build/nix-test/binary-cache/store/iw36yfdjsrcwq733slli8018x04mic93-dependencies-input-0' from 'file:///build/nix-test/binary-cache/binary-cache'...
lix> ../src/libstore/filetransfer.cc:788:34: runtime error: null pointer passed as argument 2, which is declared to never be null

likely not problematic because this can only happen if n == 0 and the nullptr is thus not dereferenced, but somewho should fix that still. maybe calling awaitData earlier is enough? needs testing.

seen in a test failure: ``` lix> error: NAR for '/build/nix-test/binary-cache/store/iw36yfdjsrcwq733slli8018x04mic93-dependencies-input-0' fetched from 'file:///build/nix-test/binary-cache/binary-cache' is incomplete lix> copying path '/build/nix-test/binary-cache/store/iw36yfdjsrcwq733slli8018x04mic93-dependencies-input-0' from 'file:///build/nix-test/binary-cache/binary-cache'... lix> ../src/libstore/filetransfer.cc:788:34: runtime error: null pointer passed as argument 2, which is declared to never be null ``` likely not problematic because this can only happen if `n == 0` and the nullptr is thus not dereferenced, but somewho should fix that still. maybe calling `awaitData` earlier is enough? needs testing.
pennae added the
bug
E/easy
E/help wanted
labels 2024-08-29 22:25:13 +00:00

Hi @pennae! 😄

I would like to look into it. Could you please specify which test is causing this runtime error and how I can reproduce it locally?

Just from taking a cursory look in libstore/filetransfer.cc, I'm wondering whether it would make sense to initialize the field buffered as a string_view over an empty string.

Hi @pennae! 😄 I would like to look into it. Could you please specify which test is causing this runtime error and how I can reproduce it locally? Just from taking a cursory look in `libstore/filetransfer.cc`, I'm wondering whether it would make sense to initialize the field `buffered` as a `string_view` over an empty string.
Owner

the test that hit it is binary-cache, based on the logs above. but this isn't reproducible; if it were we would have hunted it down already as it would have blocked ci. my guess is that this was in the asan build. i really don't remember if we have instructions on building that.

it's -D b_sanitize=undefined,address added to just setup, iirc; the standard meson way to do it.

the test that hit it is binary-cache, based on the logs above. but this isn't reproducible; if it were we would have hunted it down already as it would have blocked ci. my guess is that this was in the asan build. i really don't remember if we have instructions on building that. it's `-D b_sanitize=undefined,address` added to `just setup`, iirc; the standard meson way to do it.

I had to disable the feature gc in order to add b_sanitize=address,undefined (otherwise I got an error saying "gc does far too many memory crimes for ASan"), like this:

just setup --reconfigure -D b_sanitize=address,undefined -D gc=disabled

I rebuilt and ran the tests again, but no dice. I can't reproduce the null pointer error.

Do you think this could be somehow reproduced in a test?

This was reported to happen in filetransfer.cc:788 (which is now filetransfer.cc:789, in InnerSource::read() and the only place where I can see InnerSource being used is in DownloadSource::inner, and DownloadSource is what gets returned from curlFileTransfer::download().

Would it make sense to try to reproduce this in a test for curlFileTransfer? This gets tested in tests/unit/libstore/filetransfer.cc in several test cases, so maybe I can try to add one there?

I had to disable the feature `gc` in order to add `b_sanitize=address,undefined` (otherwise I got an error saying "gc does far too many memory crimes for ASan"), like this: ```sh just setup --reconfigure -D b_sanitize=address,undefined -D gc=disabled ``` I rebuilt and ran the tests again, but no dice. I can't reproduce the `null pointer` error. Do you think this could be somehow reproduced in a test? This was reported to happen in filetransfer.cc:788 (which is now [filetransfer.cc:789](https://git.lix.systems/lix-project/lix/src/branch/main/src/libstore/filetransfer.cc#L789), in `InnerSource::read()` and the only place where I can see `InnerSource` being used is in [`DownloadSource::inner`](https://git.lix.systems/lix-project/lix/src/branch/main/src/libstore/filetransfer.cc#L817), and `DownloadSource` is what gets returned from [`curlFileTransfer::download()`](https://git.lix.systems/lix-project/lix/src/branch/main/src/libstore/filetransfer.cc#L836-L839). Would it make sense to try to reproduce this in a test for `curlFileTransfer`? This gets tested in `tests/unit/libstore/filetransfer.cc` in several test cases, so maybe I can try to add one there?
Owner

I don't know if it is reproducible easily. I would try to figure out how this happened by code auditing instead, personally.

I don't know if it is reproducible easily. I would try to figure out how this happened by code auditing instead, personally.
Owner

It looks like the sequence of events that would trigger this is a 0 byte read() immediately after creation of the InnerSource, before any other read calls. I do not know which conditions trigger that. But I do not expect a test to trigger it given that we have just seen it once.

It looks like the sequence of events that would trigger this is a 0 byte read() immediately after creation of the `InnerSource`, before any other read calls. I do not know which conditions trigger that. But I do not expect a test to trigger it given that we have just seen it once.
Owner

Try fetching an empty file with no compression and immediately performing a read with a length of zero, perhaps? There are vanishingly few sequences of operations besides that or perhaps a compression algorithm returning no data that could plausibly leave the string_view on what i assume was default-initialized to a null buffer.

I also don't know what triggered this further up the stack. Many mysteries.

Try fetching an empty file with no compression and immediately performing a read with a length of zero, perhaps? There are vanishingly few sequences of operations besides that or perhaps a compression algorithm returning no data that could plausibly leave the string_view on what i assume was default-initialized to a null buffer. I also don't know what triggered this further up the stack. Many mysteries.

Yes, this is what I had in mind. I'll try it and let's see what happens.

Yes, this is what I had in mind. I'll try it and let's see what happens.
Owner

... brief inspection further up the stack of binary cache store code reveals that all callers (narFromPath, getFileContents) use fixed size buffers that would never permit this condition. Looking at the error message above it, yeah uhhhh sorry, this definitely was not "easy" and we misfiled it. That error message, I think, can only happen if something in the NAR parsing machinery gets an EOF (though it could be literally anything else in substitutePath). But then I don't know why memcpy got a null buffer with 0 length afterwards.

I have tried a 0 length file substitution with nix-store -r and it did not crash like this.

The easy part is just returning early before calling memcpy if len == 0, that would prevent this happening again. As for why it happened, I still have no idea. Without a reproducer I don't think this bug is worth anyone's time to actually find.

... brief inspection further up the stack of binary cache store code reveals that all callers (narFromPath, getFileContents) use fixed size buffers that would never permit this condition. Looking at the error message above it, yeah uhhhh sorry, this definitely was not "easy" and we misfiled it. That error message, I think, can only happen if something in the NAR parsing machinery [gets an EOF](https://git.lix.systems/lix-project/lix/src/9b05636937fe108efbf39a005807e11d36c7b057/src/libstore/build/substitution-goal.cc#L222-L228) (though it could be literally anything else in substitutePath). But then I don't know why memcpy got a null buffer with 0 length afterwards. I have tried a 0 length file substitution with `nix-store -r` and it did not crash like this. The *easy* part is just returning early before calling memcpy if `len == 0`, that would prevent this happening again. As for why it happened, I still have no idea. Without a reproducer I don't think this bug is worth anyone's time to actually find.

I made the following test to serve a 0 byte file and try to read it:

TEST(FileTransfer, nullPointer)
{
    auto [port, srv] = serveHTTP("200 ok", "content-length: 0\r\n", [&] { return ""; });
    auto ft = makeFileTransfer();
    auto source = ft->download(FileTransferRequest(fmt("http://[::1]:%d/index", port)));
    size_t len = 10;
    char buf[] = "1234567890";
    source->read(buf, len);
    ASSERT_STREQ(buf, "1234567890");
}

... and read() throws EndOfFile("download finished").

I checked that buffered.data() is not a null pointer (I cast it into a std::uintptr_t and threw it as an exception and got "88510686156624") and buffered.size() is 0.

So, I can't reproduce it. One thing I can still suggest is to skip memcpy and buffered.remove_prefix if available is 0. It's a bit defensive, but maybe memcpy shouldn't be messed with when there's no data.

What do you think?

I made the following test to serve a 0 byte file and try to read it: ```cpp TEST(FileTransfer, nullPointer) { auto [port, srv] = serveHTTP("200 ok", "content-length: 0\r\n", [&] { return ""; }); auto ft = makeFileTransfer(); auto source = ft->download(FileTransferRequest(fmt("http://[::1]:%d/index", port))); size_t len = 10; char buf[] = "1234567890"; source->read(buf, len); ASSERT_STREQ(buf, "1234567890"); } ``` ... and `read()` throws `EndOfFile("download finished")`. I checked that `buffered.data()` is not a null pointer (I cast it into a std::uintptr_t and threw it as an exception and got "88510686156624") and `buffered.size()` is 0. So, I can't reproduce it. One thing I can still suggest is to skip [`memcpy` and `buffered.remove_prefix`](https://git.lix.systems/lix-project/lix/src/branch/main/src/libstore/filetransfer.cc#L789-L790) if `available` is 0. It's a bit defensive, but maybe `memcpy` shouldn't be messed with when there's no data. What do you think?

Oh, this is what you meant by early return...

There's already a test called exceptionAbortsDownload. I can add one for exceptionAbortsRead to show that an EOF exception is also thrown with ft->download(...)->read(...).

I'll make a CL.

Oh, this is what you meant by early return... There's already a test called `exceptionAbortsDownload`. I can add one for `exceptionAbortsRead` to show that an EOF exception is also thrown with `ft->download(...)->read(...)`. I'll make a CL.
Owner

Yeah, we could do that or we could just close the bug without doing anything (since any reasonable memcpy should just immediately exit if there is zero length), tbh. Up to you. I don't understand at all the conditions that could have caused it and running your test with a length of zero does nothing interesting either, and gdb confirms there is no null pointer. There may be some breakage in an error path somehow but I really just don't have any idea where that could be or how that could occur, and inspecting the code seems to reveal nothing either.

Yeah, we could do that or we could just close the bug without doing anything (since any reasonable memcpy should just immediately exit if there is zero length), tbh. Up to you. I don't understand at all the conditions that could have caused it and running your test with a length of zero does nothing interesting either, and gdb confirms there is no null pointer. There may be some breakage in an error path somehow but I really just don't have any idea where that could be or how that could occur, and inspecting the code seems to reveal nothing either.
jade closed this issue 2024-10-07 07:38:53 +00:00
jade reopened this issue 2024-10-07 07:39:20 +00:00
Member

This issue was mentioned on Gerrit on the following CLs:

  • commit message in cl/2045 ("Avoid calling memcpy when len == 0 in filetransfer.cc")
<!-- GERRIT_LINKBOT: {"cls": [{"backlink": "https://gerrit.lix.systems/c/lix/+/2045", "number": 2045, "kind": "commit message"}], "cl_meta": {"2045": {"change_title": "Avoid calling memcpy when len == 0 in filetransfer.cc"}}} --> This issue was mentioned on Gerrit on the following CLs: * commit message in [cl/2045](https://gerrit.lix.systems/c/lix/+/2045) ("Avoid calling memcpy when len == 0 in filetransfer.cc")

https://gerrit.lix.systems/c/lix/+/2045

Yes, I acknowledge that this is defensive, but it doesn't do any harm, so why not.
I think it's good to have a test for the EndOfFile exception anyway.

https://gerrit.lix.systems/c/lix/+/2045 Yes, I acknowledge that this is defensive, but it doesn't do any harm, so why not. I think it's good to have a test for the `EndOfFile` exception anyway.
Sign in to join this conversation.
No milestone
No project
No assignees
4 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: lix-project/lix#492
No description provided.