memcpy called with nullptr source in filetransfer.ff #492
Labels
No labels
Area/build-packaging
Area/cli
Area/evaluator
Area/fetching
Area/flakes
Area/language
Area/profiles
Area/protocol
Area/releng
Area/remote-builds
Area/repl
Area/store
bug
crash 💥
Cross Compilation
devx
docs
Downstream Dependents
E/easy
E/hard
E/help wanted
E/reproducible
E/requires rearchitecture
imported
Needs Langver
OS/Linux
OS/macOS
performance
regression
release-blocker
RFD
stability
Status
blocked
Status
invalid
Status
postponed
Status
wontfix
testing
testing/flakey
ux
No milestone
No project
No assignees
4 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference: lix-project/lix#492
Loading…
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
seen in a test failure:
likely not problematic because this can only happen if
n == 0
and the nullptr is thus not dereferenced, but somewho should fix that still. maybe callingawaitData
earlier is enough? needs testing.Hi @pennae! 😄
I would like to look into it. Could you please specify which test is causing this runtime error and how I can reproduce it locally?
Just from taking a cursory look in
libstore/filetransfer.cc
, I'm wondering whether it would make sense to initialize the fieldbuffered
as astring_view
over an empty string.the test that hit it is binary-cache, based on the logs above. but this isn't reproducible; if it were we would have hunted it down already as it would have blocked ci. my guess is that this was in the asan build. i really don't remember if we have instructions on building that.
it's
-D b_sanitize=undefined,address
added tojust setup
, iirc; the standard meson way to do it.I had to disable the feature
gc
in order to addb_sanitize=address,undefined
(otherwise I got an error saying "gc does far too many memory crimes for ASan"), like this:I rebuilt and ran the tests again, but no dice. I can't reproduce the
null pointer
error.Do you think this could be somehow reproduced in a test?
This was reported to happen in filetransfer.cc:788 (which is now filetransfer.cc:789, in
InnerSource::read()
and the only place where I can seeInnerSource
being used is inDownloadSource::inner
, andDownloadSource
is what gets returned fromcurlFileTransfer::download()
.Would it make sense to try to reproduce this in a test for
curlFileTransfer
? This gets tested intests/unit/libstore/filetransfer.cc
in several test cases, so maybe I can try to add one there?I don't know if it is reproducible easily. I would try to figure out how this happened by code auditing instead, personally.
It looks like the sequence of events that would trigger this is a 0 byte read() immediately after creation of the
InnerSource
, before any other read calls. I do not know which conditions trigger that. But I do not expect a test to trigger it given that we have just seen it once.Try fetching an empty file with no compression and immediately performing a read with a length of zero, perhaps? There are vanishingly few sequences of operations besides that or perhaps a compression algorithm returning no data that could plausibly leave the string_view on what i assume was default-initialized to a null buffer.
I also don't know what triggered this further up the stack. Many mysteries.
Yes, this is what I had in mind. I'll try it and let's see what happens.
... brief inspection further up the stack of binary cache store code reveals that all callers (narFromPath, getFileContents) use fixed size buffers that would never permit this condition. Looking at the error message above it, yeah uhhhh sorry, this definitely was not "easy" and we misfiled it. That error message, I think, can only happen if something in the NAR parsing machinery gets an EOF (though it could be literally anything else in substitutePath). But then I don't know why memcpy got a null buffer with 0 length afterwards.
I have tried a 0 length file substitution with
nix-store -r
and it did not crash like this.The easy part is just returning early before calling memcpy if
len == 0
, that would prevent this happening again. As for why it happened, I still have no idea. Without a reproducer I don't think this bug is worth anyone's time to actually find.I made the following test to serve a 0 byte file and try to read it:
... and
read()
throwsEndOfFile("download finished")
.I checked that
buffered.data()
is not a null pointer (I cast it into a std::uintptr_t and threw it as an exception and got "88510686156624") andbuffered.size()
is 0.So, I can't reproduce it. One thing I can still suggest is to skip
memcpy
andbuffered.remove_prefix
ifavailable
is 0. It's a bit defensive, but maybememcpy
shouldn't be messed with when there's no data.What do you think?
Oh, this is what you meant by early return...
There's already a test called
exceptionAbortsDownload
. I can add one forexceptionAbortsRead
to show that an EOF exception is also thrown withft->download(...)->read(...)
.I'll make a CL.
Yeah, we could do that or we could just close the bug without doing anything (since any reasonable memcpy should just immediately exit if there is zero length), tbh. Up to you. I don't understand at all the conditions that could have caused it and running your test with a length of zero does nothing interesting either, and gdb confirms there is no null pointer. There may be some breakage in an error path somehow but I really just don't have any idea where that could be or how that could occur, and inspecting the code seems to reveal nothing either.
This issue was mentioned on Gerrit on the following CLs:
https://gerrit.lix.systems/c/lix/+/2045
Yes, I acknowledge that this is defensive, but it doesn't do any harm, so why not.
I think it's good to have a test for the
EndOfFile
exception anyway.