lix/src/libutil/serialise.hh
Jade Lovelace 04f8a14833
tree-wide: shuffle headers around for about 30s compile time
This didn't really feel so worth it afterwards, but I did untangle a
bunch of stuff that should not have been tangled.

The general gist of this change is that variant bullshit was causing a
bunch of compile time, and it seems like the only way to deal with
variant induced compile time is to keep variant types out of headers.
Explicit template instantiation seems to do nothing for them.

I also seem to have gotten some back-end time improvement from
explicitly instantiating regex, but I don't know why. There is no
corresponding front-end time improvement from it: regex is still at the
top of the sinners list.

**** Templates that took longest to instantiate:
 15231 ms: std::basic_regex<char>::_M_compile (28 times, avg 543 ms)
 15066 ms: std::__detail::_Compiler<std::regex_traits<char>>::_Compiler (28 times, avg 538 ms)
 12571 ms: std::__detail::_Compiler<std::regex_traits<char>>::_M_disjunction (28 times, avg 448 ms)
 12454 ms: std::__detail::_Compiler<std::regex_traits<char>>::_M_alternative (28 times, avg 444 ms)
 12225 ms: std::__detail::_Compiler<std::regex_traits<char>>::_M_term (28 times, avg 436 ms)
 11363 ms: nlohmann::basic_json<>::parse<const char *> (21 times, avg 541 ms)
 10628 ms: nlohmann::basic_json<>::basic_json (109 times, avg 97 ms)
 10134 ms: std::__detail::_Compiler<std::regex_traits<char>>::_M_atom (28 times, avg 361 ms)

Back-end time before messing with the regex:
**** Function sets that took longest to compile / optimize:
  8076 ms: void boost::io::detail::put<$>(boost::io::detail::put_holder<$> cons... (177 times, avg 45 ms)
  4382 ms: std::_Rb_tree<$>::_M_erase(std::_Rb_tree_node<$>*) (1247 times, avg 3 ms)
  3137 ms: boost::stacktrace::detail::to_string_impl_base<boost::stacktrace::de... (137 times, avg 22 ms)
  2896 ms: void boost::io::detail::mk_str<$>(std::__cxx11::basic_string<$>&, ch... (177 times, avg 16 ms)
  2304 ms: std::_Rb_tree<$>::_M_get_insert_hint_unique_pos(std::_Rb_tree_const_... (210 times, avg 10 ms)
  2116 ms: bool std::__detail::_Compiler<$>::_M_expression_term<$>(std::__detai... (112 times, avg 18 ms)
  2051 ms: std::_Rb_tree_iterator<$> std::_Rb_tree<$>::_M_emplace_hint_unique<$... (244 times, avg 8 ms)
  2037 ms: toml::result<$> toml::detail::sequence<$>::invoke<$>(toml::detail::l... (93 times, avg 21 ms)
  1928 ms: std::__detail::_Compiler<$>::_M_quantifier() (28 times, avg 68 ms)
  1859 ms: nlohmann::json_abi_v3_11_3::detail::serializer<$>::dump(nlohmann::js... (41 times, avg 45 ms)
  1824 ms: std::_Function_handler<$>::_M_manager(std::_Any_data&, std::_Any_dat... (973 times, avg 1 ms)
  1810 ms: std::__detail::_BracketMatcher<$>::_BracketMatcher(std::__detail::_B... (112 times, avg 16 ms)
  1793 ms: nix::fetchers::GitInputScheme::fetch(nix::ref<$>, nix::fetchers::Inp... (1 times, avg 1793 ms)
  1759 ms: std::_Rb_tree<$>::_M_get_insert_unique_pos(std::__cxx11::basic_strin... (281 times, avg 6 ms)
  1722 ms: bool nlohmann::json_abi_v3_11_3::detail::parser<$>::sax_parse_intern... (19 times, avg 90 ms)
  1677 ms: boost::io::basic_altstringbuf<$>::overflow(int) (194 times, avg 8 ms)
  1674 ms: std::__cxx11::basic_string<$>::_M_mutate(unsigned long, unsigned lon... (249 times, avg 6 ms)
  1660 ms: std::_Rb_tree_node<$>* std::_Rb_tree<$>::_M_copy<$>(std::_Rb_tree_no... (304 times, avg 5 ms)
  1599 ms: bool nlohmann::json_abi_v3_11_3::detail::parser<$>::sax_parse_intern... (19 times, avg 84 ms)
  1568 ms: void std::__detail::_Compiler<$>::_M_insert_bracket_matcher<$>(bool) (112 times, avg 14 ms)
  1541 ms: std::__shared_ptr<$>::~__shared_ptr() (531 times, avg 2 ms)
  1539 ms: nlohmann::json_abi_v3_11_3::detail::serializer<$>::dump_escaped(std:... (41 times, avg 37 ms)
  1471 ms: void std::__detail::_Compiler<$>::_M_insert_character_class_matcher<... (112 times, avg 13 ms)

After messing with the regex (notice std::__detail::_Compiler vanishes
here, but I don't know why):

**** Function sets that took longest to compile / optimize:
  8054 ms: void boost::io::detail::put<$>(boost::io::detail::put_holder<$> cons... (177 times, avg 45 ms)
  4313 ms: std::_Rb_tree<$>::_M_erase(std::_Rb_tree_node<$>*) (1217 times, avg 3 ms)
  3259 ms: boost::stacktrace::detail::to_string_impl_base<boost::stacktrace::de... (137 times, avg 23 ms)
  3045 ms: void boost::io::detail::mk_str<$>(std::__cxx11::basic_string<$>&, ch... (177 times, avg 17 ms)
  2314 ms: std::_Rb_tree<$>::_M_get_insert_hint_unique_pos(std::_Rb_tree_const_... (207 times, avg 11 ms)
  1923 ms: std::_Rb_tree_iterator<$> std::_Rb_tree<$>::_M_emplace_hint_unique<$... (216 times, avg 8 ms)
  1817 ms: bool nlohmann::json_abi_v3_11_3::detail::parser<$>::sax_parse_intern... (18 times, avg 100 ms)
  1816 ms: toml::result<$> toml::detail::sequence<$>::invoke<$>(toml::detail::l... (93 times, avg 19 ms)
  1788 ms: nlohmann::json_abi_v3_11_3::detail::serializer<$>::dump(nlohmann::js... (40 times, avg 44 ms)
  1749 ms: std::_Rb_tree<$>::_M_get_insert_unique_pos(std::__cxx11::basic_strin... (278 times, avg 6 ms)
  1724 ms: std::__cxx11::basic_string<$>::_M_mutate(unsigned long, unsigned lon... (248 times, avg 6 ms)
  1697 ms: boost::io::basic_altstringbuf<$>::overflow(int) (194 times, avg 8 ms)
  1684 ms: nix::fetchers::GitInputScheme::fetch(nix::ref<$>, nix::fetchers::Inp... (1 times, avg 1684 ms)
  1680 ms: std::_Rb_tree_node<$>* std::_Rb_tree<$>::_M_copy<$>(std::_Rb_tree_no... (303 times, avg 5 ms)
  1589 ms: bool nlohmann::json_abi_v3_11_3::detail::parser<$>::sax_parse_intern... (18 times, avg 88 ms)
  1483 ms: non-virtual thunk to boost::wrapexcept<$>::~wrapexcept() (181 times, avg 8 ms)
  1447 ms: nlohmann::json_abi_v3_11_3::detail::serializer<$>::dump_escaped(std:... (40 times, avg 36 ms)
  1441 ms: std::__shared_ptr<$>::~__shared_ptr() (496 times, avg 2 ms)
  1420 ms: boost::stacktrace::basic_stacktrace<$>::init(unsigned long, unsigned... (137 times, avg 10 ms)
  1396 ms: boost::basic_format<$>::~basic_format() (194 times, avg 7 ms)
  1290 ms: std::__cxx11::basic_string<$>::_M_replace_cold(char*, unsigned long,... (231 times, avg 5 ms)
  1258 ms: std::vector<$>::~vector() (354 times, avg 3 ms)
  1222 ms: std::__cxx11::basic_string<$>::_M_replace(unsigned long, unsigned lo... (231 times, avg 5 ms)
  1194 ms: std::_Rb_tree<$>::_M_get_insert_hint_unique_pos(std::_Rb_tree_const_... (49 times, avg 24 ms)
  1186 ms: bool tao::pegtl::internal::sor<$>::match<$>(std::integer_sequence<$>... (1 times, avg 1186 ms)
  1149 ms: std::__detail::_Executor<$>::_M_dfs(std::__detail::_Executor<$>::_Ma... (70 times, avg 16 ms)
  1123 ms: toml::detail::sequence<$>::invoke(toml::detail::location&) (69 times, avg 16 ms)
  1110 ms: nlohmann::json_abi_v3_11_3::basic_json<$>::json_value::destroy(nlohm... (55 times, avg 20 ms)
  1079 ms: std::_Function_handler<$>::_M_manager(std::_Any_data&, std::_Any_dat... (541 times, avg 1 ms)
  1033 ms: nlohmann::json_abi_v3_11_3::detail::lexer<$>::scan_number() (20 times, avg 51 ms)

Change-Id: I10af282bcd4fc39c2d3caae3453e599e4639c70b
2024-08-28 09:55:05 -07:00

617 lines
14 KiB
C++
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

#pragma once
///@file
#include <memory>
#include "charptr-cast.hh"
#include "generator.hh"
#include "types.hh"
#include "file-descriptor.hh"
namespace nix {
/**
* Abstract destination of binary data.
*/
struct Sink
{
virtual ~Sink() { }
virtual void operator () (std::string_view data) = 0;
virtual bool good() { return true; }
};
/**
* Just throws away data.
*/
struct NullSink : Sink
{
void operator () (std::string_view data) override
{ }
};
struct FinishSink : virtual Sink
{
virtual void finish() = 0;
};
/**
* A buffered abstract sink. Warning: a BufferedSink should not be
* used from multiple threads concurrently.
*/
struct BufferedSink : virtual Sink
{
size_t bufSize, bufPos;
std::unique_ptr<char[]> buffer;
BufferedSink(size_t bufSize = 32 * 1024)
: bufSize(bufSize), bufPos(0), buffer(nullptr) { }
void operator () (std::string_view data) override;
void flush();
protected:
virtual void writeUnbuffered(std::string_view data) = 0;
};
/**
* Abstract source of binary data.
*/
struct Source
{
virtual ~Source() { }
/**
* Store exactly len bytes in the buffer pointed to by data.
* It blocks until all the requested data is available, or throws
* an error if it is not going to be available.
*/
void operator () (char * data, size_t len);
/**
* Store up to len in the buffer pointed to by data, and
* return the number of bytes stored. It blocks until at least
* one byte is available.
*/
virtual size_t read(char * data, size_t len) = 0;
virtual bool good() { return true; }
void drainInto(Sink & sink);
std::string drain();
};
/**
* A buffered abstract source. Warning: a BufferedSource should not be
* used from multiple threads concurrently.
*/
struct BufferedSource : Source
{
size_t bufSize, bufPosIn, bufPosOut;
std::unique_ptr<char[]> buffer;
BufferedSource(size_t bufSize = 32 * 1024)
: bufSize(bufSize), bufPosIn(0), bufPosOut(0), buffer(nullptr) { }
size_t read(char * data, size_t len) override;
bool hasData();
protected:
/**
* Underlying read call, to be overridden.
*/
virtual size_t readUnbuffered(char * data, size_t len) = 0;
};
/**
* A sink that writes data to a file descriptor.
*/
struct FdSink : BufferedSink
{
int fd;
size_t written = 0;
FdSink() : fd(-1) { }
FdSink(int fd) : fd(fd) { }
FdSink(FdSink&&) = default;
FdSink & operator=(FdSink && s)
{
flush();
fd = s.fd;
s.fd = -1;
written = s.written;
return *this;
}
~FdSink();
void writeUnbuffered(std::string_view data) override;
bool good() override;
private:
bool _good = true;
};
/**
* A source that reads data from a file descriptor.
*/
struct FdSource : BufferedSource
{
int fd;
size_t read = 0;
/** Defaults to "unexpected end-of-file" */
std::optional<std::string> specialEndOfFileError;
std::string endOfFileError() const;
FdSource() : fd(-1) { }
FdSource(int fd) : fd(fd) { }
FdSource(FdSource &&) = default;
FdSource & operator=(FdSource && s)
{
fd = s.fd;
s.fd = -1;
read = s.read;
return *this;
}
bool good() override;
protected:
size_t readUnbuffered(char * data, size_t len) override;
private:
bool _good = true;
};
/**
* A sink that writes data to a string.
*/
struct StringSink : Sink
{
std::string s;
StringSink() { }
explicit StringSink(const size_t reservedSize)
{
s.reserve(reservedSize);
};
StringSink(std::string && s) : s(std::move(s)) { };
void operator () (std::string_view data) override;
};
/**
* A source that reads data from a string.
*/
struct StringSource : Source
{
std::string_view s;
size_t pos;
StringSource(std::string_view s) : s(s), pos(0) { }
size_t read(char * data, size_t len) override;
};
/**
* A sink that writes all incoming data to two other sinks.
*/
struct TeeSink : Sink
{
Sink & sink1, & sink2;
TeeSink(Sink & sink1, Sink & sink2) : sink1(sink1), sink2(sink2) { }
virtual void operator () (std::string_view data) override
{
sink1(data);
sink2(data);
}
};
/**
* Adapter class of a Source that saves all data read to a sink.
*/
struct TeeSource : Source
{
Source & orig;
Sink & sink;
TeeSource(Source & orig, Sink & sink)
: orig(orig), sink(sink) { }
size_t read(char * data, size_t len) override
{
size_t n = orig.read(data, len);
sink({data, n});
return n;
}
};
/**
* A reader that consumes the original Source until 'size'.
*/
struct SizedSource : Source
{
Source & orig;
size_t remain;
SizedSource(Source & orig, size_t size)
: orig(orig), remain(size) { }
size_t read(char * data, size_t len) override
{
if (this->remain <= 0) {
throw EndOfFile("sized: unexpected end-of-file");
}
len = std::min(len, this->remain);
size_t n = this->orig.read(data, len);
this->remain -= n;
return n;
}
/**
* Consume the original source until no remain data is left to consume.
*/
size_t drainAll()
{
std::vector<char> buf(8192);
size_t sum = 0;
while (this->remain > 0) {
size_t n = read(buf.data(), buf.size());
sum += n;
}
return sum;
}
};
/**
* A sink that that just counts the number of bytes given to it
*/
struct LengthSink : Sink
{
uint64_t length = 0;
void operator () (std::string_view data) override
{
length += data.size();
}
};
/**
* Convert a function into a sink.
*/
struct LambdaSink : Sink
{
typedef std::function<void(std::string_view data)> lambda_t;
lambda_t lambda;
LambdaSink(const lambda_t & lambda) : lambda(lambda) { }
void operator () (std::string_view data) override
{
lambda(data);
}
};
/**
* Convert a function into a source.
*/
struct LambdaSource : Source
{
typedef std::function<size_t(char *, size_t)> lambda_t;
lambda_t lambda;
LambdaSource(const lambda_t & lambda) : lambda(lambda) { }
size_t read(char * data, size_t len) override
{
return lambda(data, len);
}
};
/**
* Chain two sources together so after the first is exhausted, the second is
* used
*/
struct ChainSource : Source
{
Source & source1, & source2;
bool useSecond = false;
ChainSource(Source & s1, Source & s2)
: source1(s1), source2(s2)
{ }
size_t read(char * data, size_t len) override;
};
struct GeneratorSource : Source
{
GeneratorSource(Generator<Bytes> && g) : g(std::move(g)) {}
virtual size_t read(char * data, size_t len) override
{
// we explicitly do not poll the generator multiple times to fill the
// buffer, only to produce some output at all. this is allowed by the
// semantics of read(), only operator() must fill the buffer entirely
while (!buf.size()) {
if (auto next = g.next()) {
buf = *next;
} else {
throw EndOfFile("coroutine has finished");
}
}
len = std::min(len, buf.size());
memcpy(data, buf.data(), len);
buf = buf.subspan(len);
return len;
}
private:
Generator<Bytes> g;
Bytes buf{};
};
inline Sink & operator<<(Sink & sink, Generator<Bytes> && g)
{
while (auto buffer = g.next()) {
sink(std::string_view(buffer->data(), buffer->size()));
}
return sink;
}
struct SerializingTransform;
using WireFormatGenerator = Generator<Bytes, SerializingTransform>;
struct SerializingTransform
{
std::array<unsigned char, 8> buf;
Bytes operator()(uint64_t n)
{
buf[0] = n & 0xff;
buf[1] = (n >> 8) & 0xff;
buf[2] = (n >> 16) & 0xff;
buf[3] = (n >> 24) & 0xff;
buf[4] = (n >> 32) & 0xff;
buf[5] = (n >> 40) & 0xff;
buf[6] = (n >> 48) & 0xff;
buf[7] = (unsigned char) (n >> 56) & 0xff;
return {charptr_cast<const char *>(buf.begin()), 8};
}
static Bytes padding(size_t unpadded)
{
return Bytes("\0\0\0\0\0\0\0", unpadded % 8 ? 8 - unpadded % 8 : 0);
}
// opt in to generator chaining. without this co_yielding
// another generator of any type will cause a type error.
auto operator()(Generator<Bytes> && g)
{
return std::move(g);
}
// only choose this for *exactly* char spans, do not allow implicit
// conversions. this would cause ambiguities with strings literals,
// and resolving those with more string-like overloads needs a lot.
template<typename Span>
requires std::same_as<Span, std::span<char>> || std::same_as<Span, std::span<const char>>
Bytes operator()(Span s)
{
return s;
}
WireFormatGenerator operator()(std::string_view s);
WireFormatGenerator operator()(const Strings & s);
WireFormatGenerator operator()(const StringSet & s);
WireFormatGenerator operator()(const Error & s);
};
void writePadding(size_t len, Sink & sink);
// NOLINTBEGIN(cppcoreguidelines-avoid-capturing-lambda-coroutines):
// These coroutines do their entire job before the semicolon and are not
// retained, so they live long enough.
inline Sink & operator<<(Sink & sink, uint64_t u)
{
return sink << [&]() -> WireFormatGenerator { co_yield u; }();
}
inline Sink & operator<<(Sink & sink, std::string_view s)
{
return sink << [&]() -> WireFormatGenerator { co_yield s; }();
}
inline Sink & operator<<(Sink & sink, const Strings & s)
{
return sink << [&]() -> WireFormatGenerator { co_yield s; }();
}
inline Sink & operator<<(Sink & sink, const StringSet & s)
{
return sink << [&]() -> WireFormatGenerator { co_yield s; }();
}
inline Sink & operator<<(Sink & sink, const Error & ex)
{
return sink << [&]() -> WireFormatGenerator { co_yield ex; }();
}
// NOLINTEND(cppcoreguidelines-avoid-capturing-lambda-coroutines)
MakeError(SerialisationError, Error);
template<typename T>
T readNum(Source & source);
inline unsigned int readInt(Source & source)
{
return readNum<unsigned int>(source);
}
inline uint64_t readLongLong(Source & source)
{
return readNum<uint64_t>(source);
}
void readPadding(size_t len, Source & source);
size_t readString(char * buf, size_t max, Source & source);
std::string readString(Source & source, size_t max = std::numeric_limits<size_t>::max());
template<class T> T readStrings(Source & source);
Source & operator >> (Source & in, std::string & s);
template<typename T>
Source & operator >> (Source & in, T & n)
{
n = readNum<T>(in);
return in;
}
template<typename T>
Source & operator >> (Source & in, bool & b)
{
b = readNum<uint64_t>(in);
return in;
}
Error readError(Source & source);
/**
* An adapter that converts a std::basic_istream into a source.
*/
struct StreamToSourceAdapter : Source
{
std::shared_ptr<std::basic_istream<char>> istream;
StreamToSourceAdapter(std::shared_ptr<std::basic_istream<char>> istream)
: istream(istream)
{ }
size_t read(char * data, size_t len) override
{
if (!istream->read(data, len)) {
if (istream->eof()) {
if (istream->gcount() == 0)
throw EndOfFile("end of file");
} else
throw Error("I/O error in StreamToSourceAdapter");
}
return istream->gcount();
}
};
/**
* A source that reads a distinct format of concatenated chunks back into its
* logical form, in order to guarantee a known state to the original stream,
* even in the event of errors.
*
* Use with FramedSink, which also allows the logical stream to be terminated
* in the event of an exception.
*/
struct FramedSource : Source
{
Source & from;
bool eof = false;
std::vector<char> pending;
size_t pos = 0;
FramedSource(Source & from) : from(from)
{ }
~FramedSource()
{
try {
if (!eof) {
while (true) {
auto n = readInt(from);
if (!n) break;
std::vector<char> data(n);
from(data.data(), n);
}
}
} catch (...) {
ignoreException();
}
}
size_t read(char * data, size_t len) override
{
if (eof) throw EndOfFile("reached end of FramedSource");
if (pos >= pending.size()) {
size_t len = readInt(from);
if (!len) {
eof = true;
return 0;
}
pending = std::vector<char>(len);
pos = 0;
from(pending.data(), len);
}
auto n = std::min(len, pending.size() - pos);
memcpy(data, pending.data() + pos, n);
pos += n;
return n;
}
};
/**
* Write as chunks in the format expected by FramedSource.
*
* The exception_ptr reference can be used to terminate the stream when you
* detect that an error has occurred on the remote end.
*/
struct FramedSink : nix::BufferedSink
{
BufferedSink & to;
std::exception_ptr & ex;
FramedSink(BufferedSink & to, std::exception_ptr & ex) : to(to), ex(ex)
{ }
~FramedSink()
{
try {
to << 0;
to.flush();
} catch (...) {
ignoreException();
}
}
void writeUnbuffered(std::string_view data) override
{
/* Don't send more data if the remote has
encountered an error. */
if (ex) {
auto ex2 = ex;
ex = nullptr;
std::rethrow_exception(ex2);
}
to << data.size();
to(data);
};
};
/* Disabling GC when entering a coroutine (without the boehm patch).
mutable to avoid boehm gc dependency in libutil.
*/
extern std::shared_ptr<void> (*create_coro_gc_hook)();
}