this was a debugging aid from day one that should not have any impact on
build semantics, and if it *does* have an impact on build semantics then
build semantics are seriously broken. keeping the order imposed by these
keys will be impossible once we let a real event loop schedule our jobs.
Change-Id: I5c313324e1f213ab6453d82f41ae5e59de809a5b
without circular references we do not need weak goal pointers except for
caches, which should not prevent goal destructors running. caches though
cannot create circular references even when they keep strong references.
if we removed goals from caches when their work() is fully finished, not
when their destructors are run, we could keep strong pointers in caches.
since we do not gain much from this we keep those pointers weak for now.
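
a minimal sketch of such a weak cache, assuming a hypothetical `Goal` and `GoalCache` (neither is the real class): the cache hands out strong pointers but stores only weak ones, so the last owner going away still runs the destructor.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a goal; not the real Goal class.
struct Goal {
    std::string key;
    explicit Goal(std::string k) : key(std::move(k)) {}
};

// Cache that only observes goals: a weak_ptr never delays a destructor.
class GoalCache {
    std::map<std::string, std::weak_ptr<Goal>> entries;

public:
    // Return the live goal for `key`, or create one if none is alive.
    std::shared_ptr<Goal> getOrCreate(const std::string & key) {
        assert(!key.empty());
        if (auto existing = entries[key].lock())
            return existing;
        auto goal = std::make_shared<Goal>(key);
        entries[key] = goal;
        return goal;
    }

    // Number of cache entries whose goal has not been destroyed yet.
    std::size_t liveCount() const {
        std::size_t n = 0;
        for (const auto & [key, weak] : entries)
            if (!weak.expired())
                n++;
        return n;
    }
};
```

asking twice for the same key yields the same goal, and once every strong pointer is dropped the entry reports as expired.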
Change-Id: I1d4a6850ff5e264443c90eb4531da89f5e97a3a0
have DerivationGoal and its subclasses produce a wrapper promise for
their intermediate results instead, and return this wrapper promise.
Worker already handles promises that do not complete immediately, so
we do not have to duplicate this into an entire result type variant.
Change-Id: Iae8dbf63cfc742afda4d415922a29ac5a3f39348
the new event loop could very occasionally notice that a dependency of
some goal has failed, process the failure, cause the depending goal to
fail accordingly, and in the doing of the latter two steps let further
dependencies that previously have not been reported as failed do their
reporting anyway. in such cases a goal could fail with "1 dependencies
failed", but more than one dependency failure message was shown. we'll
now report the correct number of failed dependency goals in all cases.
Change-Id: I5aa95dcb2db4de4fd5fee8acbf5db833531d81a8
these can be unique rather than shared because shared_ptr has a
converting constructor. preparatory refactor for something else
and not necessary on its own, and the extra allocations we must
do for shared_ptr control blocks aren't usually relevant anyway.
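
a small sketch of the conversion this relies on, with a hypothetical `Goal` and `makeGoal`: a factory can return `std::unique_ptr` and callers that want shared ownership still compile unchanged.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical goal type, only for illustration.
struct Goal {
    int id;
};

// The factory hands out unique ownership.
std::unique_ptr<Goal> makeGoal(int id) {
    assert(id >= 0);
    return std::make_unique<Goal>(Goal{id});
}

// Callers that need shared ownership can take the result as-is:
// shared_ptr has a converting constructor from unique_ptr&&.
std::vector<std::shared_ptr<Goal>> collect(int n) {
    std::vector<std::shared_ptr<Goal>> goals;
    for (int i = 0; i < n; i++)
        goals.push_back(makeGoal(i));
    return goals;
}
```

the control block is only allocated at the point of conversion, which is the extra allocation mentioned above.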
Change-Id: I5391715545240c6ec8e83a031206edafdfc6462f
also gets rid of explicit strong references to dependencies of any goal,
and weak references to dependers as well. those are now only held within
promises representing goal completion and thus independent of the goals'
relation to each other. the weak references to dependers were only needed
for notifications, and that's much better handled entirely by kj itself.
Change-Id: I00d06df9090f8d6336ee4bb0c1313a7052fb016b
now that we have an event loop in the worker we can use it and its
magical execution suspending properties to replace the slot counts
we managed explicitly with semaphores and raii tokens. technically
this would not have needed an event loop base to be doable, but it
is a whole lot easier to wait for a token to be available if there
is a callback mechanism ready for use that doesn't require a whole
damn dedicated abstract method in Goal to work, and specific calls
to that dedicated method strewn all over the worker implementation
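
to illustrate the callback idea without kj, here is a plain C++ sketch. `SlotPool` and its methods are hypothetical names, and the real change relies on the event loop's promises rather than a hand-written queue like this.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical slot limiter: a job runs at once if a slot is free,
// otherwise its callback is queued until someone releases a slot.
class SlotPool {
    unsigned capacity;
    unsigned freeSlots;
    std::deque<std::function<void()>> waiters;

public:
    explicit SlotPool(unsigned slots) : capacity(slots), freeSlots(slots) {}

    // Run `job` now if possible, or later when a slot becomes available.
    void acquire(std::function<void()> job) {
        if (freeSlots > 0) {
            freeSlots--;
            job();
        } else {
            waiters.push_back(std::move(job));
        }
    }

    // Return a slot; it is handed straight to the oldest waiter, if any.
    void release() {
        if (!waiters.empty()) {
            auto next = std::move(waiters.front());
            waiters.pop_front();
            next();
        } else {
            assert(freeSlots < capacity);
            freeSlots++;
        }
    }

    unsigned available() const { return freeSlots; }
};
```

the waiting side needs no dedicated method on the job itself, only a callback, which is the point made above.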
Change-Id: I1da7cf386d94e2bbf2dba9b53ff51dbce6a0cff7
with waitForAWhile turned into promises, the core functionality of
waitForInput is now merely to let gc run every so often if needed
Change-Id: I68da342bbc1d67653901cf4502dabfa5bc947628
this simplifies waitForInput quite a lot, and at the same time makes
polling less thundering-herd-y. it even fixes early polling wakeups!
Change-Id: I6dfa62ce91729b8880342117d71af5ae33366414
this removes the rather janky did-you-mean-async poll loop we had so
far. sadly kj does not play well with pty file descriptors, so we do
have to add our own async input stream that does not eat pty EIO and
turns it into an exception. that's still a *lot* better than the old
code, and using a real event loop makes everything else easier later.
Change-Id: Idd7e0428c59758602cc530bcad224cd2fed4c15e
Without this, verifying TLS certificates would fail on macOS, as well
as any system that doesn't have a certificate file at /etc/ssl/certs/ca-certificates.crt,
which includes e.g. Fedora.
Change-Id: Iaa2e0e9db3747645b5482c82e3e0e4e8f229f5f9
This is better for privacy and to avoid leaking netrc credentials in a
MITM attack, but also the assumption that we check the hash no longer
holds in some cases (in particular for impure derivations).
Partially reverts 5db358d4d7.
(cherry picked from commit c04bc17a5a0fdcb725a11ef6541f94730112e7b6)
(cherry picked from commit f2f47fa725fc87bfb536de171a2ea81f2789c9fb)
(cherry picked from commit 7b39cd631e0d3c3d238015c6f450c59bbc9cbc5b)
Upstream-PR: https://github.com/NixOS/nix/pull/11585
Change-Id: Ia973420f6098113da05a594d48394ce1fe41fbb9
These stack traces kind of suck for the reasons mentioned on the
CppTrace page here (no symbols for inline functions is a major one):
https://github.com/jeremy-rifkin/cpptrace
I would consider using CppTrace if it were packaged, but to be honest, I
think that the more reasonable option is actually to move entirely to
out-of-process crash handling and symbolization.
The reason for this is that if you want to generate anything of
substance on SIGSEGV or really any deadly signal, you are stuck in
async-signal-safe land, which is not a place to be trying to run a
symbolizer. LLVM does it anyway, probably carefully, and chromium *can*
do it on debug builds but in general uses crashpad:
https://source.chromium.org/chromium/chromium/src/+/main:base/debug/stack_trace_posix.cc;l=974;drc=82dff63dbf9db05e9274e11d9128af7b9f51ceaa;bpv=1;bpt=1
However, some stack traces are better than *no* stack traces when we get
mystery exceptions falling out the bottom of the program. I've also
promoted the path for "mystery exceptions falling out the bottom of the
program" to hard crash and generate a core dump because although it's
been some months since the last one of these, these are nonetheless
always *atrociously* diagnosed.
We can't improve the crash handling further until either we use Crashpad
(which involves more C++ deps, no thanks) or we put in the ostensibly
work in progress Rust minidump infrastructure, in which case we need to
finish full support for Rust in libutil first.
Sample report:
Lix crashed. This is a bug. We would appreciate if you report it at https://git.lix.systems/lix-project/lix/issues with the following information included:
Exception: std::runtime_error: lol
Stack trace:
0# nix::printStackTrace() in /home/jade/lix/lix3/build/src/nix/../libutil/liblixutil.so
1# 0x000073C9862331F2 in /home/jade/lix/lix3/build/src/nix/../libmain/liblixmain.so
2# 0x000073C985F2E21A in /nix/store/p44qan69linp3ii0xrviypsw2j4qdcp2-gcc-13.2.0-lib/lib/libstdc++.so.6
3# 0x000073C985F2E285 in /nix/store/p44qan69linp3ii0xrviypsw2j4qdcp2-gcc-13.2.0-lib/lib/libstdc++.so.6
4# nix::handleExceptions(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()>) in /home/jade/lix/lix3/build/src/nix/../libmain/liblixmain.so
5# 0x00005CF65B6B048B in /home/jade/lix/lix3/build/src/nix/nix
6# 0x000073C985C8810E in /nix/store/dbcw19dshdwnxdv5q2g6wldj6syyvq7l-glibc-2.39-52/lib/libc.so.6
7# __libc_start_main in /nix/store/dbcw19dshdwnxdv5q2g6wldj6syyvq7l-glibc-2.39-52/lib/libc.so.6
8# 0x00005CF65B610335 in /home/jade/lix/lix3/build/src/nix/nix
Change-Id: I1a9f6d349b617fd7145a37159b78ecb9382cb4e9
This caused an infinite loop before since it would just keep asking the
underlying source for more data.
In practice this happened because an HTTP server served a
response to a HEAD request (for which curl will not retrieve any body or
call our write callback function) with Content-Encoding: br, leading to
decompressing nothing at all and going into an infinite loop.
This adds a test to make sure none of our compression methods do that
again, as well as just patching the HTTP client to never feed empty data
into a compression algorithm (since they absolutely have the right to
throw CompressionError on unexpectedly-short streams!).
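
A minimal sketch of the failure mode and the guard, using a toy format where `;` marks the end of the stream. `decodeUntilMarker`, the `readMore` callback and the local `CompressionError` are assumptions for this example, not the real decompression code.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>

// Local stand-in for the error type named above.
struct CompressionError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Toy decoder: reads chunks until one contains the ';' end marker.
// `readMore` returns the next chunk, or an empty string once the
// underlying source has nothing left.
std::string decodeUntilMarker(const std::function<std::string()> & readMore) {
    assert(readMore != nullptr);
    std::string out;
    for (;;) {
        std::string chunk = readMore();
        // An empty read means the source is exhausted. Asking again
        // would never make progress, so fail instead of spinning.
        if (chunk.empty())
            throw CompressionError("unexpected end of compressed stream");
        auto end = chunk.find(';');
        if (end != std::string::npos) {
            out.append(chunk, 0, end);
            return out;
        }
        out += chunk;
    }
}
```

With an empty body, as in the HEAD response case, this throws after a single read rather than looping.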
Reported on Matrix: https://matrix.to/#/!lymvtcwDJ7ZA9Npq:lix.systems/$8BWQR_zKxCQDJ40C5NnDo4bQPId3pZ_aoDj2ANP7Itc?via=lix.systems&via=matrix.org&via=tchncs.de
Change-Id: I027566e280f0f569fdb8df40e5ecbf46c211dad1
* Move the extended attribute deletion after the hardlink sanity check. We
shouldn't be removing extended attributes on random files.
* Make the entity owner-writable before attempting to remove extended
attributes, since this operation usually requires write access on the file,
and we shouldn't fail xattr deletion on a file that has been made unwritable
by the builder or a previous canonicalisation pass.
Fixes: #507
Change-Id: I7e6ccb71649185764cd5210f4a4794ee174afea6
Remove the mutable state stuff that assumes that one file is being
written at a time. It's true that we don't write multiple files
interleaved, but that mutable state is evil.
Change-Id: Ia1481da48255d901e4b09a9b783e7af44fae8cff
- Rename the listener to not be called a "sink". If it were a "sink" it
would be eating bytes and conform with any of the Nix sink stuff
(maybe FileHandle should be a Sink itself! but that's a later CL's
problem). This is a parser listener.
- Move the RetrieveRegularNARSink thing into store-api.cc, which is its
only usage, and fix it to actually do what it is stated to do: crash
if its invariants are violated.
It's, of course, used to erm, unpack single-file NAR files, generated
via a horrible contraption of sources and sinks that looks like a
plumbing blueprint. Refactoring that is a future task.
- Add a description of the invariants of NARParseVisitor in preparation
of refactoring it.
Change-Id: Ifca1d74d2947204a1f66349772e54dad0743e944
using a proper event loop basis we no longer have to worry about most of
the intricacies of poll(), or platform-dependent replacements for it. we
may even be able to use the event loop and its promise system for all of
our scheduling in the future. we don't do any real async processing yet,
this is just preparation to separate the first such change from the huge
api design difference with the async framework we chose (kj from capnp):
kj::Promise, unlike std::future, doesn't return exceptions unmangled. it
instead wraps any non-kj exception into a kj exception, erasing all type
information and preserving mostly the what() string in the process. this
makes sense in the capnp rpc use case where unrestricted exception types
can't be transferred, and since it moves error handling styles closer to
a world we'd actually like there's no harm in doing it only here for now
Change-Id: I20f888de74d525fb2db36ca30ebba4bcfe9cc838
The JSON serialisation should be declared in the header so that all translation
units can see it when needed, even though it seems that it has not been used
anywhere else so far. Unfortunately, this means we cannot use the
NLOHMANN_JSON_SERIALIZE_ENUM convenience macro, since it uses a slightly
different signature, but the code is not too bad either.
Change-Id: I6e2851b250e0b53114d2fecb8011ff1ea9379d0f
it just makes sense to have it too, rather than just the pass/fail
information we keep so far. once we turn goals into something more
promise-shaped it'll also help detangle the current data flow mess
Change-Id: I915cf04d177cad849ea7a5833215d795326f1946
it doesn't have a purpose except cache priming, which is largely
irrelevant by default (since another code path already runs this
exact query). our store implementations do not benefit that much
from this either, and the more bursty load may indeed harm them.
Change-Id: I1cc12f8c21cede42524317736d5987f1e43fc9c9
updating statistics *immediately* when any counter changes declutters
things somewhat and makes useful status reports less dependent on the
current worker main loop. using callbacks will make it easier to move
the worker loop into kj entirely, using only promises for scheduling.
Change-Id: I695dfa83111b1ec09b1a54cff268f3c1d7743ed6
there's no reason to go through the event loop in these cases. returning
ContinueImmediately here is just a very convoluted way of jumping to the
state we've just set after unwinding one frame of the stack, which never
matters in the cases changed here because there are no live RAII guards.
Change-Id: I7c00948c22e3caf35e934c1a14ffd2d40efc5547
this is not ideal, but it's better than having this stuck in the worker
loop itself. setting ex on all failing goals is not problematic because
only toplevel goals can ever be observable, all the others are ignored.
notably only derivation goals ever set `ex`, substitution goals do not.
Change-Id: I02e2164487b2955df053fef3c8e774d557aa638a
this doesn't serve a great purpose yet except to confine construction of
goals to the stack frame of Worker::run() and its child frames. we don't
need this yet (and the goal constructors remain fully visible), but in a
future change that fully removes the current worker loop we'll need some
way of knowing which goals are top-level goals without passing the goals
themselves around. once that's possible we can remove visible goals as a
concept and rely on build result futures and a scheduler built upon them
Change-Id: Ia73cdeffcfb9ba1ce9d69b702dc0bc637a4c4ce6
whether goal errors are reported via the `ex` member or just printed to
the log depends on whether the goal is a toplevel goal or a dependency.
if goals are aware of this themselves we can move error printing out of
the worker loop, and since a running worker can only be used by running
goals it's totally sufficient to keep a `Worker::running` flag for this
Change-Id: I6b5cbe6eccee1afa5fde80653c4b968554ddd16f
Fixes:
- Identifiers starting with _ are prohibited
- Some driveby header dependency cleaning which wound up with doing some
extra fixups.
- Fucking C style casts, man. C++ made these 1000% worse by letting you
also do memory corruption with them with references.
- Remove casts to Expr * where ExprBlackHole is an incomplete type by
introducing an explicitly-cast eBlackHoleAddr as Expr *.
- An incredibly illegal cast of the text bytes of the StorePath hash
into a size_t directly. You can't DO THAT.
Replaced with actually parsing the hash so we get 100% of the bits
being entropy, then memcpying the start of the hash. If this shows
up in a profile we should just make the hash parser faster with a
lookup table or something sensible like that.
- This horrendous bit of UB which I thankfully slapped a deprecation
warning on, built, and it didn't trigger anywhere so it was dead
code and I just deleted it. But holy crap you *cannot* do that.
      inline void mkString(const Symbol & s)
      {
          mkString(((const std::string &) s).c_str());
      }
- Some wrong lints. Lots of wrong macro lints, one wrong
suspicious-sizeof lint triggered by the template being instantiated
with only pointers, but the calculation being correct for both
pointers and not-pointers.
- Exceptions in destructors strike again. I tried to catch the
exceptions that might actually happen rather than all the exceptions
imaginable. We can let the runtime hard-kill it on other exceptions
imo.
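
The StorePath hash fix from the list above could look roughly like this. It is a sketch only: `hashPrefix` and the `std::vector` holding the already-parsed hash bytes are assumptions, not the actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// `hash` stands in for the decoded (binary) StorePath hash; the name
// and type are assumptions for this sketch.
std::size_t hashPrefix(const std::vector<unsigned char> & hash) {
    std::size_t out;
    assert(hash.size() >= sizeof out);
    // memcpy is the well-defined way to read bytes as an integer.
    // Casting the byte pointer to size_t * instead would violate
    // alignment and aliasing rules.
    std::memcpy(&out, hash.data(), sizeof out);
    return out;
}
```

Compilers generally turn a fixed-size `memcpy` like this into a single load, so the well-defined version should not cost anything over the cast.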
Change-Id: I71761620846cba64d66ee7ca231b20c061e69710
this makes WorkResult copyable, and just all around easier to deal with.
in the future we'll need this to let Goal::work() return a promise for a
WorkResult (or even just a Finished) that can be awaited by other goals.
Change-Id: Ic5a1ce04c5a0f8e683bd00a2ed2b77a2e28989c1
this should be done where we're actually trying to build something, not
in the main worker loop that shouldn't have to be aware of such details
Change-Id: I07276740c0e2e5591a8ce4828a4bfc705396527e
This caused an absolute saga which I would not like anyone else to have
to experience. Let's put in a laser targeted error message that
diagnoses this exact problem.
Fixes: #484
Change-Id: I2a79f04aeb4a1b67c10115e5e39501d958836298
I don't know why the AWS sdk disabled it by default. It would be nice
to have test coverage of the s3 store or proxies, but neither currently
exists.
Fixes: #433
Change-Id: If1e76169a3d66dbec2e926af0d0d0eccf983b97b
There have been multiple setting types for paths that are supposed to be
canonicalised, depending on whether zero or one, one, or any number of paths is
to be specified. Naturally, they behaved in slightly different ways in the
code. Simplify things by unifying them and removing special behaviour (mainly
the "multiple paths type can coerce to boolean" thing).
Change-Id: I7c1ce95e9c8e1829a866fb37d679e167811e9705
this can be a proper WorkResult now. childTerminated is unfortunately a
lot more stubborn and won't be made private for quite a while yet. once
we can get rid of the Worker poll loop that *should* be possible though
Change-Id: I2218df202da5cb84e852f6a37e4c20367495b617
we'll need this once we want to pass extra information out of accepting
replies, such as fd sets or possibly even async output reader promises.
Change-Id: I5e2f18cdb80b0d2faf3067703cc18bd263329b3f
don't keep fds open we're not using. currently this does not cause any
problems, but it does increase the size of our fd table needlessly and
in the future, when we have proper async processing, having builderOut
open in the daemon once the hook has been fully started is problematic
Change-Id: I6e7fb773b280b042873103638d3e04272ca1e4fc
this is useless to do on the face of it, but it'll make it easier to
convert the entire output handling to use async io and promises soon
Change-Id: I2d1eb62c4bbf8f57bd558b9599c08710a389b1a8
only DerivationGoal can set the hook to anything at all. it always sets
buildOutFD to something that is not related to fromHook in any way, and
mixing the two would have rather dire consequences for log consistency.
Change-Id: Ida86727fd1cd5e1ecd78f07f3bde330a346658a8
all derivation goals need a log fd of some description. let's save this
single fd in a dedicated pointer field for all subclasses so that later
we have just the one spot to change if we turn this into async promises
Change-Id: If223adf90909247363fb823d751cae34d25d0c0b
we don't need to expose information about how busy a Worker is if the
worker can instead tell its work items whether they are in a slot. in
the future we might use this to not start items waiting for a slot if
no slots are currently available, but that requires more preparation.
Change-Id: Ibe01ac536da7e6d6f80520164117c43e772f9bd9
this is only used to close non-stdio files in derivation sandboxes. we
may as well encode that in its name, drop the unnecessary integer set,
and use close_range to deal with the actual closing of files. not only
is this clearer, it also makes sandbox setup on linux faster by 1ms each
Change-Id: Id90e259a49c7bc896189e76bfbbf6ef2c0bcd3b2
implementing a build hook is pretty much impossible without either being
a nix, or blindly forwarding the important bits of all build requests to
some kind of nix. we've found no uses of build-hook in the wild, and the
build-hook protocol (apart from being entirely undocumented) is not able
to convey any kind of versioning information between hook and daemon. if
we want to upgrade this infrastructure (which we do), this must not stay
Change-Id: I1ec4976a35adf8105b8ca9240b7984f8b91e147e
* changes:
  sqlite: add a Use::fromStrNullable
  util: implement charptr_cast
  tree-wide: fix a pile of lints
  refactor: make HashType and Base enum classes for type safety
  build: integrate clang-tidy into CI
There were several usages of the raw sqlite primitives along with C
style casts, seemingly because nobody thought to use an optional for
getting a string or NULL.
Let's fix this API given we already *have* a wrapper.
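
A sketch of the shape such an API can take, without touching sqlite itself. Only the name `Use::fromStrNullable` comes from this change; the `Param` variant and the method body are assumptions made so the example is self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <variant>
#include <vector>

// Hypothetical bound parameter: monostate models SQL NULL.
// This is not the sqlite3 API, just a stand-in for it.
using Param = std::variant<std::monostate, std::string>;

class Use {
    std::vector<Param> params;

public:
    // Bind either a string or NULL, depending on whether a value is present.
    Use & fromStrNullable(std::optional<std::string_view> value) {
        if (value)
            params.emplace_back(std::string(*value));
        else
            params.emplace_back(std::monostate{});
        return *this;
    }

    // True if parameter `i` was bound as NULL.
    bool isNull(std::size_t i) const {
        assert(i < params.size());
        return std::holds_alternative<std::monostate>(params[i]);
    }
};
```

Callers pass `std::nullopt` for NULL instead of reaching for the raw primitives and a cast.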
Change-Id: I526cceedc2e356209d8fb62e11b3572282c314e8
This:
- Converts a bunch of C style casts into C++ casts.
- Removes some very silly pointer subtraction code (which is no more or
less busted on i686 than it was before)
- Fixes some "technically UB" that never had to be UB in the first
place.
- Makes finally follow the noexcept status of the inner function. Maybe
in the future we should ban the function from not being noexcept, but
that is not today.
- Makes various locally-used exceptions inherit from std::exception.
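
For the `finally` item above, a minimal sketch of a scope guard whose destructor inherits the `noexcept` status of the wrapped callable. This is an illustration written for this note and may differ from the real `Finally` in detail.

```cpp
#include <type_traits>
#include <utility>

template<typename Fn>
class Finally {
    Fn fun;

public:
    explicit Finally(Fn fun) : fun(std::move(fun)) {}
    Finally(const Finally &) = delete;
    Finally & operator=(const Finally &) = delete;

    // The destructor may throw exactly when calling `fun` may throw.
    ~Finally() noexcept(noexcept(fun())) { fun(); }
};
```

A guard around a `noexcept` lambda is then nothrow-destructible, while a guard around a potentially throwing one is not, which `std::is_nothrow_destructible_v` can confirm at compile time.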
Change-Id: I22e66972602604989b5e494fd940b93e0e6e9297
This has been causing various seemingly spurious CI failures as well as
some failures on people running tests on beta builds.
lix> ++(nix-collect-garbage-dry-run.sh:20) nix-store --gc --print-dead
lix> ++(nix-collect-garbage-dry-run.sh:20) wc -l
lix> finding garbage collector roots...
lix> error: Listing pid 87261 file descriptors: Undefined error: 0
There is no real way to write a proper test for this, other than to
start a process like the following:
    #include <unistd.h>

    int main(void) {
        /* close every low-numbered fd, stdio included, then linger */
        for (int i = 0; i < 1000; ++i) {
            close(i);
        }
        sleep(10000);
    }
and then let Lix's gc look at it.
I have a relatively high confidence this *will* fix the problem since I
have manually confirmed the behaviour of the libproc call is
as-unexpected, and it would perfectly explain the observed symptom.
Fixes: #446
Change-Id: I67669b98377af17895644b3bafdf42fc33abd076
* changes:
  tree-wide: fix various lint warnings
  flake & doxygen: update tagline
  nix flake metadata: print modified dates for input flakes
  cli: eat terminal codes from stdout also
  Implement forcing CLI colour on, and document it better
  manual: fix a syntax error in redirects.js that made it not do anything
  misc docs/meson tidying
  build: implement clang-tidy using our plugin
The growth of the seccomp filter in 127ee1a101
made its compilation time significant (roughly 10 milliseconds have been
measured on one machine). For this reason, it is now precompiled and cached in
the parent process so that this overhead is not hit for every single build. It
is still not optimal when going through the daemon, because compilation still
happens once per client, but it's better than before and doing it only once for
the entire daemon requires excessive crimes with the current architecture.
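
The caching idea can be sketched with a function-local static. `compileFilter`, `cachedFilter` and the placeholder bytes are hypothetical and do not reflect how the real filter is built or stored.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Counts how often the expensive step ran; only here to make the
// caching observable.
int compileCount = 0;

// Hypothetical stand-in for compiling the seccomp filter.
std::vector<std::uint8_t> compileFilter() {
    compileCount++;
    return {0x06, 0x00, 0x00, 0x00}; // placeholder bytes, not a real filter
}

// Compiles on first use and then hands out the same bytes. Calling
// this in the parent before forking lets each child inherit the result.
const std::vector<std::uint8_t> & cachedFilter() {
    static const std::vector<std::uint8_t> filter = compileFilter();
    assert(!filter.empty());
    return filter;
}
```

Each process that calls `cachedFilter()` pays the compilation cost once, which matches the once-per-client behaviour described above when going through the daemon.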
Fixes: #461
Change-Id: I2277eaaf6bab9bd74bbbfd9861e52392a54b61a3
This is a preparation for precompiling the filter, which is done separately.
The behaviour should be unchanged for now.
Change-Id: I899aa7242962615949208597aca88913feba1cb8
The seccomp setup code was a huge chunk of conditionally compiled
platform-specific code. For this reason, it is appropriate to move it to the
platform-specific implementation file. Ideally its setup could be moved a bit
to make it happen at the same place as the Darwin restrictions, but that change
is going to be less mechanical.
Change-Id: I496aa3c4fabf34656aba1e32b0089044ab5b99f8
this begins a long and arduous journey to remove all result state from
Goal, to eventually drop the std::enable_shared_from_this base, and to
completely eliminate all unsynchronized modification of states of both
Goal and Worker. by the end of this we will hopefully be able to start
and reap multiple derivation builds in parallel, which should speed up
the process quite a bit (at least for short local builds, others might
not notice a large difference. the build hooks will remain a problem.)
Change-Id: I57dcd9b2cab4636ed4aa24cdec67124fef883345
In the SSH code, the logger was conditionally paused, but unconditionally
resumed. This was fine as long as resuming the logger was idempotent. Starting
with 0dd1d8ca1c, it isn't any more, and the
behaviour of the code in question was missed. Consequently, an assertion
failure is triggered for example when performing builds against an "SSH" store
on localhost. Fix the issue by only resuming the logger when it has actually
been paused.
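
One way to keep pause and resume paired is a small guard that remembers whether it paused anything. `Logger` and `PauseGuard` here are hypothetical stand-ins, not the classes touched by this change.

```cpp
#include <cassert>

// Hypothetical logger whose resume() is not idempotent: resuming a
// logger that is not paused trips an assertion.
struct Logger {
    bool paused = false;
    void pause() { assert(!paused); paused = true; }
    void resume() { assert(paused); paused = false; }
};

// Pauses the logger only when asked to, and resumes only what it paused.
class PauseGuard {
    Logger & logger;
    bool didPause;

public:
    PauseGuard(Logger & logger, bool shouldPause)
        : logger(logger), didPause(shouldPause)
    {
        if (didPause)
            logger.pause();
    }

    ~PauseGuard() {
        if (didPause)
            logger.resume();
    }

    PauseGuard(const PauseGuard &) = delete;
    PauseGuard & operator=(const PauseGuard &) = delete;
};
```

With `shouldPause` false the guard does nothing on either end, so the unconditional resume described above cannot happen.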
Fixes: #458
Change-Id: Ib1e4d047744a129f15730b7216f9c9368c2f4211
we still mutate goal state to store the results of any given goal run,
but now we also have that information in Worker and could in theory do
something else with it. we could return a map of goal to goal results,
which would also let us better diagnose failures of subgoals (at all).
Change-Id: I1df956bbd9fa8cc9485fb6df32918d68dda3ff48
this is the first step towards removing all result-related mutation of
Goal state from goal implementations themselves, and into Worker state
instead. once that is done we can treat all non-const Goal fields like
private state of the goal itself, and make threading of goals possible
Change-Id: I69ff7d02a6fd91a65887c6640bfc4f5fb785b45c
once goals run on multiple threads these fields must be synchronized as
one, or we try to run build hooks too often (or worse, not often enough)
Change-Id: I47860e46fe5c6db41755b2a3a1d9dbb5701c4ca4
there are no other uses for this yet, but asking for just a subset of
outputs does seem at least somewhat useful to have as a generic thing
Change-Id: I30ff5055a666c351b1b086b8d05b9d7c9fb1c77a
limiting CA substitutions was a rather recent addition, and it used a
dedicated counter to not interfere with regular substitutions. though
this works fine it somewhat contradicts the documentation; job limits
should apply to all kinds of substitutions, or be one limit for each.
Change-Id: I1505105b14260ecc1784039b2cc4b7afcf9115c8
all goals do this. it makes no sense to not notify a goal of EOF
conditions because this is the universal signal for "child done"
Change-Id: Ic3980de312547e616739c57c6248a8e81308b5ee
just update progress every time a goal has returned from work(). there
seem to be no performance penalties, and the code is much simpler now.
Change-Id: I288ee568b764ee61f40a498d986afda49987cb50
bindPath/doBind is a useful function in build that is used in several
parts of LocalDerivationGoal. Moving this function makes it easier to
split LocalDerivationGoal implementation between several files.
Change-Id: Ic5a0768479c153c1aa3ed425f12604b20bbf0f42