Commit graph

2969 commits

Author SHA1 Message Date
Maximilian Bosch 9b465e7a67 Make "timed out" and "log limit exceeded" builds aborted
In 73694087a0 I gave builds that failed
because of a timeout or exceeded log limit a stop sign and I stand by
that reasoning: with that it's possible to distinguish between actual
build failures and rather transient things such as timeouts.

Back then I considered it a feature that these are shown in a different
tab, but I don't think that's a good idea anymore. When using a jobset to
e.g. track the regressions from a mass rebuild (like a compiler or gcc
update), "Newly failed builds" should exclusively display regressions (and
flaky builds of course, not much I can do about that).

Also, when a bunch of builds fail in such a jobset because of e.g. a
broken connection to a builder that results in a timeout, I want to be
able to restart them all w/o rebuilding actual regressions.

To make it clear that we not only have "Aborted" builds in the tab, I
renamed the label to "Aborted / Timed out".
2024-03-16 22:10:40 +01:00
Maximilian Bosch 9b62c52e5c hydra-queue-runner: drop broken connections from pool
Closes #1336

When restarting postgresql, the connections are still reused in
`hydra-queue-runner` causing errors like this

    main thread: Lost connection to the database server.
    queue monitor: Lost connection to the database server.

and no more builds being processed.

`hydra-evaluator` doesn't have that issue since it crashes right away.
We could let it retry indefinitely as well (see below), but I don't
want to change too much.

If the DB is still unreachable 10s later, the process will stop with a
non-zero exit code because of a missing DB connection. This however
isn't such a big deal because it will be immediately restarted
afterwards. With the current configuration, Hydra will never give up,
but restart (and retry) infinitely. To me that seems reasonable, i.e. to
retry DB connections on a long-running process. If this doesn't work
out, the monitoring should fire anyways because the queue fills up, but
I'm open to discuss that.

Please note that this isn't reproducible with the DB and the queue
runner on the same machine when using `services.hydra-dev`, because of
the `Requires=` dependency `hydra-queue-runner.service` ->
`hydra-init.service` -> `postgresql.service` that causes the queue
runner to be restarted on `systemctl restart postgresql`.

Internally, Hydra uses Nix's pool data structure: it basically has N
slots (here DB connections) and whenever a new one is requested, an idle
slot is provided or a new one is created (when N slots are active, it'll
be waited until one slot is free). The issue in the code here is however
that whenever an error is encountered, the slot is released, however the
same broken connection will be reused the next time. By using
`Pool::Handle::markBad`, Nix will drop a broken slot. This is now being
done when `pqxx::broken_connection` was caught.
2024-03-16 22:10:40 +01:00
Maximilian Bosch ef6be80f54 Use submit event in login form
It's a pet peeve from me when logging into my personal Hydra that I
always have to press the button rather than hitting Return after entering
my password.

Reason for that is that the form doesn't have a "submit" button, so far
it was always listened to the "click" event. Submit does that and you
can hit Return alternatively.
2024-03-16 22:10:40 +01:00
Ilya K 969eb3eeac urlencode drv names when fetching logs
Otherwise names with special characters like + break things.
2024-03-16 22:10:40 +01:00
Pierre Bourdon 18466e8326 queue-runner: try larger pipe buffer sizes 2024-03-16 22:10:40 +01:00
ajs124 6ed21490ee lazy-load evaluation errors
Closes #1362
2024-03-16 22:10:40 +01:00
John Ericson b503280256 Add migration to drop non-null constraints 2024-01-26 11:53:58 -05:00
John Ericson 323b556dc8 Minimal CA support
This verison has a worse UI, but also chnages the schema less: One
non-null constraint is removed, but no new columns are added.

Co-Authored-By: Andrea Ciceri <andrea.ciceri@autistici.org>
Co-Authored-By: regnat <rg@regnat.ovh>
2024-01-26 00:34:58 -05:00
John Ericson fcde5908d8 More CA derivations prep
Again, with care not to change the schema in any way.
2024-01-25 21:32:22 -05:00
John Ericson 083ef46c12
Merge pull request #1344 from delroth/google-popup
web: disable Sign in with Google popup
2024-01-25 16:36:16 -05:00
John Ericson c64eed7d07 Simplify StoreConfig::getDefaultSystemFeatures call
That method is now static.
2024-01-25 15:58:07 -05:00
Pierre Bourdon 6df06b089e
web: disable Sign in with Google popup 2024-01-25 09:27:46 +01:00
John Ericson b1fa6b3aac Use StoreConfig::getDefaultSystemFeatures for default machine config
We have to oddly make a `StoreConfig` subclass to get it, but
https://github.com/NixOS/nix/pull/9848 will fix that.

The purpose of this is to ensure that, absent an explicit config,
`localhost` includes `ca-derivations` and `recursive-nix` if those
experimental features are enabled.

Very much the complement of #1342, the previous PR.
2024-01-24 21:37:13 -05:00
John Ericson 07cb5d1b7c Use nix::ParsedDerivation::getRequiredSystemFeatures()
A slight dedup, and also ensures that floating CA derivations require a
`ca-derivations` experimental feature. This fixes the scheduling issue
that @SuperSandro2000 found.
2024-01-24 21:04:14 -05:00
John Ericson d45e14fd43
Merge pull request #1316 from NixOS/ca-derivations-prep
Prepare for CA derivation support with lower impact changes
2024-01-24 18:12:42 -05:00
John Ericson 70e5469303 Use Nix's Machine type in a mimimal way
This is *just* using the fields from that type, and only where the types
coincide. (There are two fields with different types, `speedFactor` most
interestingly.) No code is reused, so we can be sure that no behavior is
changed.

Once the types are reconciled on the Nix side, then we can start
carefully actually reusing code.

Progress on #1164
2024-01-23 12:18:57 -05:00
John Ericson 2e6ee28f9b Machine -> ::Machine so we don't conflict with Nix's 2024-01-23 11:03:19 -05:00
John Ericson 4e8fbaa3d6 Replace Child with SSHMaster::Connection
Nix defines basically an identical struct for the same purpose, so let's
just use that.
2024-01-23 01:11:46 -05:00
John Ericson 588a0c5269 Merge remote-tracking branch 'upstream/master' into ca-derivations-prep 2023-12-23 19:19:54 -05:00
John Ericson 75f26f1fc4 Clean up std::optional dereferencing in the queue runner
Instead of doing this partial operation a number of times, assert (with
a comment, get a reference to the thing inside, and use that just once.
(This refactor was done twice, "just once" for each time.)
2023-12-23 19:10:58 -05:00
John Ericson 6e67884ff1 One more queryDerivationOutputMap should use the eval store param 2023-12-11 14:05:18 -05:00
John Ericson a6b6c5a539 Revert query -- those columns don't exist yet! 2023-12-11 12:58:54 -05:00
John Ericson ebfefb9161 Sync up with some changes done to the main CA branch 2023-12-11 12:46:36 -05:00
John Ericson 8783dd53f6 Merge remote-tracking branch 'upstream/master' into ca-derivations-prep 2023-12-11 12:42:43 -05:00
John Ericson 831021808c
Merge pull request #1318 from obsidiansystems/use-build-result-serialiser
Use factored-out `BuildResult` serializer
2023-12-08 11:25:05 -05:00
John Ericson 2ee0068fdc Do not copy for both stores for now
It has a performance cost, and as the comment says we should be doing
the better solution. We want to land this preparatory change on prod
while the rest is still on staging, so we should just skip it for now.

Skipping it will not affect regular fixed-output and input-addressed
derivations, which are the only ones prod would deal with upon getting
this code.

The main CA derivations support branch will revert this commit so it
still works.
2023-12-07 15:05:03 -05:00
John Ericson 31ea6458ca Merge remote-tracking branch 'upstream/master' into ca-derivations-prep 2023-12-07 15:01:35 -05:00
John Ericson 6a54ab24e2 Use factored-out BuildResult serializer
For the record, here is the Nix 2.19 version:
https://github.com/NixOS/nix/blob/2.19-maintenance/src/libstore/serve-protocol.cc,
which is what we would initially use.

It is a more complete version of what Hydra has today except for one
thing: it always unconditionally sets the start/stop times.

I think that is correct at the other end seems to unconditionally
measure them, but just to be extra careful, I reproduced the old
behavior of falling back on Hydra's own measurements if `startTime` is
0.

The only difference is that the fallback `stopTime` is now measured from
after the entire `BuildResult` is transferred over the wire, but I think
that should be negligible if it is measurable at all. (And remember,
this is fallback case I already suspect is dead code.)
2023-12-07 02:00:22 -05:00
John Ericson 86cd5e9076 copyClosureTo: Use SubstituteFlag instead of bool
This matches Nix (in the same serialization logic in
`src/libstore/legacy-ssh-store.cc`) and adds clarity.
2023-12-07 00:18:50 -05:00
John Ericson 11f8030b0f Add comment from GitHub about adding to store as code comment 2023-12-06 17:59:25 -05:00
John Ericson 3df8feb3a2 Add TODO about setting null instead of empty string in JSON
An empty string is a sneaky way to avoid hard failures --- things that
expect strings still get strings, but it does conversely open the door
up to soft failures (spooky-action-at-a-distance ones because the string
did not have the expected invariants).

"Fail fast" with null will ultimately make the system more robust, but
force us to fix more things up front, and I don't want to change this
without also fixing those things up front, especially as this commit is
for now just part of the the preparatory PR for which this is dead code.
2023-12-05 11:31:06 -05:00
John Ericson 069b7775c5 hydra-eval-jobs: Ensure we have output path if ca-derivations is disabled
Brought up by @thufschmitt in
https://github.com/NixOS/hydra/pull/1316#discussion_r1415111329 . This
makes this closer to what was originally there --- which just dispatched
off the experimental feature rather than the presence/absense of the
output, too.
2023-12-05 11:26:26 -05:00
John Ericson e3443cd22a Put back nicer copyClosure instead of manual closure + copy
It looks like we accidentally got the old code back, probably after a merge
conflict resolution.
2023-12-04 17:41:11 -05:00
John Ericson 8046ec2668 Remove unused outputHashes variable
This looks like a stray copy paste.
2023-12-04 16:21:56 -05:00
John Ericson 9ba4417940 Prepare for CA derivation support with lower impact changes
This is just C++ changes without any Perl / Frontend / SQL Schema
changes.

The idea is that it should be possible to redeploy Hydra with these
chnages with (a) no schema migration and also (b) no regressions. We
should be able to much more safely deploy these to a staging server and
then production `hydra.nixos.org`.

Extracted from #875

Co-Authored-By: Théophane Hufschmitt <theophane.hufschmitt@tweag.io>
Co-Authored-By: Alexander Sosedkin <monk@unboiled.info>
Co-Authored-By: Andrea Ciceri <andrea.ciceri@autistici.org>
Co-Authored-By: Charlotte 🦝 Delenk Mlotte@chir.rs>
Co-Authored-By: Sandro Jäckel <sandro.jaeckel@gmail.com>
2023-12-04 16:14:47 -05:00
John Ericson a5d44b60ea
Merge pull request #1313 from obsidiansystems/split-buildRemote
Split the `buildRemote` function, take 2
2023-12-04 11:37:36 -05:00
John Ericson 363604846a Again, use const in for loop
As requested by @teh. Was lost in merge with master, now added back.
2023-12-04 11:31:05 -05:00
John Ericson 162b538912 Remove unused thisArrow variable 2023-12-04 11:27:39 -05:00
John Ericson 104baef503 Document the connection initialization process 2023-12-04 09:42:04 -05:00
Janne Heß 874fcae1e8
Merge pull request #1301 from delroth/queue-runner-perf
queue-runner: only re-sort runnables by prio once per dispatch cycle
2023-12-04 15:27:14 +01:00
John Ericson 67eeabd518 Merge remote-tracking branch 'upstream/master' into split-buildRemote 2023-12-04 09:12:58 -05:00
John Ericson 622c25e3c4 Sedding prior to merge 2023-12-04 08:56:06 -05:00
John Ericson c922e73c11 Update to Nix 2.19
Flake lock file updates:

• Updated input 'nix':
    'github:NixOS/nix/f5f4de6a550327b4b1a06123c2e450f1b92c73b6' (2023-10-02)
  → 'github:NixOS/nix/50f8f1c8bc019a4c0fd098b9ac674b94cfc6af0d' (2023-11-27)
2023-11-30 15:26:46 -05:00
John Ericson e172461e55 Use const in for loop
As requested by @teh
2023-11-30 12:19:20 -05:00
John Ericson 0917145622 Make new functions not in header static 2023-11-30 12:19:05 -05:00
John Ericson 2bda7ca642 Further use Machine::Connection to deduplicate 2023-11-30 11:31:58 -05:00
John Ericson 831a2d9bd5 Merge remote-tracking branch 'upstream/master' into split-buildRemote 2023-11-30 11:27:40 -05:00
chayleaf e9da80fff6
support nix 2.18 2023-11-21 18:41:52 +07:00
Janne Heß 8f48e4ddec
Merge pull request #1268 from knedlsepp/fix-mime
Fix MIME types when serving .js and .css to fix rendering of HTML reports
2023-11-17 22:16:27 +01:00
Stig Palmquist 6a5fb9efae Set output length of C::P::Argon2 hashes to 16
Since the default lengths in Crypt::Passphrase::Argon2 changed from 16
to 32 in in 0.009, some tests that expected the passphrase to be
unchanged started failing.
2023-10-20 00:09:28 +02:00