Commit graph

3021 commits

Author SHA1 Message Date
988554eb7a chore: apply lix include-rearrangement to hydra
This complies with the new include layout in Lix, which will eventually
replace the legacy one previously in use.
2024-11-25 23:08:32 -08:00
acd54bfbd6
Update all flake inputs, fix build with latest Lix
Flake lock file updates:

• Updated input 'lix':
    'git+https://git.lix.systems/lix-project/lix?ref=refs/heads/main&rev=ed9b7f4f84fd60ad8618645cc1bae2d686ff0db6' (2024-10-05)
  → 'git+https://git.lix.systems/lix-project/lix?ref=refs/heads/main&rev=66f6dbda32959dd5cf3a9aaba15af72d037ab7ff' (2024-11-20)
• Updated input 'lix/nix2container':
    'github:nlewo/nix2container/3853e5caf9ad24103b13aa6e0e8bcebb47649fe4' (2024-07-10)
  → 'github:nlewo/nix2container/fa6bb0a1159f55d071ba99331355955ae30b3401' (2024-08-30)
• Updated input 'lix/pre-commit-hooks':
    'github:cachix/git-hooks.nix/f451c19376071a90d8c58ab1a953c6e9840527fd' (2024-07-15)
  → 'github:cachix/git-hooks.nix/4e743a6920eab45e8ba0fbe49dc459f1423a4b74' (2024-09-19)
• Updated input 'nix-eval-jobs':
    'git+https://git.lix.systems/lix-project/nix-eval-jobs?ref=refs/heads/main&rev=42a160bce2fd9ffebc3809746bc80cc7208f9b08' (2024-08-13)
  → 'git+https://git.lix.systems/lix-project/nix-eval-jobs?ref=refs/heads/main&rev=912a9d63319e71ca131e16eea3348145a255db2e' (2024-11-18)
• Updated input 'nix-eval-jobs/flake-parts':
    'github:hercules-ci/flake-parts/8471fe90ad337a8074e957b69ca4d0089218391d' (2024-08-01)
  → 'github:hercules-ci/flake-parts/506278e768c2a08bec68eb62932193e341f55c90' (2024-11-01)
• Updated input 'nix-eval-jobs/nix-github-actions':
    'github:nix-community/nix-github-actions/622f829f5fe69310a866c8a6cd07e747c44ef820' (2024-07-04)
  → 'github:nix-community/nix-github-actions/e04df33f62cdcf93d73e9a04142464753a16db67' (2024-10-24)
• Updated input 'nix-eval-jobs/treefmt-nix':
    'github:numtide/treefmt-nix/349de7bc435bdff37785c2466f054ed1766173be' (2024-08-12)
  → 'github:numtide/treefmt-nix/746901bb8dba96d154b66492a29f5db0693dbfcc' (2024-10-30)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/ecbc1ca8ffd6aea8372ad16be9ebbb39889e55b6' (2024-10-06)
  → 'github:NixOS/nixpkgs/e8c38b73aeb218e27163376a2d617e61a2ad9b59' (2024-11-16)
2024-11-23 10:58:14 +01:00
a4b2b58e2b
update for lix header change 2024-11-17 19:57:36 -05:00
ee1234c15c ignoreException has been split into two
The Finally part is a destructor, so using `ignoreExceptionInDestructor`
seems to be the correct choice here.
2024-10-07 19:22:32 +02:00
7c7078cccf Fix build with latest Lix
Since ca1dc3f70bf98e2424b7b2666ee2180675b67451, the NAR parser has moved
the preallocate & receive steps into the file handle class to remove the
assumption that only one file can be handled at a time.
2024-10-07 19:22:32 +02:00
f23ec71227 Add metric for builds waiting for download slot 2024-10-01 19:14:24 +03:00
6a88e647e7
flake.lock: Update; fix build
Flake lock file updates:

• Updated input 'lix':
    'git+https://git.lix.systems/lix-project/lix?ref=refs/heads/main&rev=278fddc317cf0cf4d3602d0ec0f24d1dd281fadb' (2024-08-17)
  → 'git+https://git.lix.systems/lix-project/lix?ref=refs/heads/main&rev=02eb07cfd539c34c080cb1baf042e5e780c1fcc2' (2024-09-01)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/c3d4ac725177c030b1e289015989da2ad9d56af0' (2024-08-15)
  → 'github:NixOS/nixpkgs/6e99f2a27d600612004fbd2c3282d614bfee6421' (2024-08-30)
2024-09-02 10:53:46 +02:00
8d5d4942e1
queue-runner: remove unused method from State 2024-08-27 02:57:37 +02:00
e5a8ee5c17
web: require permissions for /api/push 2024-08-27 02:57:16 +02:00
fd7fd0ad65
treewide: clang-tidy modernize 2024-08-27 01:33:12 +02:00
d3fcedbcf5
treewide: enable clang-tidy bugprone findings
Fix some trivial findings throughout the codebase, mostly making
implicit casts explicit.
2024-08-27 00:43:17 +02:00
3891ad77e3
queue-runner: change Machine object creation to work around clang bug
https://github.com/llvm/llvm-project/issues/106123
2024-08-26 22:34:48 +02:00
ab6d81fad4
api: fix github webhook 2024-08-26 20:26:21 +02:00
Sandro
64df0cba47
Match URIs that don't end in .git
Co-authored-by: Charlotte <lotte@chir.rs>
2024-08-26 20:26:21 +02:00
Sandro Jäckel
6179b298cb
Add gitea push hook 2024-08-26 20:26:20 +02:00
44b9a7b95d
queue-runner: handle broken pg pool connections in builder code
Completes 9b62c52e5c with another location
that was initially missed.
2024-08-25 22:05:13 +02:00
3ee51dbe58 readIntoSocket: fix with store URIs containing an &
The third argument to `open()` in `-|` mode is passed to a shell if it's
a string. In my case the store URI contains
`?secret-key=${signingKey.directory}/secret&compression=zstd`

For the `nix store cat` case this means that

* until `&` the process will be started in the background. This fails
  immediately because no path to cat is specified.
* `compression=zstd` is a variable assignment
* the `$path` argument to `store cat` is attempted to be executed as
  another command

Passing just the list solves the problem.
2024-08-18 21:41:54 +00:00
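The shell-versus-list distinction described in 3ee51dbe58 is not specific to Perl. Below is a minimal Python sketch of the same failure mode, assuming a POSIX `/bin/sh` and an `echo` binary on `PATH`. The store URI is a made-up placeholder, and `echo` stands in for `nix store cat`; none of these names come from the Hydra code.

```python
import subprocess

# Hypothetical store URI; only the `&` in the query string matters here.
STORE_URI = "s3://example-cache?secret-key=/run/keys/secret&compression=zstd"


def via_shell_string(uri: str) -> str:
    """Pass one command string to the shell, so the shell parses the URI."""
    # The shell splits at `&`: everything before it runs in the background,
    # and `compression=zstd` is treated as a plain variable assignment.
    result = subprocess.run(
        f"echo {uri}", shell=True, capture_output=True, text=True, check=False
    )
    return result.stdout.strip()


def via_argument_list(uri: str) -> str:
    """Pass an argument list, so no shell ever sees the URI."""
    result = subprocess.run(
        ["echo", uri], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

With the list form the URI reaches the child process as one intact argument; with the string form everything after `&` is lost to the shell.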
e987f74954
doc: drop dev-notes & make update-dbix more discoverable
`dev-notes` are severely outdated. I dropped everything except one note
that I moved to hacking.md. The parts about creating users are also
covered elsewhere.

The `update-dbix` part got a just command to make it discoverable again.
2024-08-18 14:47:09 +02:00
459aa0a598 Stream files from store instead of buffering them
When an artifact is requested from hydra the output is first copied
from the nix store into memory and then sent as a response, delaying
the download and taking up significant amounts of memory.

As reported in https://github.com/NixOS/hydra/issues/1357

Instead of calling a command and blocking while reading in the entire
output, this adds read_into_socket(). The function takes a
command, starts a subprocess with that command, and returns a file
descriptor attached to its stdout.
This file descriptor is then used by Catalyst's response builder to
stream the output directly.
2024-08-13 22:09:48 +02:00
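The buffering-versus-streaming difference described in 459aa0a598 can be sketched in a few lines of Python. This is an illustration of the general pattern only, not the Perl/Catalyst code from the commit. The function name `stream_command` and the chunk size are assumptions made for this sketch.

```python
import subprocess
from typing import Iterator, Sequence

CHUNK_SIZE = 64 * 1024  # arbitrary chunk size chosen for this sketch


def stream_command(argv: Sequence[str]) -> Iterator[bytes]:
    """Yield a subprocess's stdout chunk by chunk instead of buffering it all."""
    with subprocess.Popen(argv, stdout=subprocess.PIPE) as proc:
        assert proc.stdout is not None
        while True:
            chunk = proc.stdout.read(CHUNK_SIZE)
            if not chunk:  # empty read means the child closed its stdout
                break
            yield chunk
    # Leaving the `with` block waits for the child, so returncode is set here.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, list(argv))
```

A caller can forward each chunk to the client as it arrives, so memory use stays near `CHUNK_SIZE` regardless of how large the artifact is.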
f1b552ecbf update flake locks, fix compile errors 2024-08-12 22:45:34 +02:00
4b107e6ff3
hydra-eval-jobset: pass --workers and --max-memory-size to n-e-j
Lost in the h-e-j -> n-e-j migration, causing evaluation to always be
single threaded and limited to 4GiB RAM. Follow the config settings like
h-e-j used to do (via C++ code).
2024-07-22 23:16:29 +02:00
4b886d9c45
autotools -> meson
There are some known regressions regarding local testing setups - since
everything was kinda half written with the expectation that build dir =
source dir (which should not be true anymore). But everything builds and
the test suite runs fine, after several hours spent debugging random
crashes in libpqxx with MALLOC_PERTURB_...
2024-07-22 22:30:41 +02:00
fbb894af4e
static: de-bundle vendored dependencies
The current way this whole build works is incompatible with having a
separate build dir. To be
improved in the future - maybe minimize the dependencies a bit. But this
isn't so much data that we really have to care.
2024-07-22 16:30:13 +02:00
Niklas Hambüchen
8a984efaef
renderInputDiff: Increase git hash length 8 -> 12
See investigation on lengths required to be conflict-free in practice:

https://github.com/NixOS/hydra/pull/1258#issuecomment-1321891677
2024-07-21 12:23:29 +02:00
abc9f11417
queue runner: fix store URI args being written to the SSH hosts file 2024-07-20 16:09:07 +02:00
9a4a5dd624
jobset-eval: fix actions not showing up sometimes for new jobs
New jobs have their "new" status take precedence over them being
"failed" or "queued", which means actions that can act on "failed" or
"queued" jobs weren't shown to the user when they could only act on
"new" jobs.
2024-07-20 13:09:39 +02:00
b0e9b4b2f9
hydra-eval-jobset: incrementally ingest eval results
nix-eval-jobs streams output, unlike hydra-eval-jobs. Now that we've
migrated, we can use this to:

1. Use less RAM by avoiding buffering a whole eval's worth of metadata
   into a Perl string and an array of JSON objects.
2. Make eval latency a bit lower by allowing the queue runner to start
   ingesting builds faster.
2024-07-17 12:05:41 +02:00
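The incremental ingestion described in b0e9b4b2f9 can be sketched as follows. This assumes the evaluator emits one JSON object per line on its output stream; the function name `ingest_incrementally` and the `handle_job` callback are hypothetical and do not mirror the actual Perl code.

```python
import json
from typing import IO, Callable


def ingest_incrementally(stream: IO[str], handle_job: Callable[[dict], None]) -> int:
    """Parse one JSON object per line and hand each job over immediately."""
    count = 0
    for line in stream:  # reads lazily, one line at a time
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        handle_job(json.loads(line))
        count += 1
    return count
```

Only one record is held in memory at a time, and the first job reaches `handle_job` before the evaluation has finished, which matches the two benefits listed in the commit message.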
370a4bf138
treewide: start removing tests related to constituents
The feature cannot easily be ported to nix-eval-jobs since it requires
deep integration into the evaluator, and h.n.o doesn't use it. Later
more of this will be ripped out.
2024-07-17 08:31:19 +02:00
ed7c58708c
hydra-eval-jobs: remove, replaced by nix-eval-jobs 2024-07-17 08:31:19 +02:00
6d4ccff43c
hydra-eval-jobset: use nix-eval-jobs instead of hydra-eval-jobs 2024-07-17 08:31:19 +02:00
6195cec6a3
hydra-queue-runner: adjust for Lix generators related changes 2024-07-16 04:35:44 +02:00
fb9e29d4d0
queue runner: fix nullptr deref on build exception after releasing a machine reservation 2024-07-13 06:12:35 +02:00
a9a2679793
hydra-evaluator: fix regression from e9d0a3 (inverted assertion) 2024-06-24 21:41:40 +02:00
e9d0a3a754
Update to latest Lix main 2024-06-24 20:25:35 +02:00
cbe527a3ee
util.hh split 2024-06-11 11:27:43 -04:00
ca98f42b39
nixexpr -> lixexpr 2024-06-11 11:13:42 -04:00
aff354e32f
Don't send gitea status update when build is started
This was the source of a flaky test because sometimes hydra-notify was
quick enough to send out `buildStarted` and sometimes it apparently
wasn't, which was easy to spot with `nix build --rebuild`.

Removing that status update doesn't make a difference functionally:
gitea doesn't differentiate between "queued" and "running", so we send
the same status ("pending") out on both events. Dropping it even saves
one avoidable request.

(cherry picked from commit 806c375c338b4e6a1d276b96994018908784bf11)
2024-06-10 17:40:02 +02:00
a053ef8fdf
lix api changes 2024-05-10 15:00:54 -04:00
803b8ee731
Revert "Update to Nix 2.19"
This reverts commit c922e73c11.
2024-05-10 14:47:11 -04:00
b8d03adaf4
queue runner: attempt at slightly smarter scheduling criteria
Instead of just going for "whatever is the oldest build we know of",
use the following first:

- Is the step more constrained? If so, schedule it first to avoid
  filling up "more desirable" build slots with less constrained builds.

- Does the step have more dependents? If so, schedule it first to try
  and maximize open parallelism and breadth of scheduling options.
2024-04-21 17:36:16 +02:00
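The ordering described in b8d03adaf4 can be read as a lexicographic sort key. The sketch below is an interpretation, not the queue runner's C++ code: it assumes "more constrained" means "requires more system features" and approximates "oldest build" by the lowest build ID. The `Step` fields are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class Step:
    name: str
    required_features: FrozenSet[str]  # assumption: proxy for "constrained"
    dependents: int                    # number of steps waiting on this one
    lowest_build_id: int               # assumption: proxy for "oldest build"


def scheduling_key(step: Step) -> Tuple[int, int, int]:
    """Most constrained first, then most dependents, then oldest build."""
    return (-len(step.required_features), -step.dependents, step.lowest_build_id)


def order_steps(steps: Iterable[Step]) -> List[Step]:
    return sorted(steps, key=scheduling_key)
```

Age only acts as the final tie-breaker here, so an older but unconstrained step no longer takes a slot ahead of a step that can run on only a few machines.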
ee1a7a7813
web: serveFile: also serve a CSP putting served HTML in its own origin 2024-04-21 16:14:24 +02:00
5c3e508e55
queue-runner: release machine reservation while copying outputs
This allows for better builder usage when the queue runner is busy. To
avoid running into uncontrollable imbalances between builder/queue
runner, we only release the machine reservation after the local
throttler has found a slot to start copying the outputs for that build.
2024-04-21 01:55:19 +02:00
026e3a3103
queue-runner: switch to pseudorandom ordering of builds processing
We don't rely on sequential / monotonic build IDs processing anymore, so
randomizing actually has the advantage of mixing builds for different
systems together, to avoid only one chunk of builds for a single system
getting processed while builders for other systems are starved.
2024-04-20 23:05:26 +02:00
6606a7f86e
queue runner: introduce some parallelism for remote paths lookup
Each output for a given step being ingested is looked up in parallel,
which should basically multiply the speed of builds ingestion by the
average number of outputs per derivation.
2024-04-20 22:28:18 +02:00
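The per-output parallel lookup described in 6606a7f86e can be sketched with a thread pool. This is an illustration under assumptions: `is_valid` stands in for whatever remote store query the queue runner performs, and the function name and `max_workers` default are invented for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def lookup_outputs(
    paths: Iterable[str],
    is_valid: Callable[[str], bool],
    max_workers: int = 8,
) -> Dict[str, bool]:
    """Query every output path concurrently and collect the results."""
    path_list = list(paths)
    if not path_list:
        return {}
    workers = min(max_workers, len(path_list))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs each path with its result.
        return dict(zip(path_list, pool.map(is_valid, path_list)))
```

When each lookup is dominated by network latency, the wall-clock time for a step approaches that of a single lookup rather than the sum over all outputs.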
f31b95d371
queue-runner: reduce the time between queue monitor restarts
This will induce more DB queries (though these are fairly cheap), but with
the benefit of processing bumps within 1m instead of within 10m.
2024-04-20 16:58:10 +02:00
54f8daf6b1
queue-runner: remove id > X from new builds query
Running the query with/without it shows that it makes no difference to
postgres, since there's an index on finished=0 already. This allows a
few simplifications, but also paves the way towards running multiple
parallel monitor threads in the future.
2024-04-20 16:53:52 +02:00
cc6bafe538
queue-runner: add prom metrics to allow detecting internal bottlenecks
By looking at the ratio of running vs. waiting for the dispatcher and
the queue monitor, we should get better visibility into what hydra is
currently bottlenecked on.

There are other side effects we can try to measure to get to the same
result, but having a simple way doesn't cost us much.
2024-04-20 16:48:03 +02:00
6189ba9c5e
web: replace 'errormsg' with 'errormsg IS NULL' in most cases
This is implemented in an extremely hacky way due to poor DBIx feature
support. Ideally, what we'd need is a way to tell DBIx to ignore the
errormsg column unless explicitly requested, and to automatically add a
computed 'errormsg IS NULL' column in others. Since it does not support
that, this commit instead hacks some support via method overrides while
taking care to not break anything obvious.
2024-04-12 20:14:09 +02:00
258e9314a9
web: include current step status on /machines 2024-04-11 17:15:58 +02:00
a51bd392a2
queue-runner: limit parallelism of CPU intensive operations
My current theory is that running more parallel xz than available CPU
cores is reducing our overall throughput by requiring more scheduling
overhead and more cache thrashing.
2024-04-11 16:43:01 +02:00
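The throttling idea in a51bd392a2 amounts to gating CPU-heavy sections behind a counting semaphore sized to the core count. The sketch below shows only that gating pattern; it is not the queue runner's C++ code, and `run_cpu_bound`, `limit`, and `threads` are names invented here. In Python the threads would need to release the GIL (for example by calling into a native compressor) to benefit from real parallelism.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_cpu_bound(
    items: Iterable[T],
    work: Callable[[T], R],
    limit: Optional[int] = None,
    threads: int = 32,
) -> List[R]:
    """Run `work` on many threads, but let at most `limit` run at once."""
    slots = threading.BoundedSemaphore(limit or os.cpu_count() or 1)

    def guarded(item: T) -> R:
        with slots:  # blocks while all CPU slots are taken
            return work(item)

    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(guarded, items))
```

Threads beyond the limit simply wait for a slot, so the number of concurrent compressions never exceeds the number of cores even when many builds finish at once.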