Commit graph

4215 commits

Author SHA1 Message Date
raito d3257e4761 Merge pull request 'Add metric for builds waiting for download slot' (#9) from waiting-metrics into main
Reviewed-on: #9
Reviewed-by: raito <raito@noreply.git.lix.systems>
2024-10-01 16:16:24 +00:00
Ilya K f23ec71227 Add metric for builds waiting for download slot 2024-10-01 19:14:24 +03:00
leo60228 ac37e44982 Merge pull request 'Update Lix; fix build' (#7) from ma27/hydra:lix-update into main
Reviewed-on: #7
Reviewed-by: leo60228 <leo@60228.dev>
2024-09-02 22:56:03 +00:00
Maximilian Bosch 6a88e647e7
flake.lock: Update; fix build
Flake lock file updates:

• Updated input 'lix':
    'git+https://git.lix.systems/lix-project/lix?ref=refs/heads/main&rev=278fddc317cf0cf4d3602d0ec0f24d1dd281fadb' (2024-08-17)
  → 'git+https://git.lix.systems/lix-project/lix?ref=refs/heads/main&rev=02eb07cfd539c34c080cb1baf042e5e780c1fcc2' (2024-09-01)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/c3d4ac725177c030b1e289015989da2ad9d56af0' (2024-08-15)
  → 'github:NixOS/nixpkgs/6e99f2a27d600612004fbd2c3282d614bfee6421' (2024-08-30)
2024-09-02 10:53:46 +02:00
Pierre Bourdon 8d5d4942e1
queue-runner: remove unused method from State 2024-08-27 02:57:37 +02:00
Pierre Bourdon e5a8ee5c17
web: require permissions for /api/push 2024-08-27 02:57:16 +02:00
Pierre Bourdon fd7fd0ad65
treewide: clang-tidy modernize 2024-08-27 01:33:12 +02:00
Pierre Bourdon d3fcedbcf5
treewide: enable clang-tidy bugprone findings
Fix some trivial findings throughout the codebase, mostly making
implicit casts explicit.
2024-08-27 00:43:17 +02:00
Pierre Bourdon 3891ad77e3
queue-runner: change Machine object creation to work around clang bug
https://github.com/llvm/llvm-project/issues/106123
2024-08-26 22:34:48 +02:00
Pierre Bourdon 21fd1f8993
flake: add devShells, including a clang one for clang-tidy & more 2024-08-26 22:22:18 +02:00
emily ab6d81fad4
api: fix github webhook 2024-08-26 20:26:21 +02:00
Sandro 64df0cba47
Match URIs that don't end in .git
Co-authored-by: Charlotte <lotte@chir.rs>
2024-08-26 20:26:21 +02:00
Sandro Jäckel 6179b298cb
Add gitea push hook 2024-08-26 20:26:20 +02:00
Pierre Bourdon 44b9a7b95d
queue-runner: handle broken pg pool connections in builder code
Completes 9b62c52e5c with another location
that was initially missed.
2024-08-25 22:05:13 +02:00
Maximilian Bosch 3ee51dbe58 readIntoSocket: fix with store URIs containing an &
The third argument to `open()` in `-|` mode is passed to a shell if it's
a string. In my case the store URI contains
`?secret-key=${signingKey.directory}/secret&compression=zstd`

For the `nix store cat` case this means that

* until `&` the process will be started in the background. This fails
  immediately because no path to cat is specified.
* `compression=zstd` is a variable assignment
* the `$path` argument to `store cat` is attempted to be executed as
  another command

Passing just the list solves the problem.
2024-08-18 21:41:54 +00:00
Maximilian Bosch e987f74954
doc: drop dev-notes & make update-dbix more discoverable
`dev-notes` are severely outdated. I dropped everything except one note
that I moved to hacking.md. The parts about creating users are also
covered elsewhere.

The `update-dbix` part got a just command to make it discoverable again.
2024-08-18 14:47:09 +02:00
Maximilian Bosch 1f802c008c
flake.lock: Update
Flake lock file updates:

• Updated input 'lix':
    'git+https://git.lix.systems/lix-project/lix?ref=refs/heads/main&rev=5137cea99044d54337e439510a647743110b2d7d' (2024-08-10)
  → 'git+https://git.lix.systems/lix-project/lix?ref=refs/heads/main&rev=278fddc317cf0cf4d3602d0ec0f24d1dd281fadb' (2024-08-17)
• Updated input 'nix-eval-jobs':
    'git+https://git.lix.systems/lix-project/nix-eval-jobs?ref=refs/heads/main&rev=c057494450f2d1420726ddb0bab145a5ff4ddfdd' (2024-07-17)
  → 'git+https://git.lix.systems/lix-project/nix-eval-jobs?ref=refs/heads/main&rev=42a160bce2fd9ffebc3809746bc80cc7208f9b08' (2024-08-13)
• Updated input 'nix-eval-jobs/flake-parts':
    'github:hercules-ci/flake-parts/9227223f6d922fee3c7b190b2cc238a99527bbb7' (2024-07-03)
  → 'github:hercules-ci/flake-parts/8471fe90ad337a8074e957b69ca4d0089218391d' (2024-08-01)
• Updated input 'nix-eval-jobs/treefmt-nix':
    'github:numtide/treefmt-nix/0fb28f237f83295b4dd05e342f333b447c097398' (2024-07-15)
  → 'github:numtide/treefmt-nix/349de7bc435bdff37785c2466f054ed1766173be' (2024-08-12)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/a781ff33ae258bbcfd4ed6e673860c3e923bf2cc' (2024-08-10)
  → 'github:NixOS/nixpkgs/c3d4ac725177c030b1e289015989da2ad9d56af0' (2024-08-15)
2024-08-18 14:18:36 +02:00
Maximilian Bosch 3a4e0d4917
Get dev environment working again
* justfile, inspired from Lix.
* let foreman use the stuff from outputs, similar to what Lix does>
* mess around with PERL5LIB[1] and PATH to get tests running locally.

[1] I don't really know how `Setup` was found before tbh.
2024-08-18 14:18:36 +02:00
Maximilian Bosch 3517acc5ba
Add direnv & PLS to the dev setup 2024-08-18 14:18:35 +02:00
71rd 459aa0a598 Stream files from store instead of buffering them
When an artifact is requested from hydra the output is first copied
from the nix store into memory and then sent as a response, delaying
the download and taking up significant amounts of memory.

As reported in https://github.com/NixOS/hydra/issues/1357

Instead of calling a command and blocking while reading in the entire
output, this adds read_into_socket(). the function takes a
command, starting a subprocess with that command, returning a file
descriptor attached to stdout.
This file descriptor is then by responsebuilder of Catalyst to steam
the output directly
2024-08-13 22:09:48 +02:00
eldritch horrors f1b552ecbf update flake locks, fix compile errors 2024-08-12 22:45:34 +02:00
Pierre Bourdon db8c2cc4a8
.github: remove
The primary host for this repo isn't github, and this is causing a whole
bunch of false indications in the Lix Forgejo UI as it tries to run the
.github/workflow with non-existing runners.
2024-08-11 16:15:12 +02:00
Rick van Schijndel 8858abb1a6
t/test.pl: increase event-timeout, set qvf
Some checks failed
Test / tests (push) Has been cancelled
Only log issues/failures when something's actually up.
It has irked me for a long time that so much output came
out of running the tests, this seems to silence it.
It does hide some warnings, but I think it makes the output
so much more readable that it's worth the tradeoff.

Helps for highly parallel running of jobs, sometimes they'd not give output for a while.
Setting this timeout higher appears to help.
Not completely sure if this is the right place to do it, but it works fine for me.
2024-08-11 16:08:35 +02:00
Rick van Schijndel ef619eca99
t: increase timeouts for slow commands with high load
We've seen many fails on ofborg, at lot of them ultimately appear to come down to
a timeout being hit, resulting in something like this:

Failure executing slapadd -F /<path>/slap.d -b dc=example -l /<path>/load.ldif.

Hopefully this resolves it for most cases.
I've done some endurance testing and this helps a lot.
some other commands also regularly time-out with high load:

- hydra-init
- hydra-create-user
- nix-store --delete

This should address most issues with tests randomly failing.

Used the following script for endurance testing:

```

import os
import subprocess

run_counter = 0
fail_counter = 0

while True:
    try:
        run_counter += 1
        print(f"Starting run {run_counter}")
        env = os.environ
        env["YATH_JOB_COUNT"] = "20"
        result = subprocess.run(["perl", "t/test.pl"], env=env)
        if (result.returncode != 0):
            fail_counter += 1
        print(f"Finish run {run_counter}, total fail count: {fail_counter}")
    except KeyboardInterrupt:
        print(f"Finished {run_counter} runs with {fail_counter} fails")
        break
```

In case someone else wants to do it on their system :).
Note that YATH_JOB_COUNT may need to be changed loosely based on your
cores.
I only have 4 cores (8 threads), so for others higher numbers might
yield better results in hashing out unstable tests.
2024-08-11 16:08:09 +02:00
marius david 41dfa0e443
Document the default user and port in hacking.md
Some checks are pending
Test / tests (push) Waiting to run
2024-08-11 16:06:08 +02:00
Pierre Bourdon 4b107e6ff3
hydra-eval-jobset: pass --workers and --max-memory-size to n-e-j
Some checks failed
Test / tests (push) Has been cancelled
Lost in the h-e-j -> n-e-j migration, causing evaluation to always be
single threaded and limited to 4GiB RAM. Follow the config settings like
h-e-j used to do (via C++ code).
2024-07-22 23:16:29 +02:00
Pierre Bourdon 4b886d9c45
autotools -> meson
Some checks are pending
Test / tests (push) Waiting to run
There are some known regressions regarding local testing setups - since
everything was kinda half written with the expectation that build dir =
source dir (which should not be true anymore). But everything builds and
the test suite runs fine, after several hours spent debugging random
crashes in libpqxx with MALLOC_PERTURB_...
2024-07-22 22:30:41 +02:00
Pierre Bourdon fbb894af4e
static: de-bundle vendored dependencies
The current way this whole build works is incompatible with having a
separate build dir, or at least with having a separate build dir. To be
improved in the future - maybe minimize the dependencies a bit. But this
isn't so much data that we really have to care.
2024-07-22 16:30:13 +02:00
Niklas Hambüchen 8a984efaef
renderInputDiff: Increase git hash length 8 -> 12
Some checks failed
Test / tests (push) Has been cancelled
See investigation on lengths required to be conflict-free in practice:

https://github.com/NixOS/hydra/pull/1258#issuecomment-1321891677
2024-07-21 12:23:29 +02:00
Pierre Bourdon abc9f11417
queue runner: fix store URI args being written to the SSH hosts file
Some checks are pending
Test / tests (push) Waiting to run
2024-07-20 16:09:07 +02:00
Pierre Bourdon 9a4a5dd624
jobset-eval: fix actions not showing up sometimes for new jobs
Some checks are pending
Test / tests (push) Waiting to run
New jobs have their "new" status take precedence over them being
"failed" or "queued", which means actions that can act on "failed" or
"queued" jobs weren't shown to the user when they could only act on
"new" jobs.
2024-07-20 13:09:39 +02:00
Janik Haag ac406a9175
nixos-modules: hydra-queue-runner fix network-online.target eval warning
Some checks failed
Test / tests (pull_request) Has been cancelled
Test / tests (push) Has been cancelled
2024-07-19 09:13:32 +02:00
Pierre Bourdon 73616aa0d9
nixos-module: don't force Nix GC to keep outputs
Some checks failed
Test / tests (push) Has been cancelled
This isn't actually needed (h.n.o even overrides it!).

Fix the use of deprecated `gc-keep-derivations` alias along the way.
2024-07-17 13:21:58 +02:00
Pierre Bourdon d33fc08341
nixos-module: fix trusted users
- Use extra-trusted-users to avoid overriding the default set of trusted
  users and causing permission issues.
- Add hydra and hydra-www users which also need permissions.
2024-07-17 13:20:37 +02:00
Pierre Bourdon b0e9b4b2f9
hydra-eval-jobset: incrementally ingest eval results
Some checks failed
Test / tests (push) Has been cancelled
Test / tests (pull_request) Has been cancelled
nix-eval-jobs streams output, unlike hydra-eval-jobs. Now that we've
migrated, we can use this to:

1. Use less RAM by avoiding buffering a whole eval's worth of metadata
   into a Perl string and an array of JSON objects.
2. Make evals latency a bit lower by allowing the queue runner to start
   ingesting builds faster.
2024-07-17 12:05:41 +02:00
Pierre Bourdon 370a4bf138
treewide: start removing tests related to constituents
Some checks failed
Test / tests (push) Waiting to run
Test / tests (pull_request) Has been cancelled
The feature cannot easily be ported to nix-eval-jobs since it requires
deep integration into the evaluator, and h.n.o doesn't use it. Later
more of this will be ripped out.
2024-07-17 08:31:19 +02:00
Pierre Bourdon ed7c58708c
hydra-eval-jobs: remove, replaced by nix-eval-jobs 2024-07-17 08:31:19 +02:00
Pierre Bourdon 6d4ccff43c
hydra-eval-jobset: use nix-eval-jobs instead of hydra-eval-jobs 2024-07-17 08:31:19 +02:00
Pierre Bourdon 684cc50d86
flake: add nix-eval-jobs as input 2024-07-17 08:17:32 +02:00
Pierre Bourdon 6195cec6a3
hydra-queue-runner: adjust for Lix generators related changes 2024-07-16 04:35:44 +02:00
Pierre Bourdon 1fbfed8162
flake: rename 'nix' input to 'lix'
For consistency with other Lix forks of Nix ecosystems projects, e.g.
nix-eval-jobs.
2024-07-16 03:59:38 +02:00
Pierre Bourdon fb9e29d4d0
queue runner: fix nullptr deref on build exception after releasing a machine reservation
Some checks failed
Test / tests (push) Has been cancelled
2024-07-13 06:12:35 +02:00
Pierre Bourdon 05d620a54f
flake.lock: Update
Some checks are pending
Test / tests (push) Waiting to run
Flake lock file updates:

• Updated input 'nix':
    'git+https://git@git.lix.systems/lix-project/lix?ref=refs/heads/main&rev=4c3d93611f2848c56ebc69c85f2b1e18001ed3c7' (2024-06-24)
  → 'git+https://git@git.lix.systems/lix-project/lix?ref=refs/heads/main&rev=4b109ec1a8fc4550150f56f0f46f2f41d844bda8' (2024-07-11)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/e4509b3a560c87a8d4cb6f9992b8915abf9e36d8' (2024-06-23)
  → 'github:NixOS/nixpkgs/a046c1202e11b62cbede5385ba64908feb7bfac4' (2024-07-11)
2024-07-13 03:08:50 +02:00
Pierre Bourdon a9a2679793
hydra-evaluator: fix regression from e9d0a3 (inverted assertion)
Some checks failed
Test / tests (push) Has been cancelled
2024-06-24 21:41:40 +02:00
Pierre Bourdon e9d0a3a754
Update to latest Lix main
Some checks are pending
Test / tests (push) Waiting to run
2024-06-24 20:25:35 +02:00
leo60228 cbe527a3ee
util.hh split
Some checks failed
Test / tests (push) Has been cancelled
2024-06-11 11:27:43 -04:00
leo60228 ca98f42b39
nixexpr -> lixexpr 2024-06-11 11:13:42 -04:00
John Ericson 62bc5b54b2
Try again to ensure hydra module is usable
Some checks are pending
Test / tests (push) Waiting to run
Nixpkgs only contains a `hydra_unstable`, not `hydra`, package, so
adjust the default accordingly, and then override it to our package in
the separate module which does that.

(cherry picked from commit e149da7b9bbc04bd0b1ca03fa0768e958cbcd40e)
2024-06-10 17:40:02 +02:00
John Ericson c98017b823
Factor out NixOS tests, and clean up
Due to newer nixpkgs, there were a number of things that could be
cleaned up in the process.

(cherry picked from commit 743795b2b090a5cdfe8bd90120add8db7770086a)
2024-06-10 17:40:02 +02:00
John Ericson ebae7a31fe
Remove PrometheusTiny from overlay
It's in Nixpkgs for a good while now.

(cherry picked from commit 92155f9a07f5fe32e0778e474e7313997811e635)
2024-06-10 17:40:02 +02:00