Pierre Bourdon
90325344a3
Reserve builder-11 for build coordination, rename to build-coord
2024-08-13 19:12:36 +02:00
Pierre Bourdon
17c342b33e
Partial revert "Add Grapevine Matrix server and matrix-hookshot"
This partially reverts commit d2f3ca5624.
Said commit requires IFD to eval, which is generally unwanted, and is
currently forbidden on Hydra (imo: rightfully so, we should try to
properly separate evals from builds).
The services/ file for grapevine is kept but will not work without the
flake.nix change reapplied.
2024-08-13 00:35:10 +02:00
raito
84efd0976d
feat(alerts): add a sync failed too often alert
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-08-09 16:25:34 +02:00
raito
e2f5a7b0e4
feat(alerts): add basic postgresql alerts
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-08-09 16:06:34 +02:00
raito
7388de79c4
feat(alerts): add some basic "host & hardware" alerts
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-08-09 16:06:34 +02:00
Ilya K
f8cad42b5c
Set up alertmanager-hookshot-adapter
2024-08-09 14:03:56 +00:00
Ilya K
9ad279a505
Set up admins + DNS for hookshot
2024-08-09 14:03:56 +00:00
Ilya K
d2f3ca5624
Add Grapevine Matrix server and matrix-hookshot
It doesn't want to work.
2024-08-09 14:03:56 +00:00
Yureka
b6375b8294
add staging sync services
2024-08-08 15:16:04 +02:00
Yureka
420e6915df
You have divergent branches and need to specify how to reconcile them
2024-08-08 10:39:00 +02:00
Yureka
dbb4e03292
Revert "builders: direct buildbot to /mnt store via ForceCommand"
This reverts commit dfd48f2179.
2024-08-08 10:37:42 +02:00
Yureka
cd0621ba55
builders/netboot: add separate firmware_part output
2024-08-06 13:26:51 +02:00
Yureka
dfd48f2179
builders: direct buildbot to /mnt store via ForceCommand
2024-08-06 13:26:35 +02:00
Yureka
77ff556583
builders: fix provisioning of ssh hostkeys
2024-08-05 08:18:20 +02:00
Yureka
fe3cb577c1
fix eval
2024-08-05 07:20:59 +02:00
Yureka
20fc4c8f96
builders: move provisioning of ssh hostkeys to a systemd service
at first activation it does not yet have a working network setup
2024-08-05 07:17:45 +02:00
Yureka
bce44930b1
builders: provision ssh hostkeys on boot
2024-08-04 18:12:02 +02:00
Yureka
79dea0686b
add 'notipxe' netboot loader based on systemd-initrd + u-root
2024-08-03 20:28:57 +02:00
Yureka
aeb8102ae4
builders: do not mount / and /boot on netboot systems
2024-08-03 20:01:39 +02:00
Yureka
830dcbf6bc
builders: do not mount / and /boot on netboot systems
2024-08-03 18:41:01 +02:00
Yureka
93822775a9
baremetal-builders: do not create swapfile on rootfs when netbooting
2024-08-03 18:10:59 +02:00
Yureka
dd028656ac
builders: fix serial console
2024-08-02 13:21:04 +02:00
Yureka
88317d099c
attempt to fix netboot hydra jobs
2024-08-02 01:05:20 +02:00
Yureka
1cbf286f18
build netboot files from hydra
2024-08-01 22:47:25 +02:00
Yureka
6dc424dd43
wob01: serve an ipxe over iusb-spoof
2024-08-01 22:16:48 +02:00
Yureka
504a443acc
adjust hydra-gc numbers
we want to see how garbage collection would behave on a 480GB drive
2024-07-31 23:44:08 +02:00
emily
96d58bbd41
forgejo: disable users explore page
This was requested and should make it a decent bit more difficult to get
a somewhat complete list of users on this instance.
We are, however, aware of other endpoints that can be used to get to a
similar result. Those just aren't as convenient nor obvious.
https://forgejo.org/docs/latest/admin/config-cheat-sheet/#service---explore-serviceexplore
2024-07-31 01:42:05 +02:00
Yureka
5154906aac
fix eval in assignments.nix
2024-07-30 17:23:54 +02:00
Yureka
f3828368e6
hydra: set reasonable max-jobs and cores
2024-07-30 17:03:12 +02:00
Yureka
4e2d21930f
baremetal-builders: detect percent_filled for the correct partition
2024-07-30 13:59:46 +02:00
Yureka
99259356f2
make buildbot-signing-key accessible to buildbot-worker
2024-07-28 23:30:38 +02:00
Yureka
5474832b07
baremetal builders: filesystem optimizations
2024-07-28 19:20:23 +02:00
Yureka
15a684c5d7
baremetal-builders: more 'intelligent' gc
2024-07-26 12:17:27 +02:00
Yureka
74e06ac6d0
hydra gc every 20h
metrics analysis has shown that this is unlikely to fill up the builders
2024-07-24 09:35:18 +02:00
raito
e5a3ce2283
buildbot fixes (#76)
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
Signed-off-by: Yureka <yureka@forkos.org>
Co-authored-by: raito <raito@noreply.git.lix.systems>
Co-committed-by: raito <raito@noreply.git.lix.systems>
2024-07-24 06:44:25 +00:00
Ilya K
bebc7f2586
We have nothing to hide
2024-07-23 18:09:49 +03:00
Pierre Bourdon
608c0e5973
hydra: bump to 16 evaluation workers, we have enough RAM and cores to afford it
2024-07-22 23:13:33 +02:00
raito
62ccc0282b
fix(ows): per-job runtime directories + proper local refspec
The local refspec was weird and exploiting an edge case for the nixpkgs
jobs where local and from were the same.
We are more explicit now, which fixes the sandbox jobs.
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-07-22 15:41:47 +02:00
Yureka
d84a43b781
builders: run gc 3x per day
We can still adjust it if the disks fill up, but currently it is too frequent
2024-07-21 19:49:21 +02:00
Yureka
2dc5899660
baremetal: run hydra store gc as builder user
2024-07-20 17:00:39 +02:00
Yureka
adaf4b0aef
baremetal: tmp on the same filesystem as hydra store
2024-07-20 17:00:39 +02:00
Yureka
5bde7e2358
use dedicated store partition for hydra builds
2024-07-20 15:14:00 +02:00
Yureka
d9809e1e78
gerrit-one-way-sync: disallow auto-merging a staging iteration into master
2024-07-20 15:14:00 +02:00
Yureka
3fa4a25d87
gerrit-one-way-sync: set git user info
2024-07-20 15:14:00 +02:00
Yureka
0ff5eea4ed
gerrit-one-way-sync: merge instead of rebase
2024-07-20 15:14:00 +02:00
raito
80c4757571
gerrit01: add a one-way-sync service
It's basic and does not handle conflicts, which need to be managed
manually.
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-07-19 17:52:44 +02:00
Ilya K
d1e64b6610
Fix eval warning here too
2024-07-19 12:06:03 +03:00
Ilya K
766dc4c383
Mimir also wants network-online.target
Thank you helpful eval warning
2024-07-19 12:03:55 +03:00
Ilya K
65b07a936b
Make sure Mimir starts after network is up
2024-07-19 12:00:52 +03:00
raito
8afcf249d6
buildbot: upgrade to local machine specifications
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-07-18 12:18:02 +02:00
raito
4473717e9f
gerrit: introduce buildbot checks plugin
It's a modified version of @puck's Lix buildbot checks for
gerrit.lix.systems with a slight generalization in the configuration for
many repositories.
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-07-18 10:56:46 +02:00
raito
da7175303c
buildbot: add support for remote builders via baremetal machines
For now, only builder-3 is used.
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-07-17 18:28:26 +02:00
raito
7789e9ce75
services/buildbot: init
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-07-17 18:00:51 +02:00
raito
fda59ee6c0
gerrit: factor more configuration in the NixOS module for external consumption
Other modules may require information to configure themselves from the
Gerrit module.
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-07-17 15:43:35 +02:00
emily
95b58de737
forgejo: use redis as cache and session provider
2024-07-16 20:09:15 +02:00
emily
8b9d33d70c
forgejo: disable registrations, enable auto-registration for SSO
2024-07-16 17:14:23 +02:00
emily
dd069c40d7
forgejo: init service
2024-07-16 15:44:06 +02:00
Luke Granger-Brown
e3e60a5e72
services/monitoring: add scraping of Gerrit's internal metrics
2024-07-15 11:02:54 +00:00
Luke Granger-Brown
2e86babc8a
services/gerrit: add metrics-prometheus-exporter
2024-07-15 11:02:54 +00:00
Ilya K
7a937e837a
Unlimit Mimir max series
2024-07-13 15:52:46 +03:00
Pierre Bourdon
7d9461808c
builders: configure a swapfile + zswap
2024-07-13 04:40:51 +02:00
Pierre Bourdon
293bc52ace
hydra: reduce number of parallel builds per builder to limit RAM consumption
2024-07-13 04:38:24 +02:00
Pierre Bourdon
756341ea4c
builders: tune sshd MaxStartups to avoid rate limiting Hydra
2024-07-12 21:57:04 +02:00
Yureka
e6ead602f0
builders get a special treatment for dns64
2024-07-11 02:05:58 +02:00
Yureka
b14f155d55
add ipmitool on vpn-gw and builders
2024-07-10 20:49:17 +02:00
Pierre Bourdon
d2336262fb
hydra: set allowed URIs in restricted mode for flake inputs
2024-07-10 18:52:22 +02:00
Pierre Bourdon
411d514ab9
hydra: user hydra-www needs nix-daemon access too
2024-07-10 17:36:39 +02:00
Pierre Bourdon
f74d1ca0f6
hydra: start signing paths
2024-07-10 17:34:57 +02:00
Ilya K
e84b362b7a
Allow 12 hours of backfill for metrics
This is somewhat experimental and may explode, but we'll see, I guess
2024-07-10 14:59:09 +03:00
Ilya K
9e7e6d42ab
Make nginx/loki/mimir go fast
2024-07-10 14:55:28 +03:00
Pierre Bourdon
f2c2bc5ab6
hydra: output machine host key as base64 in the generated machines.conf
2024-07-10 02:16:45 +02:00
Pierre Bourdon
f214da9228
hydra: add hydra to nix trusted-users
2024-07-10 02:03:33 +02:00
Luke Granger-Brown
82db8f7f1e
gerrit01: some more tuning
* flip off proxy_buffering again
* enable REVWALK_USE_PRIORITY_QUEUE
* enable delta compression, because that's not a bottleneck and it's
nicer on bandwidth
2024-07-10 00:27:36 +01:00
raito
9988811be5
hydra: unplug the EPYC
thank you for your testing services
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-07-10 01:13:10 +02:00
raito
2308870aa5
builders: add a nice tag to deploy all of them at once
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-07-10 00:59:31 +02:00
raito
645ad7d062
builders: add builder user
currently hardcoded to hydra's coordinator public key
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-07-10 00:55:25 +02:00
raito
a30c1f7d78
hydra: wire up new builders
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-07-10 00:45:02 +02:00
Yureka
eb21cb6916
add baremetal builders
2024-07-10 00:35:01 +02:00
raito
3828721e4f
services/netbox: enable OIDC via Lix SSO
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-07-09 02:45:58 +02:00
Luke Granger-Brown
8a9ff8c40d
services/gerrit: migrate to Gerrit from the-distro/nix-gerrit flake
2024-07-08 23:30:59 +01:00
Pierre Bourdon
7f46e5d9a4
services: add ofborg, currently running rabbitmq only
2024-07-08 23:55:11 +02:00
Ilya K
b55475c12e
Fix up the rest of the dashboards
2024-07-08 11:43:57 +03:00
Ilya K
9f0e601d84
Scrape grafana/loki/mimir own metrics
2024-07-08 10:25:15 +03:00
Ilya K
209f71c63a
Update node_exporter dashboard for new metrics structure
2024-07-08 10:16:37 +03:00
Ilya K
563e0685d4
Metrics fixups
- fix grafana-agent config format
- rekey metrics-push-password for fodwatch
2024-07-08 10:01:25 +03:00
emily
8d2a367e92
grafana-agent: make `bagel.monitoring.grafana-agent.exporters` an attrset
This allows us to use multiple jobs, one for each additional exporter,
and set their `job_name` accordingly.
`job_name` is exported as `job` label on the resulting metrics.
This allows us to quickly get an understanding of which metrics an
exporter actually makes available by simply filtering all metrics by
`{job="$jobname"}`.
2024-07-08 09:34:26 +03:00
emily
db8c831c2f
grafana-agent: set `hostname` label on all metrics
This is handy to quickly see all metrics exported by a node, without
having to manipulate the already existing `instance` label.
`hostname` is essentially a variant of `instance` but without ports.
2024-07-08 09:34:26 +03:00
Ilya K
ba0d50624d
Switch to push metrics with Grafana Agent
2024-07-08 09:34:24 +03:00
Ilya K
40ba3c4ae7
Prepare for remote push metrics
2024-07-08 09:33:59 +03:00
Ilya K
346a74eabc
Wire up Grafana to Alertmanager
2024-07-08 09:33:59 +03:00
Ilya K
e8e262c6a4
Enable Mimir Alertmanager, add example alert
Still TODO: actually connect it to Matrix
2024-07-08 09:33:59 +03:00
Pierre Bourdon
caa1fce74e
hydra: move to hydra.forkos.org
2024-07-07 23:53:21 +02:00
Ilya K
5b0f3c4541
Split node_exporter and cadvisor config, disable cadvisor for nodes that are themselves containers
2024-07-05 20:06:43 +03:00
raito
b319b02f07
fix: remove custom logging format for Gerrit
This way, we get picked up by the LGTM stack exporter machinery.
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-07-05 18:52:38 +02:00
Ilya K
2441d18f17
Add Loki + Promtail setup
2024-07-05 16:10:31 +00:00
Ilya K
03cb9c390c
Add postgres exporter
2024-07-05 16:10:31 +00:00
Ilya K
42f8ad8fa4
Add nginx log exporter
2024-07-05 16:10:31 +00:00
Ilya K
63b31e98cf
Add Grafana/Prometheus/Mimir minimal setup
More later, Loki also later.
2024-07-05 16:10:31 +00:00
Pierre Bourdon
34a29552da
hydra: update the epyc.infra.newtype.fr public host key
2024-07-05 16:43:29 +02:00
raito
0b01e9a99f
gerrit01: those who finetune even further
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-07-05 12:23:44 +02:00