Commit graph

225 commits

Author SHA1 Message Date
raito 48579e8818 feat: add gdb to sysadmin tooling
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-07-08 22:10:06 +00:00
raito 8fe33b4e46 feat: add perf, pwru and various sysadmin tools to bagel-box
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-07-08 22:10:06 +00:00
Luke Granger-Brown d4e9dcc2a6 admins: provision lukegb
hello I can be trusted with your infrastructure
2024-07-08 21:55:41 +00:00
Pierre Bourdon 7f46e5d9a4 services: add ofborg, currently running rabbitmq only 2024-07-08 23:55:11 +02:00
raito 512cfdb43e fix: downgrade mina sshd due to broken PQC algorithm
https://cl.tvl.fyi/c/depot/+/11965

This breaks it with "ssh_dispatch_run_fatal: Connection to
2a01:4f8:242:5b21:0:feed:edef:beef port 29418: incorrect signature"

Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-07-08 15:59:31 +02:00
raito 82395ec8ce Merge pull request 'pkgs/gerrit: update to 3.10.0' (#34) from upgrade-gerrit-differently into main
Reviewed-on: the-distro/bagel-infra#34
2024-07-08 12:21:21 +00:00
Ilya K 82e074881f DNS: clean up a bit, add root level record for future Matrix shenanigans 2024-07-08 13:54:15 +03:00
Ilya K b55475c12e Fix up the rest of the dashboards 2024-07-08 11:43:57 +03:00
Ilya K 9f0e601d84 Scrape grafana/loki/mimir own metrics 2024-07-08 10:25:15 +03:00
Ilya K 209f71c63a Update node_exporter dashboard for new metrics structure 2024-07-08 10:16:37 +03:00
Ilya K 563e0685d4 Metrics fixups
- fix grafana-agent config format
- rekey metrics-push-password for fodwatch
2024-07-08 10:01:25 +03:00
emily 8d2a367e92 grafana-agent: make bagel.monitoring.grafana-agent.exporters an attrset
This allows us to use multiple jobs, one for each additional exporter,
and set their `job_name` accordingly.

`job_name` is exported as `job` label on the resulting metrics.
This lets us quickly see which metrics an exporter actually
provides by simply filtering all metrics by
`{job="$jobname"}`
2024-07-08 09:34:26 +03:00
emily db8c831c2f grafana-agent: set hostname label on all metrics
This is handy to quickly see all metrics exported by a node, without
having to wrangle the already existing `instance` label.

`hostname` is essentially a variant of `instance` but without ports.
2024-07-08 09:34:26 +03:00
Ilya K ba0d50624d Switch to push metrics with Grafana Agent 2024-07-08 09:34:24 +03:00
Ilya K 40ba3c4ae7 Prepare for remote push metrics 2024-07-08 09:33:59 +03:00
Ilya K 346a74eabc Wire up Grafana to Alertmanager 2024-07-08 09:33:59 +03:00
Ilya K e8e262c6a4 Enable Mimir Alertmanager, add example alert
Still TODO: actually connect it to Matrix
2024-07-08 09:33:59 +03:00
Luke Granger-Brown dd6ee53bfe pkgs/gerrit: update to 3.10.0
This does a bit more than advertised, since this also switches to a
different set of Bazel package building infrastructure that I'm hoping
will be more extensible than buildBazelPackage as it exists in nixpkgs
today.

In any case, the FOD here _seems_ to be much more stable than that
previously produced by the old approach, but no promises :)
2024-07-08 02:44:05 +01:00
Pierre Bourdon 5ebd71e4d5 tf/hydra: change Hydra URL 2024-07-08 00:01:24 +02:00
Pierre Bourdon 2700ac5efc tf/dns: fix hydra CNAME 2024-07-08 00:01:14 +02:00
Pierre Bourdon caa1fce74e hydra: move to hydra.forkos.org 2024-07-07 23:53:21 +02:00
Pierre Bourdon 5f8228536c bagel-box: switch to forkos.org DNS root 2024-07-07 23:52:40 +02:00
Pierre Bourdon 078f298b8c tf/dns: add bagel-box and hydra 2024-07-07 23:48:23 +02:00
Pierre Bourdon 4b0a2cd7e5 tf: add DNS management via Gandi 2024-07-07 20:43:05 +02:00
Pierre Bourdon dcd5f68545 tf: store hydra credentials in state via numtide/secret 2024-07-07 19:18:30 +02:00
Pierre Bourdon 7c6780a2a3 gitignore: add terraform lock file 2024-07-07 19:18:30 +02:00
Pierre Bourdon dd72904bf1 flake: replace tf wrappers with a single '.#tf' command 2024-07-07 19:18:30 +02:00
Pierre Bourdon 2e9483936e tf/hydra: fix project owner to use an automation account 2024-07-07 18:44:17 +02:00
Pierre Bourdon 30859b2872 terraform: store state on S3 2024-07-07 18:22:41 +02:00
Pierre Bourdon 0c68a23275 flake: fix 'nix flake check' 2024-07-07 18:02:55 +02:00
raito 8dc7ee9864 hydra: add declarative controls via terranix
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-07-07 17:59:56 +02:00
raito e803c198c1 admins: provision jade
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-07-07 13:15:27 +00:00
raito 578e24e634 systems: add fodwatch.forkos.org
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-07-07 13:15:27 +00:00
raito e1a034927c Merge pull request 'Split node_exporter and cadvisor config, disable cadvisor for nodes that are themselves containers' (#25) from cadvisor-containers into main
Reviewed-on: delroth/bagel-infra#25
Reviewed-by: raito <raito@noreply.git.lix.systems>
2024-07-05 17:21:27 +00:00
Ilya K 5b0f3c4541 Split node_exporter and cadvisor config, disable cadvisor for nodes that are themselves containers 2024-07-05 20:06:43 +03:00
raito b319b02f07 fix: remove custom logging format for Gerrit
This way, we get picked up by the LGTM stack exporter machinery.

Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-07-05 18:52:38 +02:00
raito 75f779716d Merge pull request 'Grafana' (#24) from grafana into main
Reviewed-on: delroth/bagel-infra#24
2024-07-05 16:43:13 +00:00
Ilya K 2441d18f17 Add Loki + Promtail setup 2024-07-05 16:10:31 +00:00
Ilya K 03cb9c390c Add postgres exporter 2024-07-05 16:10:31 +00:00
Ilya K 42f8ad8fa4 Add nginx log exporter 2024-07-05 16:10:31 +00:00
Ilya K 63b31e98cf Add Grafana/Prometheus/Mimir minimal setup
More later, Loki also later.
2024-07-05 16:10:31 +00:00
Ilya K 99f715caca Add devShell with agenix and colmena 2024-07-05 16:10:31 +00:00
Ilya K 3ad481c125 Clean up SSH key dupes, add Maxine 2024-07-05 16:10:31 +00:00
Pierre Bourdon 34a29552da hydra: update the epyc.infra.newtype.fr public host key 2024-07-05 16:43:29 +02:00
raito fa1bc1ced9 Merge pull request 'gerrit01: those who finetune even further' (#20) from gerrit-finetuning into main
Reviewed-on: delroth/bagel-infra#20
2024-07-05 12:37:43 +00:00
raito 6b7ddbcd29 bagel-box: reuse common/ module
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-07-05 13:29:56 +02:00
raito e27f152f00 common/base-server: use ambient stable lix by default
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-07-05 13:29:47 +02:00
raito 6fb584109a common/raito-vm: disable useDHCP
We are using networkd by default…

Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-07-05 13:12:35 +02:00
raito 0b01e9a99f gerrit01: those who finetune even further
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-07-05 12:23:44 +02:00
raito 832b0784d8 common/admins: add K900
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-07-04 23:57:05 +02:00