Commit graph

40 commits

Author SHA1 Message Date
raito 92560708b8 feat: multi-tenant secrets
Lix may have its own secrets and we want to maintain a certain
generalization level on the NixOS modules, so we can decorrelate which
secret we select dynamically by having a simple tenancy hierarchy
system.

This unfortunately requires to rewrite all call sites with a floral
prefix until we migrate them to the simple internal secret module which
is aware of this.

Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-10-06 08:10:44 +00:00
Ilya K 5582a0a29b Fix Hydra exporter crash loop nonsense 2024-10-01 19:27:13 +03:00
Ilya K 4ddf87fa8e Add new metric to Hydra exporter 2024-10-01 19:27:05 +03:00
Ilya K e2c6550796 Hydra metrics
Yoink the nixos org exporter, rewrite most of it, deploy
2024-10-01 19:06:26 +03:00
Ilya K c1712dc1fa Set up tempo 2024-08-31 15:05:30 +03:00
raito 024b431cbc feat(grafana): plug jsonnet-based dashboards in provisioning
Add the gerrit dashboards as an example.

Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-08-24 16:32:21 +02:00
Ilya K aef541829e Fix pyroscope datasource 2024-08-24 11:39:25 +03:00
raito 1fc15526d7 fix(pyroscope): add the gRPC endpoint as proxy as well
This is not documented but necessary for Alloy to operate.

Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-08-24 10:33:49 +02:00
raito 2544adba8e fix(gerrit): setup Alloy & Pyroscope more according to the docs
Still not working due to "unimplemented: error 404 not found" at push
time, but it's really unclear now why this occur.

Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-08-24 08:45:20 +02:00
raito 702867cd62 feat(pyroscope): add push API & reverse proxy
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-08-23 21:04:22 +02:00
raito 7cde6e92ae feat(grafana): add Pyroscope datasource
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-08-23 21:04:11 +02:00
raito ac7815321a feat(pyroscope): add secrets and storage
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-08-23 20:58:08 +02:00
raito db46b01ae9 feat(monitoring): add pyroscope to the infrastructure
Vendored for the time being.
See https://cl.forkos.org/c/nixpkgs/+/181 for upstreaming properly.

Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-08-23 20:43:00 +02:00
raito c380f29937 fix(grafana): remove the global pgsql module dependency for now
We should re-introduce it once things are a bit scoped out.

Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-08-23 20:43:00 +02:00
raito 84efd0976d feat(alerts): add a sync failed too often alert
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-08-09 16:25:34 +02:00
raito e2f5a7b0e4 feat(alerts): add basic postgresql alerts
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-08-09 16:06:34 +02:00
raito 7388de79c4 feat(alerts): add some basic "host & hardware" alerts
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-08-09 16:06:34 +02:00
Ilya K f8cad42b5c Set up alertmanager-hookshot-adapter 2024-08-09 14:03:56 +00:00
Ilya K bebc7f2586 We have nothing to hide 2024-07-23 18:09:49 +03:00
Ilya K 766dc4c383 Mimir also wants network-online.target
Thank you helpful eval warning
2024-07-19 12:03:55 +03:00
Ilya K 65b07a936b Make sure Mimir starts after network is up 2024-07-19 12:00:52 +03:00
Luke Granger-Brown e3e60a5e72 services/monitoring: add scraping of Gerrit's internal metrics 2024-07-15 11:02:54 +00:00
Ilya K 7a937e837a Unlimit Mimir max series 2024-07-13 15:52:46 +03:00
Ilya K e84b362b7a Allow 12 hour of backfill for metrics
This is somewhat experimental and may explode, but we'll see, I guess
2024-07-10 14:59:09 +03:00
Ilya K 9e7e6d42ab Make nginx/loki/mimir go fast 2024-07-10 14:55:28 +03:00
Ilya K b55475c12e Fix up the rest of the dashboards 2024-07-08 11:43:57 +03:00
Ilya K 9f0e601d84 Scrape grafana/loki/mimir own metrics 2024-07-08 10:25:15 +03:00
Ilya K 209f71c63a Update node_exporter dashboard for new metrics structure 2024-07-08 10:16:37 +03:00
Ilya K 563e0685d4 Metrics fixups
- fix grafana-agent config format
- rekey metrics-push-password for fodwatch
2024-07-08 10:01:25 +03:00
emily 8d2a367e92 grafana-agent: make bagel.monitoring.grafana-agent.exporters an attrset
This allows us to use multiple jobs, one for each additional exporter,
and set their `job_name` accordingly.

`job_name` is exported as `job` label on the resulting metrics.
This allows us to quickly get an understanding what metrics of an
exporter are actually available by simply filtering all metrics by
`{job="$jobname"}`
2024-07-08 09:34:26 +03:00
emily db8c831c2f grafana-agent: set hostname label on all metrics
This is handy to quickly see all metrics exported by a node, without
having to mangle with the already existing `instance` label.

`hostname` is essentially a variant of `instance` but without ports.
2024-07-08 09:34:26 +03:00
Ilya K ba0d50624d Switch to push metrics with Grafana Agent 2024-07-08 09:34:24 +03:00
Ilya K 40ba3c4ae7 Prepare for remote push metrics 2024-07-08 09:33:59 +03:00
Ilya K 346a74eabc Wire up Grafana to Alertmanager 2024-07-08 09:33:59 +03:00
Ilya K e8e262c6a4 Enable Mimir Alertmanager, add example alert
Still TODO: actually connect it to Matrix
2024-07-08 09:33:59 +03:00
Ilya K 5b0f3c4541 Split node_exporter and cadvisor config, disable cadvisor for nodes that are themselves containers 2024-07-05 20:06:43 +03:00
Ilya K 2441d18f17 Add Loki + Promtail setup 2024-07-05 16:10:31 +00:00
Ilya K 03cb9c390c Add postgres exporter 2024-07-05 16:10:31 +00:00
Ilya K 42f8ad8fa4 Add nginx log exporter 2024-07-05 16:10:31 +00:00
Ilya K 63b31e98cf Add Grafana/Prometheus/Mimir minimal setup
More later, Loki also later.
2024-07-05 16:10:31 +00:00