54e8282aac
fix: use promtool to verify rules, fix format
2024-11-12 23:21:30 +03:00
02f8bc7ca4
chore(o11y): filter by tenancy on node_exporter
...
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-10-22 16:57:37 +02:00
b1f4674da0
chore: add tenancy in postgres
...
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-10-22 16:57:06 +02:00
226eacdeec
chore: add tenancy in node_exporter
...
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-10-22 16:57:06 +02:00
bee402fecc
fix: ensure that pg_stat_statements is always created as an ext
...
Otherwise, we will have issues with this exporter.
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-10-21 14:33:18 +02:00
84cfbdb050
feat: check formatting and validity of alerts
...
Fixes #94 .
2024-10-07 20:00:54 +00:00
de085155a6
fix: update paths to floral secrets to secrets/floral/
2024-10-07 15:48:05 +00:00
1e421889e4
feat(monitoring): add static label for tenancy
...
So we can distinguish easily things in the dashboards.
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-10-06 11:10:16 +02:00
92560708b8
feat: multi-tenant secrets
...
Lix may have its own secrets and we want to maintain a certain
generalization level on the NixOS modules, so we can decorrelate which
secret we select dynamically by having a simple tenancy hierarchy
system.
This unfortunately requires to rewrite all call sites with a floral
prefix until we migrate them to the simple internal secret module which
is aware of this.
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-10-06 08:10:44 +00:00
5582a0a29b
Fix Hydra exporter crash loop nonsense
2024-10-01 19:27:13 +03:00
4ddf87fa8e
Add new metric to Hydra exporter
2024-10-01 19:27:05 +03:00
e2c6550796
Hydra metrics
...
Yoink the nixos org exporter, rewrite most of it, deploy
2024-10-01 19:06:26 +03:00
c1712dc1fa
Set up tempo
2024-08-31 15:05:30 +03:00
024b431cbc
feat(grafana): plug jsonnet-based dashboards in provisioning
...
Add the gerrit dashboards as an example.
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-08-24 16:32:21 +02:00
aef541829e
Fix pyroscope datasource
2024-08-24 11:39:25 +03:00
1fc15526d7
fix(pyroscope): add the gRPC endpoint as proxy as well
...
This is not documented but necessary for Alloy to operate.
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-08-24 10:33:49 +02:00
2544adba8e
fix(gerrit): setup Alloy & Pyroscope more according to the docs
...
Still not working due to "unimplemented: error 404 not found" at push
time, but it's really unclear now why this occur.
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-08-24 08:45:20 +02:00
702867cd62
feat(pyroscope): add push API & reverse proxy
...
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-08-23 21:04:22 +02:00
7cde6e92ae
feat(grafana): add Pyroscope datasource
...
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-08-23 21:04:11 +02:00
ac7815321a
feat(pyroscope): add secrets and storage
...
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-08-23 20:58:08 +02:00
db46b01ae9
feat(monitoring): add pyroscope to the infrastructure
...
Vendored for the time being.
See https://cl.forkos.org/c/nixpkgs/+/181 for upstreaming properly.
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-08-23 20:43:00 +02:00
c380f29937
fix(grafana): remove the global pgsql module dependency for now
...
We should re-introduce it once things are a bit scoped out.
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-08-23 20:43:00 +02:00
84efd0976d
feat(alerts): add a sync failed too often alert
...
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-08-09 16:25:34 +02:00
e2f5a7b0e4
feat(alerts): add basic postgresql alerts
...
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-08-09 16:06:34 +02:00
7388de79c4
feat(alerts): add some basic "host & hardware" alerts
...
Signed-off-by: Raito Bezarius <masterancpp@gmail.com>
2024-08-09 16:06:34 +02:00
f8cad42b5c
Set up alertmanager-hookshot-adapter
2024-08-09 14:03:56 +00:00
bebc7f2586
We have nothing to hide
2024-07-23 18:09:49 +03:00
766dc4c383
Mimir also wants network-online.target
...
Thank you helpful eval warning
2024-07-19 12:03:55 +03:00
65b07a936b
Make sure Mimir starts after network is up
2024-07-19 12:00:52 +03:00
e3e60a5e72
services/monitoring: add scraping of Gerrit's internal metrics
2024-07-15 11:02:54 +00:00
7a937e837a
Unlimit Mimir max series
2024-07-13 15:52:46 +03:00
e84b362b7a
Allow 12 hour of backfill for metrics
...
This is somewhat experimental and may explode, but we'll see, I guess
2024-07-10 14:59:09 +03:00
9e7e6d42ab
Make nginx/loki/mimir go fast
2024-07-10 14:55:28 +03:00
b55475c12e
Fix up the rest of the dashboards
2024-07-08 11:43:57 +03:00
9f0e601d84
Scrape grafana/loki/mimir own metrics
2024-07-08 10:25:15 +03:00
209f71c63a
Update node_exporter dashboard for new metrics structure
2024-07-08 10:16:37 +03:00
563e0685d4
Metrics fixups
...
- fix grafana-agent config format
- rekey metrics-push-password for fodwatch
2024-07-08 10:01:25 +03:00
8d2a367e92
grafana-agent: make bagel.monitoring.grafana-agent.exporters
an attrset
...
This allows us to use multiple jobs, one for each additional exporter,
and set their `job_name` accordingly.
`job_name` is exported as `job` label on the resulting metrics.
This allows us to quickly get an understanding what metrics of an
exporter are actually available by simply filtering all metrics by
`{job="$jobname"}`
2024-07-08 09:34:26 +03:00
db8c831c2f
grafana-agent: set hostname
label on all metrics
...
This is handy to quickly see all metrics exported by a node, without
having to mangle with the already existing `instance` label.
`hostname` is essentially a variant of `instance` but without ports.
2024-07-08 09:34:26 +03:00
ba0d50624d
Switch to push metrics with Grafana Agent
2024-07-08 09:34:24 +03:00
40ba3c4ae7
Prepare for remote push metrics
2024-07-08 09:33:59 +03:00
346a74eabc
Wire up Grafana to Alertmanager
2024-07-08 09:33:59 +03:00
e8e262c6a4
Enable Mimir Alertmanager, add example alert
...
Still TODO: actually connect it to Matrix
2024-07-08 09:33:59 +03:00
5b0f3c4541
Split node_exporter and cadvisor config, disable cadvisor for nodes that are themselves containers
2024-07-05 20:06:43 +03:00
2441d18f17
Add Loki + Promtail setup
2024-07-05 16:10:31 +00:00
03cb9c390c
Add postgres exporter
2024-07-05 16:10:31 +00:00
42f8ad8fa4
Add nginx log exporter
2024-07-05 16:10:31 +00:00
63b31e98cf
Add Grafana/Prometheus/Mimir minimal setup
...
More later, Loki also later.
2024-07-05 16:10:31 +00:00