Enable Mimir Alertmanager, add example alert #33

Merged
k900 merged 6 commits from alertmanager into main 2024-07-08 06:35:33 +00:00
Owner

Still TODO: actually connect it to Matrix

k900 added 1 commit 2024-07-07 14:06:32 +00:00
Still TODO: actually connect it to Matrix
k900 added 1 commit 2024-07-07 14:13:34 +00:00
k900 added 2 commits 2024-07-07 15:23:56 +00:00
k900 force-pushed alertmanager from ddd0c365ae to d26dd9f2a6 2024-07-07 16:33:08 +00:00
emilylange added 1 commit 2024-07-07 18:59:29 +00:00
The node_exporter integration in grafana-agent enables a lot of
collectors by default.
We don't want those defaults; we want to opt in to each collector manually.

See https://grafana.com/docs/agent/latest/static/configuration/integrations/node-exporter-config/
for the difference between enable_collectors and set_collectors, and a
list (table) of collectors that would otherwise be enabled by default.
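A minimal sketch of what the opt-in approach could look like. It assumes the agent configuration is the freeform `services.grafana-agent.settings` attrset from the NixOS module, and the collector names are illustrative examples, not the list this PR uses.

```nix
{
  services.grafana-agent.settings.integrations.node_exporter = {
    # set_collectors replaces node_exporter's default collector set:
    # only the collectors listed here run (explicit opt-in).
    # These collector names are examples, not this PR's actual list.
    set_collectors = [ "cpu" "meminfo" "filesystem" "netdev" ];

    # enable_collectors, by contrast, adds collectors on top of the
    # defaults (or on top of set_collectors), e.g.:
    # enable_collectors = [ "systemd" ];
  };
}
```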
emilylange reviewed 2024-07-07 20:07:01 +00:00
@@ -0,0 +46,4 @@
name = config.networking.hostName;
scrape_configs = [
{
job_name = config.networking.hostName;
Member

job_name is exposed as job in the resulting metrics.

I get that the previous version (the pull-based metrics collection using Prometheus) had one job per machine as well.

I might be missing something, but I feel we could use multiple jobs here: one job per exporter, with job_name set to the exporter name.

This lets you use the metrics browser in Grafana to list all metrics of /one/ exporter by simply:

  1. selecting "job" in "2. Select label to search in"
  2. and then in "3. Select (multiple) values for your labels" your job name (exporter name)

I find that helpful when wondering which metrics are actually exported and available.

And if you want all metrics of a single instance, you can throw together a simple regex on the instance label.
A regex, because instance contains both the hostname and a port.

Or add another static label, e.g. hostname, similar to what you do down below in the logs section with host.

Let me know what you think. I can do the implementation if you want.
config.bagel.monitoring.grafana-agent.exporters would need to become an attrset and all.

And the hostname thingy is as simple as

diff --git a/services/monitoring/agent.nix b/services/monitoring/agent.nix
index e538cb7..e52ea98 100644
--- a/services/monitoring/agent.nix
+++ b/services/monitoring/agent.nix
@@ -48,7 +48,10 @@ in
                 {
                   job_name = config.networking.hostName;
                   static_configs = [
-                    { targets = map (e: "localhost:" + (toString e.port)) config.bagel.monitoring.grafana-agent.exporters; }
+                    {
+                      targets = map (e: "localhost:" + (toString e.port)) config.bagel.monitoring.grafana-agent.exporters;
+                      labels.hostname = config.networking.hostName;
+                    }
                   ];
                 }
               ];
Author
Owner

Yeah, I can do that. I've never really bothered going beyond the one-job-per-machine setup, but this works too, and if it makes life easier for people, I don't mind either way.

k900 reviewed 2024-07-07 21:17:12 +00:00
@@ -0,0 +94,4 @@
# We want to be explicit about the collectors we enable, so we use
# set_collectors instead of enable_collectors.
# https://grafana.com/docs/agent/latest/static/configuration/integrations/node-exporter-config/
integrations.node_exporter.set_collectors = [
Author
Owner

I actually did this intentionally, but I can add all the exporters manually too.

Member

Whoops, I misread the previous services.prometheus.exporters.node.enabledCollectors that this is meant to replace.
Sorry for that. Will drop my commit. One sec.
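For comparison, a sketch of how the old and new options line up. The NixOS option `services.prometheus.exporters.node.enabledCollectors` enables the listed collectors in addition to node_exporter's defaults, which corresponds to grafana-agent's `enable_collectors` rather than `set_collectors`. The `services.grafana-agent.settings` path is an assumption about where the agent config lives, and `"systemd"` is just an example collector.

```nix
{
  # Before: NixOS node exporter module (pull-based Prometheus).
  # Listed collectors are enabled on top of node_exporter's defaults.
  services.prometheus.exporters.node.enabledCollectors = [ "systemd" ];

  # After: grafana-agent's node_exporter integration.
  # enable_collectors has the same "defaults plus these" semantics.
  services.grafana-agent.settings.integrations.node_exporter.enable_collectors = [ "systemd" ];
}
```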

emilylange marked this conversation as resolved
emilylange force-pushed alertmanager from 11975c855d to d26dd9f2a6 2024-07-07 21:45:28 +00:00
emilylange added 2 commits 2024-07-07 22:37:57 +00:00
This is handy to quickly see all metrics exported by a node, without
having to mangle the already-existing `instance` label.

`hostname` is essentially a variant of `instance`, but without the port.
This allows us to use multiple jobs, one for each additional exporter,
and set their `job_name` accordingly.

`job_name` is exported as the `job` label on the resulting metrics.
This allows us to quickly see which metrics an exporter actually
makes available, simply by filtering all metrics by
`{job="$jobname"}`.
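A minimal sketch of the per-exporter jobs these commits describe, written as a standalone Nix expression (evaluable with `nix-instantiate --eval --strict`). `hostName` and `exporters` are hypothetical stand-ins for `config.networking.hostName` and an attrset version of `config.bagel.monitoring.grafana-agent.exporters`; the exporter names and ports are made up for illustration.

```nix
let
  # Hypothetical stand-ins for the real module options.
  hostName = "example-host";
  exporters = {
    nginx = { port = 9113; };
    postgres = { port = 9187; };
  };

  # One scrape job per exporter: job_name becomes the `job` label,
  # and the static `hostname` label carries the host without a port.
  mkJob = name: exporter: {
    job_name = name;
    static_configs = [
      {
        targets = [ "localhost:${toString exporter.port}" ];
        labels.hostname = hostName;
      }
    ];
  };
in
{
  scrape_configs = map (name: mkJob name exporters.${name}) (builtins.attrNames exporters);
}
```

Inside the module itself, `lib.mapAttrsToList mkJob exporters` does the same as the `map`/`attrNames` pair. With jobs split this way, `{job="postgres"}` lists one exporter's metrics and `{hostname="example-host"}` lists one machine's, with no regex on `instance` needed.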
k900 force-pushed alertmanager from 1693f644ee to 8d2a367e92 2024-07-08 06:34:41 +00:00
k900 changed title from "WIP: Enable Mimir Alertmanager, add example alert" to "Enable Mimir Alertmanager, add example alert" 2024-07-08 06:34:59 +00:00
k900 merged commit 8d2a367e92 into main 2024-07-08 06:35:33 +00:00
No reviewers
No labels
No milestone
No project
No assignees
2 participants
Reference: the-distro/infra#33