Monitoring infra should alert when an expected collector can't be scraped #251

Open
opened 2025-08-05 14:59:32 +00:00 by delroth · 1 comment
Owner

When I deployed the smartctl exporter on bm-12, it at first couldn't start because of a broken disk, then failed because of a bug in the collector. Neither failure showed up anywhere in the monitoring, and the absence of metrics could be hiding other conditions. We should figure out how to bubble up the fact that the local agent can't scrape a target, and alert on it.

Owner
{
  alert = "FailedScrape";
  labels.severity = "warning";
  annotations.summary = "Scrape failed";
  annotations.description = "The job was not successfully scraped";
  for = "2m";
  expr = ''
    up == 0
  '';
}

could do the job
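
One gap worth noting: `up == 0` only fires while an `up` series exists for the target. If the target never makes it into the scrape configuration, or its series is never forwarded, there is nothing to compare against zero. A companion rule using PromQL's `absent()` could cover that case. This is only a sketch in the same rule format as above; the alert name, the `job="smartctl"` label value, and the 10m window are assumptions, not names taken from the actual config.

```nix
{
  # Hypothetical companion rule: fires when no `up` series exists at all
  # for a job we expect to be scraped, which `up == 0` cannot detect.
  alert = "ExpectedJobAbsent";
  labels.severity = "warning";
  annotations.summary = "Expected scrape job is missing";
  annotations.description = "No up series exists for the smartctl job";
  # Assumed window, longer than FailedScrape to ride out restarts.
  for = "10m";
  expr = ''
    absent(up{job="smartctl"})
  '';
}
```

Since `absent()` needs an explicit selector, this would need one rule (or one generated entry) per expected job, so it only makes sense for collectors that are supposed to exist everywhere.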

Reference: the-distro/infra#251