Bring the BMC metrics into our Grafana #185

Open
opened 2025-03-05 18:30:05 +00:00 by raito · 0 comments
Owner

We had a weird incident where some voltage lines on the ARM64 motherboard went down temporarily, causing bm-11 to freeze.

I fixed it by shutting down the machine, waiting for the lines to come back up, and then restarting. This is very weird and hard to debug properly.

To make other people aware of such issues, we should scrape the OpenBMC metrics in some fashion.

Currently, OpenBMC access for the ARM64 box requires a SOCKS5 proxy. We could work out an architecture to exfiltrate all the metrics we need and push them.
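
As a starting point for that discussion, here is a minimal sketch of the conversion step only. It is not an agreed design. It assumes the BMC exposes voltages through a Redfish `Power` resource with a `Voltages` array whose entries carry `Name` and `ReadingVolts`; whether our OpenBMC build does so still has to be checked. The metric name `bmc_voltage_volts`, the `host` and `sensor` labels, and the function names are all hypothetical.

```python
def escape_label(value: str) -> str:
    """Escape a label value for the Prometheus text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def voltages_to_prometheus(power: dict, host: str) -> str:
    """Render the `Voltages` array of a Redfish Power payload as gauge lines.

    Assumption: `power` is the decoded JSON of a Redfish Power resource.
    The metric and label names are illustrative, not an existing convention.
    """
    lines = ["# TYPE bmc_voltage_volts gauge"]
    for entry in power.get("Voltages", []):
        reading = entry.get("ReadingVolts")
        if reading is None:
            # Skip sensors that report no reading instead of inventing a value.
            continue
        sensor = escape_label(str(entry.get("Name", "unknown")))
        lines.append(
            f'bmc_voltage_volts{{host="{escape_label(host)}",sensor="{sensor}"}} '
            f"{float(reading)}"
        )
    return "\n".join(lines) + "\n"
```

Fetching the payload through the SOCKS5 proxy and pushing the result are left out on purpose, since that is the part that needs an architecture decision. One option would be `requests` with its SOCKS extra and a `socks5h://` proxy URL for the fetch, followed by a POST of the rendered text to a Prometheus Pushgateway, but both are assumptions about our setup.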

This task is high priority and may require custom development. Please ping me if you want to take it.

Reference: the-distro/infra#185