Commit graph

28 commits

Author SHA1 Message Date
Thomas Draebing 4a9d167637 Adapt receivecommits metrics
The metric name was changed in I1aae3bc0c0fe430086221503b8e529fa06967517.

Change-Id: I466b01f05a2f679ef49437998992f5aa678bd58c
2021-10-29 10:14:56 +02:00
Pat Long 37dc340371
Ensure that all dashboards are using 'defaults.datasource'
Some dashboards were still explicitly specifying 'Prometheus' as the
datasource, which leads to issues when trying to import the dashboards
into a Grafana instance where the Prometheus datasource has a different
name.

Change-Id: I13135af32a6f312a4feb32ab828f906f7b13edfe
2021-06-28 11:13:35 -04:00
Thomas Draebing 8e8a55e650 Add healthcheck ping and dashboard for Gerrit
The healthcheck plugin for Gerrit provides a convenient way to determine
the health of different functionalities and components of Gerrit. If
the endpoint provided by the plugin is pinged, it will execute a set
of checks and return either 200 if all checks passed or 500 if at least
one failed. It will also provide metrics that can be scraped by
Prometheus.

This change adds the option for Gerrit installations outside of Kubernetes
to install a sidecar container in the Prometheus deployment that pings
the healthcheck plugin's endpoint every 30 s, thereby triggering the
checks. This is not provided for Kubernetes, since there the ping should
be the task of the Kubernetes liveness probes.

The change additionally adds a dashboard displaying the status of the
healthcheck for each Gerrit instance over time.

Change-Id: Ieeedc4406b642e542c89679a8314d771ca0928af
2021-02-12 13:47:16 +01:00
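The ping behaviour described in the healthcheck commit above can be
sketched roughly as follows. This is a minimal illustration, not the
sidecar shipped by the change: the function names and the example URL
are hypothetical, and only the 30 s interval and the 200/500 semantics
come from the commit message.

```python
import time
import urllib.error
import urllib.request


def ping_healthcheck(url, opener=urllib.request.urlopen, timeout=10):
    """Send one ping and return the HTTP status, or None if unreachable."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for error statuses; a failed check arrives here as 500.
        return err.code
    except (urllib.error.URLError, OSError):
        return None


def ping_statuses(url, interval_seconds=30, max_pings=None,
                  sleep=time.sleep, opener=urllib.request.urlopen):
    """Yield one status per ping, pausing interval_seconds after each ping.

    max_pings=None keeps pinging until the caller stops iterating.
    """
    sent = 0
    while max_pings is None or sent < max_pings:
        yield ping_healthcheck(url, opener=opener)
        sent += 1
        sleep(interval_seconds)


if __name__ == "__main__":
    # Hypothetical endpoint; replace with the URL the plugin exposes.
    for status in ping_statuses("http://gerrit.example.com/healthcheck"):
        print("healthcheck status:", status)
```

The ping only triggers the checks; the resulting metrics would still be
collected by the regular Prometheus scrape.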
Matthias Sohn 73d4326206 gc panel: use for loop to add prometheus targets
Grafonnet doesn't yet provide addSeriesOverrides() accepting an array.
Also use a different color for each gc so that switching to another
gc shows up in the graph.

Change-Id: I4e424280d44a63f57ad7196dfdb7e77ba2f13f24
2021-02-06 00:47:10 +01:00
Matthias Sohn c6a7a985cd Fix yAxis label of gc-time panel
Change-Id: Ib330398a5a9034ed34a07df50930aab2202b27d5
2021-02-05 01:04:06 +01:00
Matthias Sohn 7d9aff0488 Fix series override alias for G1 old gen gc metric
Change-Id: Ice4908d335214749989966219fad410c783652af
2021-02-05 01:01:32 +01:00
Matthias Sohn b72d83f48e Add gc metrics for ZGC and ShenandoahGC
Change-Id: I518a655f4c8080a8b5b23e67d6a518b503000949
2021-02-05 00:59:13 +01:00
Thomas Draebing f839c376af Convert overview dashboard to grafonnet
In addition this updates Grafonnet to include bar gauges.

Change-Id: I538bd965d52f841b24c9607fc97d5ac748b9d68b
2020-12-04 08:31:27 +01:00
Thomas Draebing 893b0c4f36 Convert replication dashboard to grafonnet
Change-Id: Icffb8ffbec8541e5b956487e5ce9ec54b3c8b617
2020-12-04 08:31:26 +01:00
Thomas Draebing c7c17679e9 Divide latency dashboard
There are a lot of latency metrics. This change splits up the existing
dashboard for latencies. For REST API latencies, it also allows
selecting the REST API calls to look at. This change also adds latency
dashboards for the NoteDB and UI Actions.

Change-Id: Idb9631cc1bc838d06e626d58f163e71fb78b30c5
2020-12-04 08:31:26 +01:00
Thomas Draebing 0b4c16e881 Convert latency dashboard to grafonnet
Change-Id: Id97759996259eea802c80c2ef3261ba1883d92d3
2020-12-04 08:31:25 +01:00
Thomas Draebing 3e811f272b Convert git fetch/clone dashboard to grafonnet
Change-Id: I735f94599199ae2d0f304030fa023c55359e9a47
2020-12-04 08:31:25 +01:00
Thomas Draebing 12aba901e4 Extract yAxis object
Change-Id: I98c0708e521c0122beb53869242a3a1df8db3f3d
2020-12-04 08:31:24 +01:00
Thomas Draebing 82d9ead576 Convert caches dashboard to grafonnet
Change-Id: I42f10428bb5f85991cef2abbcdfab9424b8bb48d
2020-12-04 08:31:23 +01:00
Thomas Draebing 72391ac5e5 Convert queues dashboard to grafonnet
Change-Id: Ia3307a923b99ecacaaa8c803aa2af0c9bf4eabcb
2020-12-04 08:31:22 +01:00
Thomas Draebing ce5b8300f1 Start using Grafonnet to create Grafana dashboards
Versioning the pure JSON files representing the Grafana dashboards
had some disadvantages. It was hard to review them, they were very
cluttered and a lot was duplicated.

There are some tools that deal with that. One of them is Grafonnet,
a Jsonnet library for Grafana dashboards. Jsonnet is a tool to create
JSON files using a domain-specific language.

This change implements the Gerrit Process dashboard in Grafonnet.
It also extends the installer to be able to install dashboards in
the Jsonnet format.

Change-Id: I6235fb7d045bd71557678a4e3b0d4ad4515f4615
2020-12-04 08:31:21 +01:00
Thomas Draebing bec7bf7897 Adapt dashboards to be accepted by Grafana dashboard repository
Grafana provides a repository for dashboards that can be used to easily
import dashboards. Providing these dashboards there would make it easier
for users not using the full setup provided here to still use the
dashboards. To be uploadable, however, the datasource reference in the
dashboards has to be a template.

This is, however, not compatible with the way the dashboards are imported
into the Grafana of the stack provided here. Thus, the variables are
removed during the installation.

Change-Id: I99f127882a6f7594ca1c40fbe1e299378e89f4e9
2020-11-27 10:40:09 +01:00
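Removing the datasource template during installation, as the commit
above describes, could look roughly like the sketch below. It assumes
the export layout Grafana uses for shared dashboards (a top-level
`__inputs` list plus `${NAME}` placeholders); the function name is
hypothetical and the installer's actual mechanism is not shown in the
commit message.

```python
def resolve_datasource_inputs(dashboard, datasource_name):
    """Return a copy of the dashboard with datasource inputs resolved.

    Assumption: datasource inputs are listed under "__inputs" with
    type "datasource" and referenced elsewhere as "${NAME}".
    """
    placeholders = {
        "${%s}" % entry["name"]
        for entry in dashboard.get("__inputs", [])
        if entry.get("type") == "datasource"
    }

    def resolve(node):
        if isinstance(node, dict):
            return {key: resolve(value) for key, value in node.items()}
        if isinstance(node, list):
            return [resolve(item) for item in node]
        if isinstance(node, str) and node in placeholders:
            return datasource_name
        return node

    resolved = resolve(dashboard)
    # The input declaration is only needed for the import dialog.
    resolved.pop("__inputs", None)
    return resolved
```

The input dashboard is left unmodified, so the templated version can
still be uploaded to the dashboard repository.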
Thomas Draebing 65582f2deb Also monitor parallel GC
This change

- adds metrics for parallel GC to the GC panel in the Gerrit Process
  dashboard
- configures the GC panel to only show queries with values other than
  null
- changes the interval to one minute, which fits the scrape interval
- changes the default time frame to the last 24h, which is used for
  most other dashboards

Change-Id: I3b6587e769ae7486a02e26b8d7f2822319eb94e6
2020-08-25 13:20:11 +02:00
Thomas Draebing 451882b7e9 Allow to monitor Gerrit on Kubernetes
So far it was only possible to monitor single instance Gerrit servers.
This was due to the fact that a URL had to be used that pointed to
a dedicated instance. If multiple replicas were behind that URL, the
metrics of a random replica would be scraped instead of the metrics
of all replicas.

Prometheus has a service discovery functionality for deployments running
in Kubernetes. This is now used when monitoring a Gerrit instance in
Kubernetes. It allows a variable number of replicas to run, which
will be automatically discovered by Prometheus.

The dashboards were adapted accordingly and now allow selecting the
replica to be observed. For now, no summary of all replicas can be
displayed in the dashboards, but that feature is planned to be added
in the future.

Change-Id: I96efc63a192cd90f5e3e91a53dace8e1ae83132e
2020-05-14 15:55:35 +02:00
Thomas Draebing 7663baf7be Use gerrit_build_info metric to display Gerrit version
This replaces the hacky graph showing the Gerrit version with a table
showing the current Gerrit version information.

Change-Id: Idfbdc85e376953aead40fea06544e5c84fb777e7
2020-05-14 15:33:14 +02:00
Matthias Sohn e8b2651af2 Add latency dashboard
Add graphs for the following latency metrics
- receive-commit
- query total
- query changes
- REST total
- REST change list comments
- REST change list robot comments
- REST change post review
- REST get change detail
- REST get change diff
- REST get change
- REST get commit
- REST get change revision actions

Change-Id: Id782e12335ae76820cac4e4e8c80450671bf8216
2020-05-05 18:30:18 +02:00
Thomas Draebing f960eb5eab Add dashboard for Loki metrics
Change-Id: I220d90d33be3ed292402f3adb7386953cad7b0de
2020-04-03 11:56:24 +02:00
Thomas Draebing ff7fd22ca2 Add dashboard to monitor Prometheus data
This is an adapted version of this dashboard:
https://grafana.com/grafana/dashboards/3681

Change-Id: I405f09f75698b940becd6994a7fc457853603756
2020-04-03 11:56:24 +02:00
Thomas Draebing 442bf6fb98 Only show Gerrit instances in the instance dropdowns
A variable was used to select the Gerrit instance to observe in the
dashboards. Since the instance label is set for all targets that
Prometheus scrapes, the variable would also contain, e.g., the
Prometheus instance.

Now only Gerrit instances are displayed by further filtering for a
metric specific for Gerrit.

Change-Id: I392b2ddf53a0ea49db25018dc5d37d269365812a
2020-04-03 11:37:27 +02:00
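The effect of the filtering in the commit above can be illustrated with
a small sketch. The metric name `gerrit_build_info`, the sample series
and both function names are assumptions chosen for illustration; the
commit message does not say which Gerrit-specific metric is used.
`label_values(metric, label)` is the query form Grafana's Prometheus
datasource offers for template variables.

```python
def instance_variable_query(gerrit_metric, label="instance"):
    """Build a Grafana variable query restricted to one metric."""
    return "label_values({}, {})".format(gerrit_metric, label)


def instances_exposing(series, metric):
    """Return the sorted instance labels of all series named metric."""
    return sorted({
        labels["instance"]
        for labels in series
        if labels.get("__name__") == metric and "instance" in labels
    })


# Hypothetical scrape result: every target reports "up", but only
# Gerrit reports the Gerrit-specific metric.
SAMPLE_SERIES = [
    {"__name__": "up", "instance": "gerrit-1:8080"},
    {"__name__": "up", "instance": "prometheus:9090"},
    {"__name__": "gerrit_build_info", "instance": "gerrit-1:8080"},
]

if __name__ == "__main__":
    print(instances_exposing(SAMPLE_SERIES, "up"))
    print(instances_exposing(SAMPLE_SERIES, "gerrit_build_info"))
    print(instance_variable_query("gerrit_build_info"))
```

Selecting on a metric that every target exposes lists the Prometheus
instance as well, while selecting on a Gerrit-only metric does not.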
Thomas Draebing 623332e4b3 Create a configmap per dashboard
If the dashboard files got too large (>2Mb), Kubernetes rejected
the configmap.

Now each dashboard is installed with its own configmap. A sidecar container
is used to register these dashboards with Grafana.

Change-Id: I84062d6e2ac7dc2669945b54575bf239a25900a4
2020-03-26 09:55:39 +01:00
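Generating one configmap per dashboard, as in the commit above, might
be sketched like this. The name prefix, the function names and the
`grafana_dashboard` label are assumptions for illustration; the label
a sidecar actually watches depends on how it is configured.

```python
import re


def configmap_name(filename, prefix="grafana-dashboard"):
    """Derive a Kubernetes-safe object name from a dashboard file name."""
    stem = filename.rsplit(".", 1)[0].lower()
    # Collapse every run of other characters into a single '-'.
    slug = re.sub(r"[^a-z0-9]+", "-", stem).strip("-")
    return "{}-{}".format(prefix, slug)


def dashboard_configmaps(dashboards, namespace,
                         sidecar_label="grafana_dashboard"):
    """Return one ConfigMap manifest (as a dict) per dashboard file.

    dashboards maps a file name to the dashboard JSON as a string.
    """
    manifests = []
    for filename in sorted(dashboards):
        manifests.append({
            "apiVersion": "v1",
            "kind": "ConfigMap",
            "metadata": {
                "name": configmap_name(filename),
                "namespace": namespace,
                # Assumed label the sidecar uses to find dashboards.
                "labels": {sidecar_label: "1"},
            },
            "data": {filename: dashboards[filename]},
        })
    return manifests
```

Because each manifest carries a single dashboard, the size of any one
configmap is bounded by the size of its dashboard file.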
Matthias Sohn 14e7530aab Process dashboard: add panel showing system load
- Rearrange the other panels so that we show system load over cpu usage
over threads in the left column.
- Reduce height of memory panel a bit

Change-Id: Icaada525f87d0df503f67cf688b94d15a4119034
2020-03-13 17:41:01 +01:00
Matthias Sohn 4a96ed4947 Process dashboard: show number of available CPUs
Change-Id: Ifbf13edb2dfa8f5cee64aea3f9dca006d419ef20
2020-03-13 17:40:53 +01:00
Thomas Draebing be862d863e Move internal project to open source
This change adds the current status of a project that aims to create
a simple monitoring setup to monitor Gerrit servers, which was developed
internally at SAP.

The project provides an opinionated and basic configuration for helm
charts that can be used to install Loki, Prometheus and Grafana on a
Kubernetes cluster. Scripts to easily apply the configuration and
install the whole setup are provided as well.

The contributions so far were done by (with number of commits)

  80  Thomas Draebing
  11  Matthias Sohn
   2  Saša Živkov

Change-Id: I8045780446edfb3c0dc8287b8f494505e338e066
2020-03-11 15:23:19 +01:00