Some dashboards were still explicitly specifying 'Prometheus' as the
datasource, which leads to issues when trying to import the dashboards
into a Grafana instance where the Prometheus datasource has a different
name.
Change-Id: I13135af32a6f312a4feb32ab828f906f7b13edfe
The healthcheck plugin for Gerrit provides a convenient way to determine
the health of different functionalities and components of Gerrit. If
the endpoint provided by the plugin is pinged, it will execute a set
of checks and return either 200 if all checks passed or 500 if at least
one failed. It will also provide metrics that can be scraped by
Prometheus.
This change adds the option for Gerrit installations outside of Kubernetes
to install a sidecar container in the Prometheus deployment that every
30 s pings the healthcheck plugin's endpoint, thereby triggering the
checks. This is not provided for Kubernetes, since there the ping should
be the task of the Kubernetes liveness probes.
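A minimal sketch of what such a pinging sidecar could do. The URL, the
endpoint path and all function names are assumptions made for
illustration and are not taken from this change:

```python
import time
import urllib.error
import urllib.request

# Hypothetical URL; the real host and endpoint path depend on the installation.
HEALTHCHECK_URL = "http://gerrit.example.com/config/server/healthcheck~status"
PING_INTERVAL_SECONDS = 30


def fetch_status(url, timeout=10):
    """Return the HTTP status code of a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; the code is still what we want.
        return error.code


def is_healthy(status_code):
    """200 means all checks passed; 500 means at least one check failed."""
    return status_code == 200


def ping_forever(url=HEALTHCHECK_URL, fetch=fetch_status, sleep=time.sleep):
    """Ping the endpoint every PING_INTERVAL_SECONDS to trigger the checks."""
    while True:
        try:
            status = fetch(url)
            print("healthy" if is_healthy(status) else f"unhealthy ({status})")
        except OSError as error:
            # Connection problems should not stop the loop.
            print(f"ping failed: {error}")
        sleep(PING_INTERVAL_SECONDS)
```

The fetch and sleep parameters are injectable only so that the loop can
be exercised without a running Gerrit.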
The change additionally adds a dashboard displaying the status of the
healthcheck for each Gerrit instance over time.
Change-Id: Ieeedc4406b642e542c89679a8314d771ca0928af
Grafonnet doesn't yet provide addSeriesOverrides() accepting an array.
Also use a different color for each gc so that switching to another
gc shows up in the graph.
Change-Id: I4e424280d44a63f57ad7196dfdb7e77ba2f13f24
There are a lot of latency metrics. This change splits up the existing
dashboard for latencies. For REST API latencies, it also allows
selecting the REST API calls to look at. This change also adds latency
dashboards for the NoteDB and UI Actions.
Change-Id: Idb9631cc1bc838d06e626d58f163e71fb78b30c5
Versioning the pure JSON files representing the Grafana dashboards
had some disadvantages. It was hard to review them, they were very
cluttered and a lot was duplicated.
There are some tools that deal with that. One of them is Grafonnet,
a Jsonnet library for generating Grafana dashboards. Jsonnet is a
domain-specific language for creating JSON files.
This change implements the Gerrit Process dashboard in Grafonnet.
It also extends the installer to be able to install dashboards in
the Jsonnet format.
Change-Id: I6235fb7d045bd71557678a4e3b0d4ad4515f4615
Grafana provides a repository for dashboards that can be used to easily
import dashboards. Providing these dashboards there would make it easier
for users not using the full setup provided here to still use the
dashboards. To be able to upload them, however, the datasource reference
in the dashboards has to be a template.
This is, however, not compatible with the way the dashboards are imported
in the Grafana of the stack provided here. Thus, the variables are
removed during the installation.
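One possible way to remove the template from a dashboard is sketched
below. The placeholder name "${DS_PROMETHEUS}", the "__inputs" key and
the function name are assumptions based on how Grafana commonly exports
dashboards for sharing; the installer may do this differently:

```python
import copy

# Assumed placeholder; dashboards exported for sharing often use this name.
DATASOURCE_PLACEHOLDER = "${DS_PROMETHEUS}"


def remove_datasource_template(dashboard, datasource_name):
    """Return a copy of dashboard with the placeholder replaced by a fixed name."""
    result = copy.deepcopy(dashboard)
    # "__inputs" declares the placeholder for the import dialog; drop it.
    result.pop("__inputs", None)

    def replace(node):
        # Walk nested dicts and lists and swap every placeholder string.
        if isinstance(node, dict):
            return {key: replace(value) for key, value in node.items()}
        if isinstance(node, list):
            return [replace(item) for item in node]
        if node == DATASOURCE_PLACEHOLDER:
            return datasource_name
        return node

    return replace(result)
```

The input dashboard is left unchanged, so the templated version can
still be uploaded as-is.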
Change-Id: I99f127882a6f7594ca1c40fbe1e299378e89f4e9
This change
- adds metrics for parallel GC to the GC panel in the Gerrit Process
dashboard
- configures the GC panel to only show queries with values other than
null
- changes the interval to one minute, which fits the scrape interval
- changes the default time frame to the last 24h, which is used for
most other dashboards
Change-Id: I3b6587e769ae7486a02e26b8d7f2822319eb94e6
So far it was only possible to monitor single instance Gerrit servers.
This was due to the fact that a URL had to be used that pointed to
a dedicated instance, since if multiple replicas were behind the
URL, the metrics of a random replica would be scraped and not of
all.
Prometheus has a service discovery functionality for deployments running
in Kubernetes. This is now used when monitoring a Gerrit instance in
Kubernetes. This allows a variable number of replicas to run,
which will be automatically discovered by Prometheus.
The dashboards were adapted accordingly and now allow selecting the
replica to be observed. For now, no summary of all replicas can be
displayed in the dashboards, but that feature is planned to be added
in the future.
Change-Id: I96efc63a192cd90f5e3e91a53dace8e1ae83132e
This replaces the hacky graph showing the Gerrit version with a table
showing the current Gerrit version information.
Change-Id: Idfbdc85e376953aead40fea06544e5c84fb777e7
Add graphs for the following latency metrics
- receive-commit
- query total
- query changes
- REST total
- REST change list comments
- REST change list robot comments
- REST change post review
- REST get change detail
- REST get change diff
- REST get change
- REST get commit
- REST get change revision actions
Change-Id: Id782e12335ae76820cac4e4e8c80450671bf8216
A variable was used to select the Gerrit instance to observe in the
dashboards. Since the instance label is set for all targets that
Prometheus scrapes, the variable would also contain e.g. the Prometheus
instance.
Now only Gerrit instances are displayed by further filtering for a
metric specific for Gerrit.
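The filtering idea can be illustrated with a small simulation. The
metric name below is a hypothetical stand-in for a metric that only
Gerrit exposes. In Grafana itself this would typically be a variable
query of the form label_values(<metric>, instance):

```python
# Hypothetical metric name standing in for a metric that only Gerrit exposes.
GERRIT_MARKER_METRIC = "gerrit_example_metric"


def gerrit_instances(series, marker_metric=GERRIT_MARKER_METRIC):
    """Return the sorted instance labels of all series of marker_metric.

    series is a list of label dicts, where "__name__" holds the metric name.
    """
    return sorted({
        labels["instance"]
        for labels in series
        if labels.get("__name__") == marker_metric and "instance" in labels
    })
```

Targets that only expose generic metrics, such as Prometheus itself,
never match the marker metric and therefore drop out of the list.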
Change-Id: I392b2ddf53a0ea49db25018dc5d37d269365812a
When the dashboard files got too large (>2 MB), Kubernetes rejected
the configmap.
Now each dashboard is installed with its own configmap. A sidecar container
is used to register these dashboards with Grafana.
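A sketch of how one configmap per dashboard could be generated. The
label "grafana_dashboard", the name prefix and the function names are
assumptions for illustration; the label the sidecar actually watches
depends on its configuration:

```python
import json
import re


def slugify(name):
    """Turn a dashboard name into a lowercase, dash-separated identifier."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def dashboard_configmap(name, dashboard, label="grafana_dashboard"):
    """Build a ConfigMap manifest holding exactly one dashboard."""
    slug = slugify(name)
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": f"grafana-dashboard-{slug}",
            # Assumed label that a dashboard sidecar could select on.
            "labels": {label: "1"},
        },
        "data": {f"{slug}.json": json.dumps(dashboard)},
    }


def configmaps_for(dashboards):
    """Return one ConfigMap manifest per entry of the name -> dashboard dict."""
    return [dashboard_configmap(name, body) for name, body in dashboards.items()]
```

Because every manifest carries a single dashboard, the size of one
configmap no longer grows with the number of dashboards.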
Change-Id: I84062d6e2ac7dc2669945b54575bf239a25900a4
- Rearrange the other panels so that we show system load over CPU usage
over threads in the left column.
- Reduce height of memory panel a bit
Change-Id: Icaada525f87d0df503f67cf688b94d15a4119034
This change adds the current status of a project that aims to create
a simple monitoring setup to monitor Gerrit servers, which was developed
internally at SAP.
The project provides an opinionated and basic configuration for helm
charts that can be used to install Loki, Prometheus and Grafana on a
Kubernetes cluster. Scripts to easily apply the configuration and
install the whole setup are provided as well.
The contributions so far were made by (with number of commits):
80 Thomas Draebing
11 Matthias Sohn
2 Saša Živkov
Change-Id: I8045780446edfb3c0dc8287b8f494505e338e066