7 commits

Author SHA1 Message Date
Thomas Draebing fad4eba966 Support a federated Prometheus setup
Gerrit instances that are load-balanced cannot easily be scraped by
an external Prometheus, since a request is not guaranteed to reach a
specific Gerrit instance. A typical way to solve this issue is to
install a local Prometheus and scrape the local Prometheus from the
central Prometheus. This is a so-called federated setup.

Now such a setup is supported and can be configured.
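
As an illustration of what the central Prometheus scrapes in a federated
setup, here is a minimal sketch that builds the URL of the `/federate`
endpoint of a local Prometheus. The host name and the `job="gerrit"`
matcher are assumptions made for the example; they are not taken from
this project's configuration.

```python
from urllib.parse import urlencode


def federate_url(base_url: str, matchers: list[str]) -> str:
    """Build the /federate URL of a local Prometheus.

    Each matcher becomes one `match[]` query parameter, which selects
    the series the central Prometheus pulls from the local one.
    """
    query = urlencode([("match[]", matcher) for matcher in matchers])
    return f"{base_url.rstrip('/')}/federate?{query}"


if __name__ == "__main__":
    # Hypothetical local Prometheus running next to a Gerrit instance.
    print(federate_url("http://prometheus.gerrit-1.example.com:9090",
                       ['{job="gerrit"}']))
```

In a real deployment the central Prometheus would be given this path and
the matchers through its scrape configuration rather than a hand-built
URL.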

Change-Id: I0119d3c1d846cd8e975e5732f4d59cf863c6d2b8
2021-12-16 19:05:00 +01:00
Thomas Draebing 8e8a55e650 Add healthcheck ping and dashboard for Gerrit
The healthcheck plugin for Gerrit provides a convenient way to determine
the health of different functionalities and components of Gerrit. If
the endpoint provided by the plugin is pinged, it will execute a set
of checks and return either 200 if all checks passed or 500 if at least
one failed. It will also provide metrics that can be scraped by
Prometheus.

This change adds the option for Gerrit installations outside of Kubernetes
to install a sidecar container in the Prometheus deployment that pings
the healthcheck plugin's endpoint every 30 s, thereby triggering the
checks. This is not provided for Kubernetes, since there the ping should
be the task of the Kubernetes liveness probes.
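
The ping described above could look roughly like the following sketch.
It is not the sidecar shipped with this change. The endpoint path
`/config/server/healthcheck~status` and the host name are assumptions,
and the function stops after a fixed number of pings so it can be tried
out, whereas a sidecar would loop indefinitely.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is what we want.
        return err.code


def is_healthy(status: int) -> bool:
    """200 means all checks passed; anything else counts as failed."""
    return status == 200


def ping_times(url: str, count: int, interval: float = 30.0,
               fetch: Callable[[str], int] = fetch_status,
               sleep: Callable[[float], None] = time.sleep) -> list[bool]:
    """Ping url `count` times, waiting `interval` seconds between pings."""
    results = []
    for i in range(count):
        results.append(is_healthy(fetch(url)))
        if i < count - 1:
            sleep(interval)
    return results


if __name__ == "__main__":
    # Hypothetical Gerrit host; the path is assumed, not confirmed here.
    endpoint = "https://gerrit.example.com/config/server/healthcheck~status"
    print(ping_times(endpoint, count=2))
```

Connection errors are not handled in this sketch; a real sidecar would
probably treat them as a failed ping and keep going.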

The change additionally adds a dashboard displaying the status of the
healthcheck for each Gerrit instance over time.

Change-Id: Ieeedc4406b642e542c89679a8314d771ca0928af
2021-02-12 13:47:16 +01:00
Thomas Draebing 3b4005a047 Sort monitoring and logging components into sub-maps in the config
This is done in preparation to allow multiple logging stacks.

Change-Id: I950200805ec01851bfdf6ccc3a5243893a947616
2020-05-27 16:30:33 +02:00
Thomas Draebing 451882b7e9 Allow to monitor Gerrit on Kubernetes
So far it was only possible to monitor single-instance Gerrit servers.
This was due to the fact that a URL had to be used that pointed to
a dedicated instance. If multiple replicas were behind that URL,
the metrics of a random replica would be scraped and not those of
all replicas.

Prometheus has a service discovery functionality for deployments running
in Kubernetes. This is now used when monitoring a Gerrit instance in
Kubernetes. It allows a variable number of replicas to run, which
will be discovered automatically by Prometheus.

The dashboards were adapted accordingly and now allow selecting the
replica to be observed. For now, no summary of all replicas can be
displayed in the dashboards, but that feature is planned for the
future.

Change-Id: I96efc63a192cd90f5e3e91a53dace8e1ae83132e
2020-05-14 15:55:35 +02:00
Thomas Draebing 0bdb1d02e0 Create promtail config per Gerrit host
So far the install-script could only create a single promtail config.
Since the monitoring setup is able to monitor multiple Gerrit servers,
this caused manual work to create a promtail config per Gerrit server.

Now ytt will create a configuration for each Gerrit host configured
in the config.yaml. Ytt is only able to do that in a single file. Thus,
csplit is used to split that file into separate files that can then
be used to configure promtail on the respective hosts. The config
files can then be found under
$OUTPUT/promtail/promtail-$GERRIT_HOSTNAME.yaml.
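
The splitting step can be pictured with the sketch below. It uses Python
instead of csplit and is not the install script's actual logic. It
assumes that ytt emits one YAML document per host, separated by `---`
lines, in the same order as the hosts are listed; the host names and the
document contents are placeholders.

```python
def split_documents(stream: str) -> list[str]:
    """Split a multi-document YAML stream on `---` separator lines."""
    docs: list[str] = []
    current: list[str] = []
    for line in stream.splitlines():
        if line.strip() == "---":
            # Close the current document; skip empty ones.
            if any(l.strip() for l in current):
                docs.append("\n".join(current) + "\n")
            current = []
        else:
            current.append(line)
    if any(l.strip() for l in current):
        docs.append("\n".join(current) + "\n")
    return docs


def promtail_files(stream: str, hosts: list[str],
                   output: str) -> dict[str, str]:
    """Map each host's target path to its document (order is assumed)."""
    docs = split_documents(stream)
    if len(docs) != len(hosts):
        raise ValueError("expected one document per Gerrit host")
    return {f"{output}/promtail/promtail-{host}.yaml": doc
            for host, doc in zip(hosts, docs)}


if __name__ == "__main__":
    combined = "---\nplaceholder: first\n---\nplaceholder: second\n"
    files = promtail_files(combined, ["gerrit-1", "gerrit-2"], "out")
    for path in files:
        print(path)
```

The function only returns the path-to-content mapping; writing the files
to disk is left out to keep the sketch short.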

Change-Id: Ib09fba83d8a8fbd45b42e9e5388a85a37ab1a952
2020-04-16 14:25:53 +02:00
Thomas Draebing eb4e6ea191 Use object store to store chunks created by Loki
The chunks created by Loki were stored in a persistent volume. This
does not scale well, since volumes cannot easily be resized in
Kubernetes. Also, at least the ext4 filesystem had issues when large
numbers of logs were saved. These issues are due to the dir_index, as
discussed in [1].

An object store provides a more scalable and cheaper solution. Loki
supports S3 as object storage, as well as other object stores that
understand the S3 API, like Ceph or OpenStack Swift.

[1] https://github.com/grafana/loki/issues/1502

Change-Id: Id55095c3b6659f40708712c1a494753dbcab7686
2020-03-24 16:01:34 +01:00
Thomas Draebing be862d863e Move internal project to open source
This change adds the current status of a project that aims to create
a simple monitoring setup to monitor Gerrit servers, which was developed
internally at SAP.

The project provides an opinionated and basic configuration for helm
charts that can be used to install Loki, Prometheus and Grafana on a
Kubernetes cluster. Scripts to easily apply the configuration and
install the whole setup are provided as well.

The contributions so far were made by (with number of commits):

  80  Thomas Draebing
  11  Matthias Sohn
   2  Saša Živkov

Change-Id: I8045780446edfb3c0dc8287b8f494505e338e066
2020-03-11 15:23:19 +01:00