the-distro/gerrit-monitoring

Author	SHA1	Message	Date
Thomas Draebing	fad4eba966	Support a federated Prometheus setup Gerrit instances that are load-balanced cannot easily be scraped by an external Prometheus, since the request won't end up at a specified Gerrit instance. A typical setup to solve this issue is to install a local Prometheus and scrape the local Prometheus from the central Prometheus. This is a so-called federated setup. Now such a setup is supported and can be configured. Change-Id: `I0119d3c1d846cd8e975e5732f4d59cf863c6d2b8`	2021-12-16 19:05:00 +01:00
Thomas Draebing	8e8a55e650	Add healthcheck ping and dashboard for Gerrit The healthcheck plugin for Gerrit provides a convenient way to determine the health of different functionalities and components of Gerrit. If the endpoint provided by the plugin is pinged, it will execute a set of checks and return either 200 if all checks passed or 500 if at least one failed. It will also provide metrics that can be scraped by Prometheus. This change adds the option for Gerrit installations outside of Kubernetes to install a sidecar container in the Prometheus deployment that pings the healthcheck plugin's endpoint every 30 s, thereby triggering the checks. This is not provided for Kubernetes, since there the ping should be the task of the Kubernetes liveness probes. The change additionally adds a dashboard displaying the status of the healthcheck for each Gerrit instance over time. Change-Id: `Ieeedc4406b642e542c89679a8314d771ca0928af`	2021-02-12 13:47:16 +01:00
Thomas Draebing	3b4005a047	Sort monitoring and logging components into sub-maps in the config This is done in preparation to allow multiple logging stacks. Change-Id: `I950200805ec01851bfdf6ccc3a5243893a947616`	2020-05-27 16:30:33 +02:00
Thomas Draebing	451882b7e9	Allow to monitor Gerrit on Kubernetes So far it was only possible to monitor single instance Gerrit servers. This was due to the fact that a URL had to be used that pointed to a dedicated instance, since if multiple replicas would be behind the instance, the metrics of a random replica would be scraped and not of all. Prometheus has a service discovery functionality for deployments running in Kubernetes. This is now used when monitoring a Gerrit instance in Kubernetes. This allows a variable number of replicas to run, which will be automatically discovered by Prometheus. The dashboards were adapted accordingly and now allow selecting the replica to be observed. For now, no summary of all replicas can be displayed in the dashboards, but that feature is planned to be added in the future. Change-Id: `I96efc63a192cd90f5e3e91a53dace8e1ae83132e`	2020-05-14 15:55:35 +02:00
Thomas Draebing	0bdb1d02e0	Create promtail config per Gerrit host So far the install-script could only create a single promtail config. Since the monitoring setup is able to monitor multiple Gerrit servers, this caused manual work to create a promtail config per Gerrit server. Now ytt will create a configuration for each Gerrit host configured in the config.yaml. Ytt is only able to do that in a single file. Thus, csplit is used to split the file into separate files that can then be used to configure promtail on the respective hosts. The config files can then be found under $OUTPUT/promtail/promtail-$GERRIT_HOSTNAME.yaml. Change-Id: `Ib09fba83d8a8fbd45b42e9e5388a85a37ab1a952`	2020-04-16 14:25:53 +02:00
Thomas Draebing	eb4e6ea191	Use object store to store chunks created by Loki The chunks created by Loki were stored in a persistent volume. This does not scale well, since volumes cannot easily be resized in Kubernetes. Also, at least the ext4-filesystem had issues, when large numbers of logs were saved. These issues are due to the dir_index as discussed in [1]. An object store provides a more scalable and cheaper solution. Loki supports S3 as an object storage and also other object stores that understand the S3 API like Ceph or OpenStack Swift. [1] https://github.com/grafana/loki/issues/1502 Change-Id: `Id55095c3b6659f40708712c1a494753dbcab7686`	2020-03-24 16:01:34 +01:00
Thomas Draebing	be862d863e	Move internal project to open source This change adds the current status of a project that aims to create a simple monitoring setup to monitor Gerrit servers, which was developed internally at SAP. The project provides an opinionated and basic configuration for helm charts that can be used to install Loki, Prometheus and Grafana on a Kubernetes cluster. Scripts to easily apply the configuration and install the whole setup are provided as well. The contributions so far were done by (with number of commits): 80 Thomas Draebing, 11 Matthias Sohn, 2 Saša Živkov. Change-Id: `I8045780446edfb3c0dc8287b8f494505e338e066`	2020-03-11 15:23:19 +01:00

7 commits
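
Commit 0bdb1d02e0 describes splitting a single multi-document ytt output into one promtail config per Gerrit host. The project itself uses csplit for this; the following is only a minimal Python sketch of the same idea, not the repository's implementation. It assumes that documents are separated by lines consisting solely of `---` and that the caller supplies the hostnames in the same order as the documents. The function names `split_yaml_documents` and `write_promtail_configs` are hypothetical.

```python
from pathlib import Path


def split_yaml_documents(stream: str) -> list[str]:
    """Split a multi-document YAML stream on '---' separator lines.

    Assumption: a separator is a line that contains only '---'.
    Empty documents (e.g. before a leading separator) are skipped.
    """
    documents: list[str] = []
    current: list[str] = []
    for line in stream.splitlines():
        if line.rstrip() == "---":
            if any(part.strip() for part in current):
                documents.append("\n".join(current) + "\n")
            current = []
        else:
            current.append(line)
    if any(part.strip() for part in current):
        documents.append("\n".join(current) + "\n")
    return documents


def write_promtail_configs(
    stream: str, hostnames: list[str], output_dir: Path
) -> list[Path]:
    """Write one promtail-<hostname>.yaml per document under <output_dir>/promtail.

    Assumption: hostnames are given in the same order as the documents.
    """
    documents = split_yaml_documents(stream)
    if len(documents) != len(hostnames):
        raise ValueError(
            f"expected {len(hostnames)} documents, found {len(documents)}"
        )
    target = output_dir / "promtail"
    target.mkdir(parents=True, exist_ok=True)
    paths: list[Path] = []
    for hostname, document in zip(hostnames, documents):
        path = target / f"promtail-{hostname}.yaml"
        path.write_text(document)
        paths.append(path)
    return paths
```

The resulting file names follow the `$OUTPUT/promtail/promtail-$GERRIT_HOSTNAME.yaml` pattern named in the commit message. Raising on a count mismatch is a design choice of this sketch, so that a config is never silently written under the wrong host name.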