the-distro/gerrit-monitoring

Author	SHA1	Message	Date
Matthias Sohn	5423672a21	Make gerrit_monitoring.py executable Change-Id: `Id9ab768dc5d1f38e18079f01e381a10a629e627e`	2023-02-22 14:41:27 +01:00
Matthias Sohn	09eccc6e78	Fix name of gerrit_monitoring.py script in README.md Change-Id: `I1bf67dd6dcf54114db2796fdc8d32693ce684874`	2023-02-21 20:16:35 +01:00
Thomas Draebing	fad4eba966	Support a federated Prometheus setup Gerrit instances that are load-balanced cannot easily be scraped by an external Prometheus, since the request won't end up at a specified Gerrit instance. A typical setup to solve this issue is to install a local Prometheus and scrape the local Prometheus from the central Prometheus. This is a so-called federated setup. Now such a setup is supported and can be configured. Change-Id: `I0119d3c1d846cd8e975e5732f4d59cf863c6d2b8`	2021-12-16 19:05:00 +01:00
Thomas Draebing	8e8a55e650	Add healthcheck ping and dashboard for Gerrit The healthcheck plugin for Gerrit provides a convenient way to determine the health of different functionalities and components of Gerrit. If the endpoint provided by the plugin is pinged, it will execute a set of checks and return either 200 if all checks passed or 500 if at least one failed. It will also provide metrics that can be scraped by Prometheus. This change adds the option for Gerrit installations outside of Kubernetes to install a sidecar container in the Prometheus deployment that pings the healthcheck plugin's endpoint every 30 s, thereby triggering the checks. This is not provided for Kubernetes, since there the ping should be the task of the Kubernetes liveness probes. The change additionally adds a dashboard displaying the status of the healthcheck for each Gerrit instance over time. Change-Id: `Ieeedc4406b642e542c89679a8314d771ca0928af`	2021-02-12 13:47:16 +01:00
Thomas Draebing	ce5b8300f1	Start using Grafonnet to create Grafana dashboards Versioning the pure JSON files representing the Grafana dashboards had some disadvantages. It was hard to review them, they were very cluttered and a lot was duplicated. There are some tools that deal with that. One of them is Grafonnet, which is a superset of Jsonnet, a tool to create JSON files using a domain specific language. This change implements the Gerrit Process dashboard in Grafonnet. It also extends the installer to be able to install dashboards in the Jsonnet format. Change-Id: `I6235fb7d045bd71557678a4e3b0d4ad4515f4615`	2020-12-04 08:31:21 +01:00
Thomas Draebing	3b4005a047	Sort monitoring and logging components into sub-maps in the config This is done in preparation to allow multiple logging stacks. Change-Id: `I950200805ec01851bfdf6ccc3a5243893a947616`	2020-05-27 16:30:33 +02:00
Thomas Draebing	de8fee4f68	Add promtail chart to collect logs from cluster This adds the promtail chart to the installation that allows to collect the logs of the applications in the cluster, which are written to stdout of the containers. This will only collect logs from pods in the same namespace as the monitoring setup. In a later change also logs from Gerrit instances in Kubernetes will be added. Change-Id: `I86c5c5470eaa31191fb5ac339ee21dee85106097`	2020-05-27 16:30:31 +02:00
Thomas Draebing	451882b7e9	Allow to monitor Gerrit on Kubernetes So far it was only possible to monitor single instance Gerrit servers. This was due to the fact that a URL had to be used that pointed to a dedicated instance, since if multiple replicas were behind that URL, the metrics of a random replica would be scraped and not of all. Prometheus has a service discovery functionality for deployments running in Kubernetes. This is now used when monitoring a Gerrit instance in Kubernetes. This allows a variable number of replicas to be running, which will be automatically discovered by Prometheus. The dashboards were adapted accordingly and now allow selecting the replica to be observed. For now, no summary of all replicas can be displayed in the dashboards, but that feature is planned to be added in the future. Change-Id: `I96efc63a192cd90f5e3e91a53dace8e1ae83132e`	2020-05-14 15:55:35 +02:00
Thomas Draebing	0bdb1d02e0	Create promtail config per Gerrit host So far the install-script could only create a single promtail config. Since the monitoring setup is able to monitor multiple Gerrit servers, this caused manual work to create a promtail config per Gerrit server. Now ytt will create a configuration for each Gerrit host configured in the config.yaml. Ytt is only able to do that in a single file. Thus, csplit is used to split the files into separate files that can then be used to configure promtail on the respective hosts. The config files can then be found under $OUTPUT/promtail/promtail-$GERRIT_HOSTNAME.yaml. Change-Id: `Ib09fba83d8a8fbd45b42e9e5388a85a37ab1a952`	2020-04-16 14:25:53 +02:00
Thomas Draebing	6b75c12831	Rewrite the scripts in python The scripts were written in bash. Using bash became quite unwieldy. Python by nature can deal well with yaml and is thus better suited in dealing with the yaml-based configuration files. This change rewrites the original scripts staying as close as possible to the original ones. Right now, the python scripts call subprocesses a lot to work with the tools, which were already used before. At least for yaml-templating there may be better tools that have a python integration, which could be used in the future. Change-Id: `Ida16318445a05dcfdada9c7a56a391e4827f02e7`	2020-04-16 14:25:50 +02:00
Thomas Draebing	aa0c5252f0	Describe infrastructure dependencies Change-Id: `I1ba3967a10e5cd35aff60579eff388252c81874b`	2020-03-24 16:01:36 +01:00
Thomas Draebing	eb4e6ea191	Use object store to store chunks created by Loki The chunks created by Loki were stored in a persistent volume. This does not scale well, since volumes cannot easily be resized in Kubernetes. Also, at least the ext4-filesystem had issues, when large numbers of logs were saved. These issues are due to the dir_index as discussed in [1]. An object store provides a more scalable and cheaper solution. Loki supports S3 as an object storage and also other object stores that understand the S3 API like Ceph or OpenStack Swift. [1] https://github.com/grafana/loki/issues/1502 Change-Id: `Id55095c3b6659f40708712c1a494753dbcab7686`	2020-03-24 16:01:34 +01:00
Thomas Draebing	be862d863e	Move internal project to open source This change adds the current status of a project that aims to create a simple monitoring setup to monitor Gerrit servers, which was developed internally at SAP. The project provides an opinionated and basic configuration for helm charts that can be used to install Loki, Prometheus and Grafana on a Kubernetes cluster. Scripts to easily apply the configuration and install the whole setup are provided as well. The contributions so far were done by (with number of commits) 80 Thomas Draebing 11 Matthias Sohn 2 Saša Živkov Change-Id: `I8045780446edfb3c0dc8287b8f494505e338e066`	2020-03-11 15:23:19 +01:00
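
The healthcheck ping described in commit 8e8a55e650 can be illustrated with a minimal Python sketch. It is not the sidecar shipped by the project: the function names `check_once` and `ping_forever`, the injectable `opener` parameter, and the URL passed in are all hypothetical. Only the behaviour is taken from the commit message, namely that the endpoint answers 200 when all checks pass and 500 when at least one fails, and that it is pinged every 30 s.

```python
import time
import urllib.error
import urllib.request


def check_once(url, opener=urllib.request.urlopen, timeout=10):
    """Ping the healthcheck endpoint once.

    Returns True only for HTTP 200 (all checks passed). `opener` is a
    hypothetical injection point so the function can be exercised
    without a running Gerrit server.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # urllib raises for error statuses; 500 means a check failed.
        return False
    except urllib.error.URLError:
        # The server could not be reached at all.
        return False


def ping_forever(url, interval=30):
    """Trigger the checks every `interval` seconds (30 s per the commit)."""
    while True:
        healthy = check_once(url)
        print(f"{url}: {'healthy' if healthy else 'unhealthy'}")
        time.sleep(interval)
```

The return value is informational only; per the commit message, the purpose of the ping is to trigger the checks so that the plugin's metrics can be scraped by Prometheus.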

13 commits
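
Commit 0bdb1d02e0 describes splitting the single multi-document file produced by ytt into one promtail config per Gerrit host. The project uses csplit for this; the sketch below swaps in plain Python to show the same idea. The function name `split_yaml_documents` is hypothetical, and the code assumes that documents are separated by lines consisting only of `---`.

```python
def split_yaml_documents(text):
    """Split a multi-document YAML string on '---' separator lines.

    Returns one string per non-empty document. This is an illustrative
    stand-in for the csplit call mentioned in the commit message, not
    the project's implementation.
    """
    documents = []
    current = []

    def flush():
        # Keep the collected lines only if they contain real content.
        if any(line.strip() for line in current):
            documents.append("\n".join(current) + "\n")
        current.clear()

    for line in text.splitlines():
        if line.rstrip() == "---":
            flush()
        else:
            current.append(line)
    flush()
    return documents
```

Each returned document could then be written to its own file, for example following the `$OUTPUT/promtail/promtail-$GERRIT_HOSTNAME.yaml` pattern named in the commit message.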