the-distro/gerrit-monitoring

Author	SHA1	Message	Date
Thomas Draebing	4a9d167637	Adapt receivecommits metrics The metric name was changed in `I1aae3bc0c0fe430086221503b8e529fa06967517`. Change-Id: `I466b01f05a2f679ef49437998992f5aa678bd58c`	2021-10-29 10:14:56 +02:00
Pat Long	37dc340371	Ensure that all dashboards are using 'defaults.datasource' Some dashboards were still explicitly specifying 'Prometheus' as the datasource, which leads to issues when trying to import the dashboards into a Grafana instance where the Prometheus datasource has a different name. Change-Id: `I13135af32a6f312a4feb32ab828f906f7b13edfe`	2021-06-28 11:13:35 -04:00
Thomas Draebing	8e8a55e650	Add healthcheck ping and dashboard for Gerrit The healthcheck plugin for Gerrit provides a convenient way to determine the health of different functionalities and components of Gerrit. If the endpoint provided by the plugin is pinged, it will execute a set of checks and return either 200 if all checks passed or 500 if at least one failed. It will also provide metrics that can be scraped by Prometheus. This change adds the option for Gerrit installations outside of Kubernetes to install a sidecar container in the Prometheus deployment that pings the healthcheck plugin's endpoint every 30 s, thereby triggering the checks. This is not provided for Kubernetes, since there the ping should be the task of the Kubernetes liveness probes. The change additionally adds a dashboard displaying the status of the healthcheck for each Gerrit instance over time. Change-Id: `Ieeedc4406b642e542c89679a8314d771ca0928af`	2021-02-12 13:47:16 +01:00
Matthias Sohn	73d4326206	gc panel: use for loop to add prometheus targets Grafonnet doesn't yet provide addSeriesOverrides() accepting an array. Also use a different color for each gc so that switching to another gc shows up in the graph. Change-Id: `I4e424280d44a63f57ad7196dfdb7e77ba2f13f24`	2021-02-06 00:47:10 +01:00
Matthias Sohn	c6a7a985cd	Fix yAxis label of gc-time panel Change-Id: `Ib330398a5a9034ed34a07df50930aab2202b27d5`	2021-02-05 01:04:06 +01:00
Matthias Sohn	7d9aff0488	Fix series override alias for G1 old gen gc metric Change-Id: `Ice4908d335214749989966219fad410c783652af`	2021-02-05 01:01:32 +01:00
Matthias Sohn	b72d83f48e	Add gc metrics for ZGC and ShenandoahGC Change-Id: `I518a655f4c8080a8b5b23e67d6a518b503000949`	2021-02-05 00:59:13 +01:00
Thomas Draebing	f839c376af	Convert overview dashboard to grafonnet In addition this updates Grafonnet to include bar gauges. Change-Id: `I538bd965d52f841b24c9607fc97d5ac748b9d68b`	2020-12-04 08:31:27 +01:00
Thomas Draebing	893b0c4f36	Convert replication dashboard to grafonnet Change-Id: `Icffb8ffbec8541e5b956487e5ce9ec54b3c8b617`	2020-12-04 08:31:26 +01:00
Thomas Draebing	c7c17679e9	Divide latency dashboard There are a lot of latency metrics. This change splits up the existing dashboard for latencies. For REST API latencies, it also allows selecting the REST API calls to look at. This change also adds latency dashboards for the NoteDB and UI Actions. Change-Id: `Idb9631cc1bc838d06e626d58f163e71fb78b30c5`	2020-12-04 08:31:26 +01:00
Thomas Draebing	0b4c16e881	Convert latency dashboard to grafonnet Change-Id: `Id97759996259eea802c80c2ef3261ba1883d92d3`	2020-12-04 08:31:25 +01:00
Thomas Draebing	3e811f272b	Convert git fetch/clone dashboard to grafonnet Change-Id: `I735f94599199ae2d0f304030fa023c55359e9a47`	2020-12-04 08:31:25 +01:00
Thomas Draebing	12aba901e4	Extract yAxis object Change-Id: `I98c0708e521c0122beb53869242a3a1df8db3f3d`	2020-12-04 08:31:24 +01:00
Thomas Draebing	82d9ead576	Convert caches dashboard to grafonnet Change-Id: `I42f10428bb5f85991cef2abbcdfab9424b8bb48d`	2020-12-04 08:31:23 +01:00
Thomas Draebing	72391ac5e5	Convert queues dashboard to grafonnet Change-Id: `Ia3307a923b99ecacaaa8c803aa2af0c9bf4eabcb`	2020-12-04 08:31:22 +01:00
Thomas Draebing	ce5b8300f1	Start using Grafonnet to create Grafana dashboards Versioning the pure JSON files representing the Grafana dashboards had some disadvantages. It was hard to review them, they were very cluttered and a lot was duplicated. There are some tools that deal with that. One of them is Grafonnet, which is a superset of Jsonnet, a tool to create JSON files using a domain specific language. This change implements the Gerrit Process dashboard in Grafonnet. It also extends the installer to be able to install dashboards in the Jsonnet format. Change-Id: `I6235fb7d045bd71557678a4e3b0d4ad4515f4615`	2020-12-04 08:31:21 +01:00
Thomas Draebing	bec7bf7897	Adapt dashboards to be accepted by Grafana dashboard repository Grafana provides a repository for dashboards that can be used to easily import dashboards. Providing these dashboards there would make it easier for users not using the full setup provided here to still use the dashboards. To be able to upload however, the datasource reference in the dashboards has to be a template. This is however not compatible with the way how the dashboards are imported in the Grafana of the stack provided here. Thus, the variables are removed during the installation. Change-Id: `I99f127882a6f7594ca1c40fbe1e299378e89f4e9`	2020-11-27 10:40:09 +01:00
Thomas Draebing	65582f2deb	Also monitor parallel GC This change - adds metrics for parallel GC to the GC panel in the Gerrit Process dashboard - configures the GC panel to only show queries with values other than null - changes the interval to one minute, which fits the scrape interval - changes the default time frame to the last 24h, which is used for most other dashboards Change-Id: `I3b6587e769ae7486a02e26b8d7f2822319eb94e6`	2020-08-25 13:20:11 +02:00
Thomas Draebing	451882b7e9	Allow to monitor Gerrit on Kubernetes So far it was only possible to monitor single instance Gerrit servers. This was due to the fact that a URL had to be used that pointed to a dedicated instance, since if multiple replicas were behind the instance, the metrics of a random replica would be scraped and not those of all replicas. Prometheus has a service discovery functionality for deployments running in Kubernetes. This is now used when monitoring a Gerrit instance in Kubernetes. This allows a variable number of replicas to run, which will be automatically discovered by Prometheus. The dashboards were adapted accordingly and now allow selecting the replica to be observed. For now, no summary of all replicas can be displayed in the dashboards, but that feature is planned to be added in the future. Change-Id: `I96efc63a192cd90f5e3e91a53dace8e1ae83132e`	2020-05-14 15:55:35 +02:00
Thomas Draebing	7663baf7be	Use gerrit_build_info metric to display Gerrit version This replaces the hacky graph showing the Gerrit version with a table showing the current Gerrit version information. Change-Id: `Idfbdc85e376953aead40fea06544e5c84fb777e7`	2020-05-14 15:33:14 +02:00
Matthias Sohn	e8b2651af2	Add latency dashboard Add graphs for the following latency metrics - receive-commit - query total - query changes - REST total - REST change list comments - REST change list robot comments - REST change post review - REST get change detail - REST get change diff - REST get change - REST get commit - REST get change revision actions Change-Id: `Id782e12335ae76820cac4e4e8c80450671bf8216`	2020-05-05 18:30:18 +02:00
Thomas Draebing	f960eb5eab	Add dashboard for Loki metrics Change-Id: `I220d90d33be3ed292402f3adb7386953cad7b0de`	2020-04-03 11:56:24 +02:00
Thomas Draebing	ff7fd22ca2	Add dashboard to monitor Prometheus data This is an adapted version of this dashboard: https://grafana.com/grafana/dashboards/3681 Change-Id: `I405f09f75698b940becd6994a7fc457853603756`	2020-04-03 11:56:24 +02:00
Thomas Draebing	442bf6fb98	Only show Gerrit instances in the instance dropdowns A variable was used to select the Gerrit instance to observe in the dashboards. Since the instance label is set for all targets that prometheus scrapes, the variable would also contain e.g. the prometheus instance. Now only Gerrit instances are displayed by further filtering for a metric specific for Gerrit. Change-Id: `I392b2ddf53a0ea49db25018dc5d37d269365812a`	2020-04-03 11:37:27 +02:00
Thomas Draebing	623332e4b3	Create a configmap per dashboard If the dashboard files got too large (>2Mb) Kubernetes was rejecting the configmap. Now each dashboard is installed with its own configmap. A sidecar container is used to register these dashboards with Grafana. Change-Id: `I84062d6e2ac7dc2669945b54575bf239a25900a4`	2020-03-26 09:55:39 +01:00
Matthias Sohn	14e7530aab	Process dashboard: add panel showing system load - Rearrange the other panels so that we show system load over cpu usage over threads in the left column. - Reduce height of memory panel a bit Change-Id: `Icaada525f87d0df503f67cf688b94d15a4119034`	2020-03-13 17:41:01 +01:00
Matthias Sohn	4a96ed4947	Process dashboard: show number of available CPUs Change-Id: `Ifbf13edb2dfa8f5cee64aea3f9dca006d419ef20`	2020-03-13 17:40:53 +01:00
Thomas Draebing	be862d863e	Move internal project to open source This change adds the current status of a project that aims to create a simple monitoring setup to monitor Gerrit servers, which was developed internally at SAP. The project provides an opinionated and basic configuration for helm charts that can be used to install Loki, Prometheus and Grafana on a Kubernetes cluster. Scripts to easily apply the configuration and install the whole setup are provided as well. The contributions so far were done by (with number of commits) 80 Thomas Draebing 11 Matthias Sohn 2 Saša Živkov Change-Id: `I8045780446edfb3c0dc8287b8f494505e338e066`	2020-03-11 15:23:19 +01:00

28 commits