This also removes the pinned versions from the Pipfile. Since the
versions are already locked in Pipfile.lock, pinning them in the
Pipfile was unnecessary and only made things harder to manage.
Change-Id: I3ce60de4178f9647b82af3c32800bca5f369a456
Using a local PGP key to encrypt the secrets in the configuration
is not very secure and makes it hard to rotate and distribute the
key. Sops provides the option to use managed services for this
purpose, e.g. HashiCorp Vault.
This change adds the option to use HashiCorp Vault when encrypting
the config file with the provided Python scripts.
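For illustration, encrypting the config against a Vault transit key could look roughly like this (the Vault address, transit key path and file names are assumptions, not the exact invocation used by the scripts):

```sh
# Hedged sketch: sops encrypting with a key from Vault's transit engine.
# Address, key path and file names are illustrative.
export VAULT_ADDR=https://vault.example.com:8200
sops --encrypt \
  --hc-vault-transit "$VAULT_ADDR/v1/transit/keys/sops-key" \
  config.yaml > config.yaml.enc
```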
Change-Id: I7683fbfdbed00506c3bca264ac8565f48bc5ea73
Gerrit instances behind a load balancer cannot easily be scraped by
an external Prometheus, since a request won't end up at a specific
Gerrit instance. A typical setup to solve this issue is to install a
local Prometheus and scrape that local Prometheus from the central
Prometheus, a so-called federated setup.
Such a setup is now supported and can be configured.
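On the central Prometheus, such a federation job could be sketched like this (the job name, match[] selector and target address are illustrative assumptions):

```yaml
# Hedged sketch of a federation scrape job on the central Prometheus.
scrape_configs:
  - job_name: 'federate-gerrit'
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="gerrit"}'   # forward all series of the local gerrit job
    static_configs:
      - targets:
          - 'gerrit-host.example.com:9090'  # local Prometheus next to Gerrit
```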
Change-Id: I0119d3c1d846cd8e975e5732f4d59cf863c6d2b8
Some dashboards were still explicitly specifying 'Prometheus' as the
datasource, which leads to issues when importing the dashboards into
a Grafana instance where the Prometheus datasource has a different
name.
Change-Id: I13135af32a6f312a4feb32ab828f906f7b13edfe
The healthcheck plugin for Gerrit provides a convenient way to determine
the health of different functionalities and components of Gerrit. If
the endpoint provided by the plugin is pinged, it will execute a set
of checks and return either 200 if all checks passed or 500 if at least
one failed. It will also provide metrics that can be scraped by
Prometheus.
This change adds the option for Gerrit installations outside of Kubernetes
to install a sidecar container in the Prometheus deployment that pings
the healthcheck plugin's endpoint every 30 s, thereby triggering the
checks. This is not provided for Gerrit in Kubernetes, since there the
pings should be the task of the Kubernetes liveness probes.
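Such a sidecar could be sketched roughly like this in the Prometheus deployment (the image, Gerrit URL and endpoint path are illustrative assumptions, not the exact manifest used here):

```yaml
# Hedged sketch of a sidecar triggering the healthcheck every 30 s.
containers:
  - name: gerrit-healthcheck-pinger
    image: curlimages/curl   # assumed image
    command: ["/bin/sh", "-c"]
    args:
      - |
        while true; do
          # trigger the healthcheck plugin's checks; path is illustrative
          curl -s -o /dev/null http://gerrit.example.com/config/server/healthcheck~status
          sleep 30
        done
```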
The change additionally adds a dashboard displaying the status of the
healthcheck for each Gerrit instance over time.
Change-Id: Ieeedc4406b642e542c89679a8314d771ca0928af
Grafonnet doesn't yet provide an addSeriesOverrides() that accepts an
array. Also use a different color for each GC so that switching to
another GC shows up in the graph.
Change-Id: I4e424280d44a63f57ad7196dfdb7e77ba2f13f24
This updates the Grafana chart to the new repository, since the old
repository is now deprecated. It also updates the container images
and the Grafana version.
Change-Id: I29e38d7c23bfa95992537efae7b8b3967d71ffd0
There are a lot of latency metrics. This change splits up the existing
dashboard for latencies. For REST API latencies, it also allows
selecting the REST API calls to look at. This change also adds latency
dashboards for NoteDb and UI actions.
Change-Id: Idb9631cc1bc838d06e626d58f163e71fb78b30c5
Versioning the pure JSON files representing the Grafana dashboards
had some disadvantages: they were hard to review, very cluttered,
and contained a lot of duplication.
There are some tools that deal with that. One of them is Grafonnet,
a library for Jsonnet, a domain-specific language for creating JSON
files that is itself a superset of JSON.
This change implements the Gerrit Process dashboard in Grafonnet.
It also extends the installer to be able to install dashboards in
the Jsonnet format.
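A dashboard written in Grafonnet then looks roughly like this (the panel, query and layout values are illustrative, based on the classic grafonnet-lib API, not the exact dashboard code added here):

```jsonnet
// Hedged sketch using the classic grafonnet-lib API; names are illustrative.
local grafana = import 'grafonnet/grafana.libsonnet';

grafana.dashboard.new('Gerrit Process', time_from='now-24h')
.addPanel(
  grafana.graphPanel.new('CPU usage', datasource='Prometheus')
  .addTarget(grafana.prometheus.target('process_cpu_usage')),
  gridPos={ x: 0, y: 0, w: 12, h: 8 }
)
```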
Change-Id: I6235fb7d045bd71557678a4e3b0d4ad4515f4615
This also changes the Helm chart repository, since the old one was
deprecated. Further, the new version adapts the resources so that
they no longer use deprecated APIs.
Change-Id: Idd3f1ed48e22da303fd62d9c2ee63ccb959ed948
Grafana provides a repository for dashboards that can be used to easily
import dashboards. Providing these dashboards there would make it easier
for users who are not using the full setup provided here to still use the
dashboards. To be able to upload them, however, the datasource reference
in the dashboards has to be a template variable.
This is not compatible with the way the dashboards are imported into
the Grafana of the stack provided here. Thus, the variables are
removed during the installation.
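The removal during installation can be sketched like this (the field names follow Grafana's dashboard JSON model, but the helper and its behavior are illustrative, not the exact installer code):

```python
def strip_datasource_variable(dashboard, datasource="Prometheus"):
    """Drop datasource template variables and hard-wire panels to one name.

    Sketch only; field names follow Grafana's dashboard JSON model, but
    the real installer code may differ.
    """
    templating = dashboard.setdefault("templating", {})
    # remove template variables of type "datasource"
    templating["list"] = [
        v for v in templating.get("list", []) if v.get("type") != "datasource"
    ]
    # replace variable references like "${DS_PROMETHEUS}" in the panels
    for panel in dashboard.get("panels", []):
        ds = panel.get("datasource")
        if isinstance(ds, str) and ds.startswith("$"):
            panel["datasource"] = datasource
    return dashboard
```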
Change-Id: I99f127882a6f7594ca1c40fbe1e299378e89f4e9
This change
- adds metrics for parallel GC to the GC panel in the Gerrit Process
dashboard
- configures the GC panel to only show queries with values other than
null
- changes the interval to one minute, which fits the scrape interval
- changes the default time frame to the last 24h, which is used for
most other dashboards
Change-Id: I3b6587e769ae7486a02e26b8d7f2822319eb94e6
The promtail chart is in any case configured to use the Loki service
for pushing logs. The service itself is not password protected, so
this was not required.
Change-Id: I886b76ca7e5d6e8af370a2cd0f527892008c7600
* changes:
Adapt to ytt 0.28.0
Sort monitoring and logging components into sub-maps in the config
Collect logs from Gerrit in Kubernetes
Add promtail chart to collect logs from cluster
Ytt 0.28.0 introduced a breaking change: the --output-directory
option was removed. This was done because this option implicitly
emptied the directory, which could be dangerous. While this behavior
still exists under a different name, the --output-files option is
now recommended.
The installer now uses the --output-files option. To still ensure a
clean installation, it checks whether the output directory already
exists and, if it does, asks the user whether it may empty it. If it
is not allowed to do so, the installation aborts.
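The check can be sketched like this (the function name and prompt wording are illustrative, not the exact installer code):

```python
import shutil
import sys
from pathlib import Path


def ensure_empty_output_dir(path, ask=input):
    """Sketch of the pre-installation check; names and prompt are illustrative.

    ytt's --output-files option no longer empties the directory itself,
    so the installer has to make sure it starts from a clean state.
    """
    out = Path(path)
    if not out.exists():
        return
    answer = ask(f"May the existing directory {out} be emptied? [y/N] ")
    if answer.strip().lower() != "y":
        sys.exit("Aborting: the output directory may not be emptied.")
    shutil.rmtree(out)       # empty the directory ...
    out.mkdir(parents=True)  # ... and recreate it
```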
Change-Id: I574c3b054e9293c0534d609c062946cd39890793
This adds a service discovery configuration for promtail to also
collect logs from Gerrit installations in Kubernetes. The installations
are discovered by namespace and a given label.
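The discovery could be sketched like this in promtail's scrape config (the namespace, label name and label value are illustrative placeholders):

```yaml
# Hedged sketch of promtail service discovery for Gerrit pods.
scrape_configs:
  - job_name: gerrit
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - gerrit   # namespace is illustrative
    relabel_configs:
      # keep only pods carrying the configured label
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: gerrit
        action: keep
```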
Change-Id: I894e47f37428add9b44df6596950d314ee2a3ed0
This adds the promtail chart to the installation that allows to
collect the logs of the applications in the cluster, which are written
to stdout of the containers.
This will only collect logs from pods in the same namespace as the
monitoring setup. In a later change also logs from Gerrit instances
in Kubernetes will be added.
Change-Id: I86c5c5470eaa31191fb5ac339ee21dee85106097
So far it was only possible to monitor single-instance Gerrit servers.
This was due to the fact that the scrape URL had to point to a
dedicated instance: with multiple replicas behind that URL, the
metrics of a random replica would be scraped instead of those of all
replicas.
Prometheus has a service discovery functionality for deployments running
in Kubernetes. This is now used when monitoring a Gerrit instance in
Kubernetes. It allows a variable number of replicas to run, which are
automatically discovered by Prometheus.
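A scrape job using this discovery could look roughly like this (the job name, namespace, port name and target label are illustrative assumptions):

```yaml
# Hedged sketch of Kubernetes service discovery for Gerrit replicas.
scrape_configs:
  - job_name: gerrit
    kubernetes_sd_configs:
      - role: endpoints   # discover every replica behind the service
        namespaces:
          names:
            - gerrit
    relabel_configs:
      # keep only the metrics port; port name is illustrative
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        regex: http
        action: keep
      # expose the pod name so dashboards can select a replica
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: replica
```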
The dashboards were adapted accordingly and now allow selecting the
replica to be observed. For now, no summary of all replicas can be
displayed in the dashboards, but that feature is planned to be added
in the future.
Change-Id: I96efc63a192cd90f5e3e91a53dace8e1ae83132e
This replaces the hacky graph showing the Gerrit version with a table
showing the current Gerrit version information.
Change-Id: Idfbdc85e376953aead40fea06544e5c84fb777e7
Add graphs for the following latency metrics
- receive-commit
- query total
- query changes
- REST total
- REST change list comments
- REST change list robot comments
- REST change post review
- REST get change detail
- REST get change diff
- REST get change
- REST get commit
- REST get change revision actions
Change-Id: Id782e12335ae76820cac4e4e8c80450671bf8216
The installation failed if TLS verification was disabled and no CA
certificate was given in the configuration. This happened because the
installation script always expected the CA certificate.
The installation now only expects the certificate if TLS verification
is enabled.
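The fixed behavior can be sketched like this (the config keys and function name are illustrative, not the exact installer code):

```python
def get_ca_certificate(config):
    """Sketch of the fixed behavior; config keys are illustrative.

    The CA certificate is only required when TLS verification is enabled.
    """
    tls = config.get("tls", {})
    if not tls.get("skipVerify", False):
        # verification enabled: the certificate must be present
        return tls["caCert"]
    # verification disabled: a missing certificate is fine
    return tls.get("caCert")
```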
Change-Id: I5429fc1ee0d230c74cc0689607cf2736d6520030
This adds the promtail version used in the setup to a file and adds
an installation step that downloads promtail if the installation is
not run in `dryrun` mode.
Change-Id: I1127220a57b2610b5c4458ce2205077706a860e6
So far the install script could only create a single promtail config.
Since the monitoring setup is able to monitor multiple Gerrit servers,
this required manual work to create a promtail config per Gerrit server.
Now ytt creates a configuration for each Gerrit host configured in
the config.yaml. Ytt can only emit these into a single file, so
csplit is used to split it into separate files that can then be used
to configure promtail on the respective hosts. The config files can
then be found under
$OUTPUT/promtail/promtail-$GERRIT_HOSTNAME.yaml.
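What csplit does here can be sketched in Python like this (the '---' document separator is the standard YAML one; the function is illustrative, not the installer's actual code):

```python
def split_promtail_configs(combined):
    """Split a multi-document YAML string into one config per host.

    Sketch of what csplit does in the installer, assuming the documents
    are separated by '---' lines.
    """
    docs = []
    current = []
    for line in combined.splitlines():
        if line.strip() == "---":
            if current:
                docs.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if current:
        docs.append("\n".join(current))
    return docs
```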
Change-Id: Ib09fba83d8a8fbd45b42e9e5388a85a37ab1a952
The scripts were written in bash, which became quite unwieldy.
Python can naturally deal well with YAML and is thus better suited
to handling the YAML-based configuration files. This change
rewrites the original scripts, staying as close as possible to the
originals.
Right now, the Python scripts call subprocesses a lot to work with
the tools that were already used before. At least for YAML
templating there may be better tools with a Python integration,
which could be used in the future.
Change-Id: Ida16318445a05dcfdada9c7a56a391e4827f02e7