c7c17679e9
There are a lot of latency metrics. This change splits up the existing
dashboard for latencies. For REST API latencies, it also allows selecting
the REST API calls to look at. This change also adds latency
dashboards for NoteDB and UI Actions.
Monitoring setup for Gerrit
This project provides a setup for monitoring Gerrit instances. The setup is based on Prometheus and Grafana running in Kubernetes. In addition, logging will be provided by Grafana Loki.
The setup is provided as a helm chart. It can be installed using Helm (This README expects Helm version 3.0 or higher).
The charts used in this setup are open-source charts that can be found on GitHub.
This project just provides `values.yaml` files that are already configured to work with the `metrics-reporter-prometheus` plugin of Gerrit to make the setup easier.
Dependencies
Software
- Gerrit

  Gerrit requires the following plugin to be installed: `metrics-reporter-prometheus`

- Promtail

  Promtail has to be installed with access to the `logs` directory in the Gerrit site. A configuration file for Promtail will be provided in this setup. See the Promtail documentation for details.

- Helm

  To install and configure Helm, follow the official guide.

- ytt

  ytt is a templating tool for YAML files. It is required for some last-moment configuration. See the ytt documentation for installation instructions.

- Pipenv

  Pipenv sets up a virtual Python environment and installs the required Python packages based on a lock file, ensuring a deterministic Python environment. See the Pipenv documentation for installation instructions.

- Jsonnet

  Jsonnet is used to create the JSON files describing the Grafana dashboards. See the Jsonnet documentation for installation instructions.

- Grafonnet

  Grafonnet should be installed using jsonnet-bundler and the `jsonnetfile.json` provided by this project. Install jsonnet-bundler as described in its documentation. Then run `jb install` from this project's root directory.
Infrastructure
- Kubernetes Cluster

  A cluster with at least 3 free CPUs and 4 GB of free memory is required. In addition, persistent storage of about 30 GB will be used.

- Ingress Controller

  The charts currently expect an Nginx ingress controller to be installed in the cluster.

- Object store

  Loki will store the data chunks in an object store. This store has to be accessible via the S3 API.
Add dashboards
There are two ways to have dashboards deployed automatically during installation:
Using JSON
One way is to export the dashboards to a JSON file in the UI or to create JSON files
describing the dashboards in another way. Put these dashboards into the
`./dashboards` directory of this repository.
Using Jsonnet + Grafonnet
The other way is to use Jsonnet/Grafonnet to create dashboards programmatically. Install Grafonnet into the project as described above and put your dashboard jsonnet files into the `dashboards` directory or one of its subdirectories. The jsonnet-based dashboards can be converted to JSON manually using the following command:
jsonnet -J grafonnet-lib --ext-code publish=false dashboards/<dashboard>.jsonnet
The external variable `publish` should be set to `false` if the dashboard is
imported via API, and to `true` if it is published to the Grafana homepage or
imported via the UI.
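When several dashboards have to be rendered, the command above can be assembled in a small script. The following is only a sketch: the helper name `jsonnet_command` and the dashboard name `latency` are hypothetical and used for illustration, while the flags are the ones shown in the command above.

```python
import shlex


def jsonnet_command(dashboard, publish=False):
    """Build the argument list for rendering one dashboard with jsonnet.

    `dashboard` is the file name without the .jsonnet extension.
    """
    return [
        "jsonnet",
        "-J", "grafonnet-lib",
        "--ext-code", f"publish={'true' if publish else 'false'}",
        f"dashboards/{dashboard}.jsonnet",
    ]


# "latency" is a hypothetical dashboard name.
api_import = jsonnet_command("latency", publish=False)  # imported via API
ui_import = jsonnet_command("latency", publish=True)    # imported via the UI

# Print the commands in a form that can be pasted into a shell.
print(shlex.join(api_import))
print(shlex.join(ui_import))
```

The returned list can be passed to `subprocess.run` to capture the generated JSON.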
Configuration
While this project is supposed to provide a specialized and opinionated monitoring
setup, some configuration is highly dependent on the specific installation.
These options have to be configured in `./config.yaml` before installing and
are listed here:
| option | description |
|---|---|
| `gerritServers` | List of Gerrit servers to scrape. For details refer to section below |
| `namespace` | The namespace the charts are installed to |
| `tls.skipVerify` | Whether to skip TLS certificate verification |
| `tls.caCert` | CA certificate used for TLS certificate verification |
| `monitoring.prometheus.server.host` | Prometheus server ingress hostname |
| `monitoring.prometheus.server.username` | Username for Prometheus |
| `monitoring.prometheus.server.password` | Password for Prometheus |
| `monitoring.prometheus.server.tls.cert` | TLS certificate |
| `monitoring.prometheus.server.tls.key` | TLS key |
| `monitoring.prometheus.alertmanager.slack.apiUrl` | API URL of the Slack Webhook |
| `monitoring.prometheus.alertmanager.slack.channel` | Channel to which the alerts should be posted |
| `monitoring.grafana.host` | Grafana ingress hostname |
| `monitoring.grafana.tls.cert` | TLS certificate |
| `monitoring.grafana.tls.key` | TLS key |
| `monitoring.grafana.admin.username` | Username for the admin user |
| `monitoring.grafana.admin.password` | Password for the admin user |
| `monitoring.grafana.ldap.enabled` | Whether to enable LDAP |
| `monitoring.grafana.ldap.host` | Hostname of LDAP server |
| `monitoring.grafana.ldap.port` | Port of LDAP server (has to be quoted!) |
| `monitoring.grafana.ldap.password` | Password of LDAP server |
| `monitoring.grafana.ldap.bind_dn` | Bind DN (username) of the LDAP server |
| `monitoring.grafana.ldap.accountBases` | List of base DNs to discover accounts (has to have the format `"['a', 'b']"`) |
| `monitoring.grafana.ldap.groupBases` | List of base DNs to discover groups (has to have the format `"['a', 'b']"`) |
| `monitoring.grafana.dashboards.editable` | Whether dashboards can be edited manually in the UI |
| `logging.loki.host` | Loki ingress hostname |
| `logging.loki.username` | Username for Loki |
| `logging.loki.password` | Password for Loki |
| `logging.loki.s3.protocol` | Protocol used for communicating with S3 |
| `logging.loki.s3.host` | Hostname of the S3 object store |
| `logging.loki.s3.accessToken` | The EC2 accessToken used for authentication with S3 |
| `logging.loki.s3.secret` | The secret associated with the accessToken |
| `logging.loki.s3.bucket` | The name of the S3 bucket |
| `logging.loki.s3.region` | The region in which the S3 bucket is hosted |
| `logging.loki.tls.cert` | TLS certificate |
| `logging.loki.tls.key` | TLS key |
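The dotted option names in the table describe paths into the nested structure of `config.yaml`. As an illustration, the sketch below resolves such paths in an already parsed configuration and reports the options that are missing. The helper names, the example values and the choice of which options to check are assumptions made for this example; they are not part of this project. The configuration is written as a Python dictionary to keep the example free of third-party packages.

```python
def lookup(config, dotted_key):
    """Return the value at a dotted path such as 'tls.skipVerify', or None."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def missing_options(config, required):
    """Return the dotted option names from `required` that are not set."""
    return [key for key in required if lookup(config, key) is None]


# Hypothetical, incomplete configuration (all values are placeholders).
config = {
    "namespace": "monitoring",
    "tls": {"skipVerify": False},
    "monitoring": {
        "grafana": {
            "host": "grafana.example.com",
            # The LDAP port has to be a quoted string, see the table above.
            "ldap": {"enabled": True, "port": "636"},
        }
    },
}

# Which options to check is an assumption made for this example.
required = [
    "namespace",
    "tls.skipVerify",
    "monitoring.grafana.host",
    "logging.loki.host",
]

missing = missing_options(config, required)
print(missing)
```

Note that `tls.skipVerify` counts as set although its value is `False`, because only absent options are reported.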
gerritServers
Two types of Gerrit servers are currently supported, which require different configuration parameters:
- Kubernetes
Gerrit installations running in the same Kubernetes cluster as the monitoring setup. Multiple replicas are supported and automatically discovered.
| option | description |
|---|---|
| `gerritServers.kubernetes.[*].namespace` | Namespace into which Gerrit was deployed |
| `gerritServers.kubernetes.[*].label.name` | Label name used to select deployments |
| `gerritServers.kubernetes.[*].label.value` | Label value to select deployments |
| `gerritServers.kubernetes.[*].containerName` | Name of container in the pod that runs Gerrit |
| `gerritServers.kubernetes.[*].port` | Container port to be used when scraping |
| `gerritServers.kubernetes.[*].username` | Username of Gerrit user with 'View Metrics' capabilities |
| `gerritServers.kubernetes.[*].password` | Password of Gerrit user with 'View Metrics' capabilities |
- Other
Gerrit installations with just one replica that can run anywhere, as long as they are reachable via HTTP.
| option | description |
|---|---|
| `gerritServers.other.[*].host` | Hostname (incl. port, if required) of the Gerrit server to monitor |
| `gerritServers.other.[*].username` | Username of Gerrit user with 'View Metrics' capabilities |
| `gerritServers.other.[*].password` | Password of Gerrit user with 'View Metrics' capabilities |
| `gerritServers.other.[*].promtail.storagePath` | Path to directory, where Promtail is allowed to save files (e.g. `positions.yaml`) |
| `gerritServers.other.[*].promtail.logPath` | Path to directory containing the Gerrit logs (e.g. `/var/gerrit/logs`) |
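To make the shape of the `gerritServers` option more concrete, the following sketch shows one entry of each type as parsed data and checks that every entry carries the top-level keys listed in the two tables. The checker `check_servers` and all values are hypothetical; whether the installation script performs a similar check is not stated in this README.

```python
# Top-level keys per server type, taken from the two tables above.
KUBERNETES_KEYS = {"namespace", "label", "containerName", "port", "username", "password"}
OTHER_KEYS = {"host", "username", "password", "promtail"}


def check_servers(gerrit_servers):
    """Return (type, index, missing_keys) for every incomplete entry."""
    problems = []
    for kind, keys in (("kubernetes", KUBERNETES_KEYS), ("other", OTHER_KEYS)):
        for index, entry in enumerate(gerrit_servers.get(kind, [])):
            missing = sorted(keys - set(entry))
            if missing:
                problems.append((kind, index, missing))
    return problems


# Hypothetical example values.
gerrit_servers = {
    "kubernetes": [
        {
            "namespace": "gerrit",
            "label": {"name": "app", "value": "gerrit"},
            "containerName": "gerrit",
            "port": 8080,
            "username": "metrics",
            "password": "secret",
        }
    ],
    "other": [
        {
            "host": "gerrit.example.com",
            "username": "metrics",
            "password": "secret",
            # promtail.storagePath and promtail.logPath are left out on purpose.
        }
    ],
}

problems = check_servers(gerrit_servers)
print(problems)
```

Both `kubernetes` and `other` are lists, so several servers of each type can be configured side by side.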
Encryption
The configuration file contains secrets. Thus, to be able to share the configuration, e.g. with the CI-system, it is meant to be encrypted. The encryption is explained here.
The `gerrit-monitoring.py install` command will decrypt the file before templating,
if it was encrypted with `sops`.
Installation
Before using the script, set up a Python environment using `pipenv install`.
The installation will use the environment of the current shell. Thus, make sure
that `ytt`, `kubectl` and `helm` are available on the `PATH`. The `KUBECONFIG`
variable also has to be set to point to the kubeconfig of the target Kubernetes cluster.
This project provides a script to quickly install the monitoring setup. To use it, run:
pipenv run python ./gerrit-monitoring.py \
--config config.yaml \
install \
[--output ./dist] \
[--dryrun] \
[--update-repo]
The command will use the given configuration (`--config`/`-c`) to create the
final files in the directory given by `--output`/`-o` (default `./dist`) and
install/update the Kubernetes resources and charts, if the `--dryrun`/`-d` flag
is not set. If the `--update-repo` flag is used, the helm repository will be updated
before installing the helm charts. This is required, for example, if a chart version
was updated.
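The interplay of the global `--config` option and the options of the `install` subcommand can be modelled with `argparse`. The parser below is a hypothetical reconstruction based only on the flags described above. It is not the parser used by the script, which may define further options or different defaults.

```python
import argparse


def build_parser():
    """Build a parser mirroring the flags described in this README."""
    parser = argparse.ArgumentParser(prog="gerrit-monitoring.py")
    # Global option, given before the subcommand.
    parser.add_argument("-c", "--config", required=True)

    subparsers = parser.add_subparsers(dest="command", required=True)

    install = subparsers.add_parser("install")
    install.add_argument("-o", "--output", default="./dist")
    install.add_argument("-d", "--dryrun", action="store_true")
    install.add_argument("--update-repo", action="store_true")

    subparsers.add_parser("uninstall")
    return parser


# Parse an example invocation: create the files but do not install anything.
args = build_parser().parse_args(["--config", "config.yaml", "install", "--dryrun"])
print(args.command, args.config, args.output, args.dryrun, args.update_repo)
```

With this layout, `--config` has to precede the subcommand, which matches the order used in the invocation above.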
Configure Promtail
Promtail has to be installed with access to the directory containing the Gerrit
logs, e.g. on the same host. The installation as described above will create a
configuration file for Promtail, which can be found in `./dist/promtail.yaml`.
Use it to configure Promtail by passing the `-config.file=./dist/promtail.yaml`
parameter when starting Promtail. Using the Promtail binary directly, this would
result in the following command:
$PATH_TO_PROMTAIL/promtail \
-config.file=./dist/promtail.yaml
If TLS verification is activated, the CA certificate used for verification
(usually the one configured for `tls.caCert`) has to be present in the
directory configured for `promtail.storagePath` in the `config.yaml` and has to
be called `promtail.ca.crt`.
The Promtail configuration provided here expects the logs to be available in
JSON format. This can be configured by setting `log.jsonLogging = true` in the
`gerrit.config`.
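Placing the CA certificate under the expected name can be scripted. The sketch below copies a certificate file to `promtail.ca.crt` inside the storage directory. The helper name is hypothetical, and the demonstration uses a temporary directory and a dummy file in place of the real certificate and the configured `promtail.storagePath`.

```python
import shutil
import tempfile
from pathlib import Path


def install_promtail_ca(ca_cert, storage_path):
    """Copy the CA certificate to <storage_path>/promtail.ca.crt."""
    target = Path(storage_path) / "promtail.ca.crt"
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(ca_cert, target)
    return target


# Demonstration with stand-ins for the real certificate and storage path.
with tempfile.TemporaryDirectory() as tmp:
    ca = Path(tmp) / "ca.crt"
    ca.write_text("dummy certificate content\n")

    target = install_promtail_ca(ca, Path(tmp) / "promtail")
    copied_name = target.name
    copied_ok = target.read_text() == ca.read_text()

print(copied_name, copied_ok)
```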
Uninstallation
To remove the Prometheus, Loki and Grafana charts and their configuration from the cluster, run
helm uninstall prometheus --namespace $NAMESPACE
helm uninstall loki --namespace $NAMESPACE
helm uninstall grafana --namespace $NAMESPACE
kubectl delete -f ./dist/configuration
To also release the volumes, run
kubectl delete -f ./dist/storage
NOTE: Doing so will delete all data that was not backed up!
Remove the namespace:
kubectl delete -f ./dist/namespace.yaml
The `./gerrit-monitoring.py uninstall` command will automatically remove the
charts installed in the configured namespace and delete the namespace as well:
pipenv run python ./gerrit-monitoring.py \
--config config.yaml \
uninstall
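For reference, the order of the manual steps described in this section can be written down as data. The function below only builds the command lines listed above and does not execute them. Its name and the `release_volumes` switch are hypothetical, and it is not meant to describe how the `uninstall` command works internally.

```python
def uninstall_commands(namespace, release_volumes=False):
    """Return the manual uninstall commands from this section, in order."""
    commands = [
        ["helm", "uninstall", chart, "--namespace", namespace]
        for chart in ("prometheus", "loki", "grafana")
    ]
    commands.append(["kubectl", "delete", "-f", "./dist/configuration"])
    if release_volumes:
        # Releasing the volumes deletes all data that was not backed up.
        commands.append(["kubectl", "delete", "-f", "./dist/storage"])
    commands.append(["kubectl", "delete", "-f", "./dist/namespace.yaml"])
    return commands


# "monitoring" is a hypothetical namespace name.
keep_volumes = uninstall_commands("monitoring")
drop_volumes = uninstall_commands("monitoring", release_volumes=True)

for command in drop_volumes:
    print(" ".join(command))
```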