This updates the Grafana chart to the new repository, since the old
repository is now deprecated. This also updates the container images
and Grafana version.
Change-Id: I29e38d7c23bfa95992537efae7b8b3967d71ffd0
This also changes the helm chart repository, since the old one was
deprecated. Further, the new version adapts the resources so that they
no longer use deprecated APIs.
Change-Id: Idd3f1ed48e22da303fd62d9c2ee63ccb959ed948
The promtail chart is configured to push logs to the Loki service
anyway. The service itself is not password-protected, so this was
not required.
Change-Id: I886b76ca7e5d6e8af370a2cd0f527892008c7600
* changes:
  Adapt to ytt 0.28.0
  Sort monitoring and logging components into sub-maps in the config
  Collect logs from Gerrit in Kubernetes
  Add promtail chart to collect logs from cluster
This adds a service discovery configuration for promtail to also
collect logs for Gerrit installations in Kubernetes. The installations
will be discovered by namespace and a given label.
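The selection rule described above (match on namespace and on a given
label) can be sketched in Python. This is only an illustration of the
filtering logic; the function name discover_gerrit_pods, the pod
dictionaries and the label app=gerrit are hypothetical and do not
reflect the actual promtail configuration.

```python
def discover_gerrit_pods(pods, namespace, label, value):
    """Return the names of pods in `namespace` that carry `label` == `value`."""
    return [
        pod["name"]
        for pod in pods
        if pod["namespace"] == namespace
        and pod.get("labels", {}).get(label) == value
    ]


# Hypothetical cluster state, reduced to the fields used for matching.
pods = [
    {"name": "gerrit-0", "namespace": "gerrit", "labels": {"app": "gerrit"}},
    {"name": "gerrit-1", "namespace": "gerrit", "labels": {"app": "gerrit"}},
    {"name": "nginx-0", "namespace": "gerrit", "labels": {"app": "nginx"}},
    {"name": "gerrit-0", "namespace": "other", "labels": {"app": "gerrit"}},
]

selected = discover_gerrit_pods(pods, "gerrit", "app", "gerrit")
print(selected)
```

Only pods that match both criteria are selected; a pod with the right
label in a different namespace is ignored.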
Change-Id: I894e47f37428add9b44df6596950d314ee2a3ed0
This adds the promtail chart to the installation, which allows
collecting the logs that the applications in the cluster write to the
stdout of their containers.
This will only collect logs from pods in the same namespace as the
monitoring setup. Logs from Gerrit instances in Kubernetes will be
added in a later change.
Change-Id: I86c5c5470eaa31191fb5ac339ee21dee85106097
So far it was only possible to monitor single-instance Gerrit servers.
This was due to the fact that a URL had to be used that pointed to
a dedicated instance: if multiple replicas were behind that URL, the
metrics of a random replica would be scraped instead of the metrics of
all replicas.
Prometheus has a service discovery functionality for deployments running
in Kubernetes. This is now used when monitoring a Gerrit instance in
Kubernetes. This allows a variable number of replicas to run, which
will be automatically discovered by Prometheus.
The dashboards were adapted accordingly and now allow selecting the
replica to be observed. For now, no summary of all replicas can be
displayed in the dashboards, but that feature is planned for the
future.
Change-Id: I96efc63a192cd90f5e3e91a53dace8e1ae83132e
* changes:
  Relabel the instance label for prometheus and loki metrics
  Add dashboard for Loki metrics
  Add dashboard to monitor Prometheus data
  Only show Gerrit instances in the instance dropdowns
  Create a configmap per dashboard
The instance label for Prometheus had the value localhost:9090, which
was misleading.
Now the label is relabeled to prometheus-<namespace> or loki-<namespace>.
This is still not ideal for cases where multiple replicas are deployed,
but it is already a slight improvement.
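The effect of the relabeling can be illustrated with a small Python
sketch. It only imitates what happens to a target's label set; the
helper relabel_instance and the namespace "monitoring" are hypothetical,
and the real change lives in the scrape configuration, not in Python.

```python
def relabel_instance(labels, component, namespace):
    """Return a copy of `labels` with `instance` set to '<component>-<namespace>'."""
    relabeled = dict(labels)  # copy, so the input label set stays untouched
    relabeled["instance"] = f"{component}-{namespace}"
    return relabeled


# Label set as scraped before the change (value taken from the text above).
before = {"job": "prometheus", "instance": "localhost:9090"}
after = relabel_instance(before, "prometheus", "monitoring")
print(before["instance"], "->", after["instance"])
```

Because the new value depends only on the component and the namespace,
several replicas in one namespace would share the same instance label,
which is the limitation mentioned above.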
Change-Id: I1efdc49071b1d3bf99d21315ca03821e9d58c906
If the dashboard files got too large (>2 MB), Kubernetes rejected
the configmap.
Now each dashboard is installed in its own configmap. A sidecar container
is used to register these dashboards with Grafana.
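A minimal sketch of the one-configmap-per-dashboard idea, written in
Python for illustration only. The function names, the configmap name
prefix, the namespace and the grafana_dashboard label are assumptions
made for this example; the label a sidecar actually watches depends on
how the Grafana chart is configured.

```python
import json


def dashboard_configmap(name, dashboard, namespace):
    """Build a ConfigMap manifest (as a dict) that holds a single dashboard."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": f"grafana-dashboard-{name}",
            "namespace": namespace,
            # Assumed label that the sidecar is configured to look for.
            "labels": {"grafana_dashboard": "1"},
        },
        "data": {f"{name}.json": json.dumps(dashboard)},
    }


def configmaps_for(dashboards, namespace):
    """Create one ConfigMap per dashboard instead of one shared ConfigMap."""
    return [
        dashboard_configmap(name, body, namespace)
        for name, body in sorted(dashboards.items())
    ]


# Hypothetical dashboards, reduced to their titles for brevity.
dashboards = {
    "gerrit-overview": {"title": "Gerrit overview"},
    "loki": {"title": "Loki"},
}
manifests = configmaps_for(dashboards, "monitoring")
print([m["metadata"]["name"] for m in manifests])
```

Each manifest stays small because it carries exactly one dashboard, so
adding dashboards no longer grows a single configmap.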
Change-Id: I84062d6e2ac7dc2669945b54575bf239a25900a4
The default maximum number of log lines shown in Grafana is 1000. This
barely covers a few minutes of the httpd logs.
The value of 10,000 can still be handled by the browser. More log
entries will cause the browser to crash as long as Grafana does not
provide pagination, which is planned for the future.
Change-Id: Ife84d161cd022300ff6f440920021e4176b770b9
The most interesting new features are:
- proper limits for queried logs
- query history for logs (still a beta feature)
Change-Id: Ibd8b76b0e1e16d4bd3c74382fa3fd5a24c1bba45
The chunks created by Loki were stored in a persistent volume. This
does not scale well, since volumes cannot easily be resized in
Kubernetes. Also, at least the ext4 filesystem had issues when large
numbers of logs were saved. These issues are due to the dir_index, as
discussed in [1].
An object store provides a more scalable and cheaper solution. Loki
supports S3 as object storage as well as other object stores that
understand the S3 API, like Ceph or OpenStack Swift.
[1] https://github.com/grafana/loki/issues/1502
Change-Id: Id55095c3b6659f40708712c1a494753dbcab7686
This increases the time a chunk has to be filled before being flushed.
With shorter times, it could happen that during times of low traffic
chunks will not be filled completely before being flushed. This would
lead to small chunk objects, which is inefficient.
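The reasoning can be made concrete with a rough calculation. The sketch
below uses a deliberately simplified model (constant log rate, a fixed
target chunk size) and purely hypothetical numbers; it does not use
values from the actual Loki configuration.

```python
def expected_fill_ratio(bytes_per_second, fill_seconds, target_chunk_bytes):
    """Fraction of the target chunk size reached before the chunk is flushed."""
    written = bytes_per_second * fill_seconds
    return min(1.0, written / target_chunk_bytes)


# Hypothetical low-traffic stream: 100 bytes of logs per second,
# with an assumed target chunk size of 1,000,000 bytes.
for fill_seconds in (600, 3600):
    ratio = expected_fill_ratio(100, fill_seconds, 1_000_000)
    print(f"{fill_seconds:>5} s -> {ratio:.0%} full")
```

Under these assumptions, a longer fill time yields proportionally
fuller chunks during low traffic, and therefore fewer small objects.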
Change-Id: I74b2af1a053c8d4298b9e9d7ffca04cb9d8926bd
So far, there were no limits on the resources the Loki pod was allowed
to use. This now sets limits that, in my observation, seem to work for
now. As more and more logs are handled, these limits will probably have
to be increased.
Change-Id: I7313488a60da8a1fff28666870549f748400735a
The default limit of requests accepted by Loki from a single host was
set to 10000, which is not enough for a large Gerrit instance to push
all httpd/sshd-logs to Loki.
Change-Id: I94cb56e00102170ae4ed10e90123a8885e3aad00
This change adds the current status of a project that aims to create
a simple monitoring setup to monitor Gerrit servers, which was developed
internally at SAP.
The project provides an opinionated and basic configuration for helm
charts that can be used to install Loki, Prometheus and Grafana on a
Kubernetes cluster. Scripts to easily apply the configuration and
install the whole setup are provided as well.
The contributions so far were made by (with number of commits):
80 Thomas Draebing
11 Matthias Sohn
2 Saša Živkov
Change-Id: I8045780446edfb3c0dc8287b8f494505e338e066