
Commit fa7e1cc (parent 9d02adf)

Refactor docs

Signed-off-by: Matej Feder <[email protected]>

34 files changed: +1347 additions, −315 deletions

README.md

Lines changed: 8 additions & 314 deletions
Large diffs are not rendered by default.

chart/Chart.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -15,7 +15,7 @@
 
 apiVersion: v2
 name: dnation-kubernetes-monitoring-stack
-version: 3.6.0
+version: 3.6.1
 appVersion: 2.7.0 # dnation-kubernetes-monitoring
 description: An umbrella helm chart for Kubernetes monitoring based on kube-prometheus-stack, thanos, loki, promtail and dnation-kubernetes-monitoring
 keywords:
```

docs/README.md

Lines changed: 22 additions & 0 deletions
# Documentation Index

## Quick start

- [Getting started](quickstart.md)

## Topics

- [Multi cluster](multicluster.md)
- [OpenShift](openshift.md)
- [IaaS](iaas.md)
- [K3s](k3s.md)
- [Loki](loki.md)
- [OAUTH](oauth.md)
- [Alerts2Matrix](alertmanager2matrix.md)
- [Blackbox exporter](blackbox_exporter.md)
- [SSL exporter](ssl_exporter.md)
- [Thanos tracing](tracing.md)
- [Thanos tuning](tuning.md)

## Development

- [Development guide](../helpers/README.md)

docs/alertmanager2matrix.md

Lines changed: 33 additions & 0 deletions
# Alertmanager notifications to the Matrix chat

This page describes how to enable Alertmanager-to-Matrix chat notifications in the monitoring solution.

The [matrix-alertmanager-receiver](https://github.com/metio/matrix-alertmanager-receiver) project is used to forward alerts to a Matrix room.

To use it, fill in your Matrix credentials in the `helpers/matrix-alertmanager/matrix-alertmanager-receiver.yaml` ConfigMap and deploy it:
```bash
kubectl apply -f helpers/matrix-alertmanager/matrix-alertmanager-receiver.yaml
```

You can modify other settings in the ConfigMap according to the project [docs](https://github.com/metio/matrix-alertmanager-receiver).

Adjust the configuration snippet below and incorporate it into the monitoring Helm values:
```yaml
kube-prometheus-stack:
  alertmanager:
    config:
      route:
        receiver: 'matrix-notifications'
        group_by: ['alertname', 'job', 'severity']
        repeat_interval: 24h
        routes:
          - receiver: 'null'
            match:
              alertname: Watchdog
      receivers:
        - name: 'null'
        - name: 'matrix-notifications'
          webhook_configs:
            - url: "http://matrix-alertmanager-receiver:3000/alerts/alert-room"
```
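To verify the wiring end to end, you can send a hand-crafted webhook payload to the receiver yourself. A minimal sketch, assuming the service name, port, and room path from the webhook URL above; the payload uses a small subset of the Alertmanager webhook format:

```shell
# Minimal Alertmanager-style webhook payload (schema subset, version 4)
PAYLOAD='{"version":"4","status":"firing","alerts":[{"status":"firing","labels":{"alertname":"TestAlert","severity":"info"},"annotations":{"summary":"Smoke test alert"}}]}'

# sanity-check the payload locally before sending
echo "$PAYLOAD" | python3 -m json.tool > /dev/null && echo "payload ok"

# forward the receiver service locally and deliver the payload
# kubectl port-forward svc/matrix-alertmanager-receiver 3000:3000 &
# curl -s -X POST -H 'Content-Type: application/json' \
#   -d "$PAYLOAD" http://localhost:3000/alerts/alert-room
```

If everything is configured correctly, a test message should appear in the configured Matrix room shortly after the POST.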

docs/blackbox_exporter.md

Lines changed: 21 additions & 0 deletions
# Prometheus Blackbox Exporter

Our monitoring stack contains a helm chart for [prometheus-blackbox-exporter](https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus-blackbox-exporter) as an optional component.

## Configuration

Enable prometheus-blackbox-exporter by adding the `--set prometheus-blackbox-exporter.enabled=true` flag to the `helm` command, or enable it in a values file.
You can further configure prometheus-blackbox-exporter via the values file, e.g.:

```yaml
prometheus-blackbox-exporter:
  enabled: true
  serviceMonitor:
    targets:
      - name: dnation-cloud
        url: https://dnation.cloud/
# enable also the dashboards
dnation-kubernetes-monitoring:
  blackboxMonitoring:
    enabled: true
```
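Beyond the dashboards, you may also want to alert on failing probes. A hypothetical sketch using the exporter's standard `probe_success` metric, wrapped in the `additionalPrometheusRulesMap` values key used elsewhere in this stack (rule name and thresholds are illustrative, not part of the shipped configuration):

```yaml
kube-prometheus-stack:
  additionalPrometheusRulesMap:
    blackbox-rules:
      groups:
        - name: blackbox
          rules:
            # probe_success is 1 when the probe succeeds, 0 otherwise
            - alert: BlackboxProbeFailed
              expr: probe_success == 0
              for: 5m
              labels:
                severity: warning
              annotations:
                summary: "Blackbox probe failed for {{ $labels.instance }}"
```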

docs/iaas.md

Lines changed: 128 additions & 0 deletions
# IaaS monitoring

This component is marked as **experimental**.

The IaaS monitoring module currently integrates and is able to observe the following targets:
- [OpenStack](#openstack)
- [Ceph](#ceph)

## Prerequisites

To test the monitoring of the IaaS layer, we expect a running Kubernetes cluster that already contains
the Kubernetes monitoring platform.

### Local environment use case - KinD/K3s cluster deployed locally

#### KinD

Install the Kubernetes monitoring solution into the KinD Kubernetes cluster following the instructions provided in
the [quickstart guide](quickstart.md).

#### K3s

Install the Kubernetes monitoring solution into the K3s Kubernetes cluster following the instructions provided in
the [k3s guide](k3s.md).

## Deploy IaaS monitoring components

### OpenStack

#### Prometheus metrics and alerts

The [OpenStack exporter for Prometheus](https://github.com/openstack-exporter) can be deployed using the [openstack-exporter-helm-chart](https://github.com/SovereignCloudStack/openstack-exporter-helm-charts).
This exporter ships with a set of [Prometheus alerts and rules](https://github.com/SovereignCloudStack/openstack-exporter-helm-charts/blob/master/charts/prometheus-openstack-exporter/templates/prometheusrule.yaml)
that are deployed together with the exporter.
Visit the `helpers/iaas/openstack-exporter-values.yaml` file to validate the Helm configuration options.
Ensure valid OpenStack API credentials are set under the `clouds_yaml_config` section. This **MUST** be overridden!

```bash
helm upgrade --install prometheus-openstack-exporter oci://registry.scs.community/openstack-exporter/prometheus-openstack-exporter \
  --version 0.4.5 \
  -f helpers/iaas/openstack-exporter-values.yaml # --set "endpoint_type=public" --set "serviceMonitor.scrapeTimeout=1m"
```
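The `clouds_yaml_config` section mentioned above follows the standard OpenStack `clouds.yaml` layout. A hypothetical sketch, where the cloud name, endpoint, and credentials are all placeholders; check the exact key layout against `helpers/iaas/openstack-exporter-values.yaml`:

```yaml
clouds_yaml_config: |
  clouds:
    my-cloud:                # placeholder cloud name
      auth:
        auth_url: https://keystone.example.com:5000/v3   # placeholder endpoint
        username: monitoring-user
        password: change-me
        project_name: monitoring
        user_domain_name: Default
        project_domain_name: Default
      region_name: RegionOne
```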
Tip: If you want to test the exporter's basic functionality with the **public** OpenStack API, configure `endpoint_type`
as `public` (`--set "endpoint_type=public"`). Note that configuring `endpoint_type` as `public` will result in
incomplete functionality for the Grafana dashboard.

Tip: Requesting and collecting metrics from the OpenStack API can be time-consuming, especially if the API is not
performing well. In such cases, you may observe timeouts on the Prometheus server when it tries to fetch OpenStack
metrics. To mitigate this, consider increasing the scrape timeout to e.g. 1 minute (`--set "serviceMonitor.scrapeTimeout=1m"`).

#### Grafana dashboards

The Grafana dashboard designed to visualize metrics collected from an OpenStack cloud through the OpenStack exporter
is publicly available at https://grafana.com/grafana/dashboards/21085. Its source code is located [here](https://github.com/SovereignCloudStack/k8s-observability/tree/main/iaas/dashboards).
Feel free to import it into Grafana via its source or ID.
For automatic integration into the Kubernetes monitoring solution, proceed to the next step.

#### Update the Kubernetes monitoring deployment

This step deploys the Grafana dashboards and instructs the monitoring stack to add the OpenStack exporter target to the Prometheus configuration:

```bash
helm upgrade kubernetes-monitoring dnationcloud/dnation-kubernetes-monitoring-stack --reset-then-reuse-values -f helpers/iaas/values-observer-iaas.yaml
```

#### Access the OpenStack dashboard

At this point, you should be able to access the Grafana UI and the OpenStack dashboard.
Log in to the Grafana UI and find the OpenStack dashboard in the IaaS directory.

### Ceph

This guide covers monitoring of Ceph clusters deployed by [ceph-ansible](https://github.com/ceph/ceph-ansible) and the [rook operator](https://github.com/rook/rook).
While both expose the same metrics via the same endpoints, there are some differences in the Prometheus configuration and alerts.

#### Prometheus metrics and alerts

Ceph contains two built-in sources of metrics, a.k.a. exporters.
The Ceph exporter (introduced in the Reef release of Ceph) is the main source of Ceph performance metrics. It runs as a
dedicated daemon on every Ceph cluster host and exposes a metrics endpoint where all the performance
counters exposed by the Ceph daemons running on that host are published in the form of Prometheus metrics.

The second source of metrics is the Prometheus manager module. It exposes metrics related to the whole cluster,
essentially metrics that are not produced by individual Ceph daemons.

Read the related Ceph [docs](https://docs.ceph.com/en/reef/monitoring/#ceph-metrics).
Since these exporters are integrated with Ceph, deploying a third-party Ceph exporter is unnecessary.

**Prometheus alerts**

Both Ceph deployment strategies use the ceph-mixins project as a source of alerts. The ceph-ansible and rook projects
each maintain a rendered version of these alerts, but the rook repository contains some differences, primarily because
rook does not use the cephadm tool as a backend.
Therefore, apply one of the following commands to create a custom observer rules values file for either the
ceph-ansible or ceph-rook deployment (the [yq](https://github.com/mikefarah/yq/#install) tool is required):

```bash
# ceph-ansible
curl -s https://raw.githubusercontent.com/ceph/ceph/main/monitoring/ceph-mixin/prometheus_alerts.yml | \
  yq '{"kube-prometheus-stack": {"additionalPrometheusRulesMap": {"ceph-ansible-rules": (. + {"additionalLabels": {"prometheus_rule": "1"}})}}}' > helpers/iaas/values-observer-ceph-rules.yaml

# rook
curl -s https://raw.githubusercontent.com/rook/rook/master/deploy/charts/rook-ceph-cluster/prometheus/localrules.yaml | \
  yq '{"kube-prometheus-stack": {"additionalPrometheusRulesMap": {"ceph-rook-rules": (. + {"additionalLabels": {"prometheus_rule": "1"}})}}}' > helpers/iaas/values-observer-ceph-rules.yaml
```
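Either command wraps the upstream alert rules under the chart's `additionalPrometheusRulesMap` key, so the generated values file has roughly this shape (rule contents elided; a sketch derived from the yq expression above):

```yaml
kube-prometheus-stack:
  additionalPrometheusRulesMap:
    ceph-rook-rules:          # or ceph-ansible-rules
      groups:
        # ... upstream ceph-mixin alert groups ...
      additionalLabels:
        prometheus_rule: "1"
```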
#### Grafana dashboards

We've tested and can recommend two sources of Grafana dashboards that are suitable for both Ceph deployment strategies (ansible and rook):
- [dashboards linked in the rook docs](https://rook.io/docs/rook/latest-release/Storage-Configuration/Monitoring/ceph-monitoring/?h=gra#grafana-dashboards)
- [ceph-mixins dashboards](https://github.com/ceph/ceph-mixins/tree/master/dashboards)
  - A built version of the ceph-mixins dashboards can be found e.g. [here](https://github.com/ceph/ceph/tree/main/monitoring/ceph-mixin/dashboards_out)

We consider the dashboards created within the Rook project a solid starting point for Ceph metrics visualization.
If you want more detailed dashboards, uncomment and use the ceph-mixin dashboards in the `helpers/iaas/values-observer-ceph-rook.yaml`
or `helpers/iaas/values-observer-ceph-ansible.yaml` file. You can use both.

#### Update the Kubernetes monitoring deployment

This step deploys the Grafana dashboards and Prometheus rules and instructs the monitoring stack to add the Ceph exporter targets to the Prometheus configuration.
Ensure that you add the monitoring targets' IPs and ports to `helpers/iaas/values-observer-ceph-ansible.yaml` for a ceph-ansible deployment.

```bash
helm upgrade kubernetes-monitoring dnationcloud/dnation-kubernetes-monitoring-stack --reset-then-reuse-values \
  -f helpers/iaas/values-observer-ceph-rules.yaml \
  -f helpers/iaas/values-observer-ceph-[rook|ansible].yaml # use the values file for either the ceph-ansible or ceph-rook deployment
```

docs/images/jaeger.png

148 KB · File renamed without changes.

docs/k3s.md

Lines changed: 80 additions & 0 deletions
# K3s support

K3s is a certified Kubernetes distribution optimized for production environments, particularly in remote locations
or resource-constrained environments.

This page describes how to develop and/or test the Kubernetes monitoring solution against a K3s
cluster. It guides the user through creating an HA K3s cluster via k3d (a wrapper to run K3s in Docker) and bootstrapping
it with the Kubernetes monitoring solution.

Note that the following tutorial deploys an HA K3s cluster consisting of 3 control plane nodes (servers)
and one worker node (agent). The reason is that the HA K3s cluster utilizes an embedded etcd cluster as cluster storage
(refer to https://docs.k3s.io/datastore/ha-embedded).
Using a single-node K3s cluster with the default SQLite database requires additional tweaks of the monitoring values,
which are not covered in this guide.

## Prerequisites

- [K3d](https://k3d.io/#installation)
- [helm](https://helm.sh/)
- [kubectl](https://kubernetes.io/docs/reference/kubectl/)

## Prepare K3s Kubernetes cluster via K3d

```bash
k3d cluster create --config helpers/k3s/k3s-config.yaml --image rancher/k3s:v1.28.8-k3s1 observer
```

If you opt not to use k3d with the custom config provided here and prefer another K3s cluster,
ensure that the metric endpoints of the various control plane components are properly exposed.
Refer to the [docs](https://dnationcloud.github.io/kubernetes-monitoring/helpers/FAQ/#kubernetes-monitoring-shows-or-0-state-for-some-control-plane-components-are-control-plane-components-working-correctly).

## Deploy Observer monitoring solution

K3s consolidates all Kubernetes control plane components into a single process, which means that the metrics for these
control plane components are exposed on the K3d hosts rather than through individual Kubernetes Services/Pods.
To customize the monitoring values for K3s, refer to the custom Helm values file `helpers/k3s/values-observer-k3s.yaml`.
This file contains the configurations and adjustments needed to monitor K3s.
Note that the list of control plane node IPs (endpoints) must be overridden.

Get and store the K3d control plane node IPs:
```bash
NODE_IPS=$(kubectl get nodes -l node-role.kubernetes.io/control-plane=true -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}' | tr ' ' ',' | sed 's/^/{&/;s/$/}/')
```
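The `tr`/`sed` pipeline above turns kubectl's space-separated list of InternalIPs into the brace-wrapped, comma-separated form that the `--set "...endpoints=$NODE_IPS"` flags expect. A quick sketch with hypothetical IPs standing in for the kubectl output:

```shell
# kubectl prints InternalIPs space-separated; emulate that with echo
echo "172.18.0.2 172.18.0.3 172.18.0.4" | tr ' ' ',' | sed 's/^/{&/;s/$/}/'
# → {172.18.0.2,172.18.0.3,172.18.0.4}
```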
Install the monitoring stack and set the control plane component endpoints:
```bash
helm repo add dnationcloud https://dnationcloud.github.io/helm-hub/
helm repo update dnationcloud
helm upgrade --install kubernetes-monitoring dnationcloud/dnation-kubernetes-monitoring-stack -f helpers/k3s/values-observer-k3s.yaml \
  --set "kube-prometheus-stack.kubeEtcd.endpoints=$NODE_IPS" \
  --set "kube-prometheus-stack.kubeProxy.endpoints=$NODE_IPS" \
  --set "kube-prometheus-stack.kubeControllerManager.endpoints=$NODE_IPS" \
  --set "kube-prometheus-stack.kubeScheduler.endpoints=$NODE_IPS"
```

## Access the Observer monitoring UIs

At this point, you should be able to access the Grafana, Alertmanager and Prometheus UIs
within the Observer monitoring cluster.

- Grafana UI
  ```bash
  http://localhost:30000
  ```
  - Use the following credentials:
    - username: `admin`
    - password: `pass`
  - Visit the Layer 0 dashboard, `infrastructure-services-monitoring`, and drill down to explore cluster metrics
    - http://localhost:30000/d/monitoring/infrastructure-services-monitoring

- Alertmanager UI
  ```bash
  http://localhost:30001
  ```

- Prometheus UI
  ```bash
  http://localhost:30002
  ```

docs/loki.md

Lines changed: 12 additions & 0 deletions
# Loki

## loki-distributed

The loki-distributed chart is deprecated and replaced by the [loki](https://github.com/grafana/loki/tree/main/production/helm/loki) helm chart.
Find the deprecated values in `helpers/loki/values-loki-distributed.yaml`.

The loki helm chart is the only helm chart you should use for Loki deployments. It supports Loki deployment in monolithic, scalable
and even [distributed mode](https://grafana.com/docs/loki/next/setup/install/helm/install-microservices/).

We recommend using the loki helm chart for all fresh installations. If you already use the loki-distributed helm chart, check
the migration [guide](https://grafana.com/docs/loki/latest/setup/migrate/migrate-from-distributed/).
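For a fresh installation, a minimal monolithic (single-binary) deployment of the loki chart might look like the values sketch below. This is an illustrative assumption, not part of this repository's helpers; verify the key names and storage settings against the chart's own documentation:

```yaml
# hypothetical minimal values for the grafana/loki chart, monolithic mode
deploymentMode: SingleBinary
singleBinary:
  replicas: 1
loki:
  commonConfig:
    replication_factor: 1   # single instance, no replication
  storage:
    type: filesystem        # local storage; use object storage in production
```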
