
Commit fa7e1cc (parent 9d02adf)

Refactor docs

Signed-off-by: Matej Feder <[email protected]>

34 files changed: +1347 additions, −315 deletions

README.md

Lines changed: 8 additions & 314 deletions
Large diffs are not rendered by default.

chart/Chart.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -15,7 +15,7 @@
 
 apiVersion: v2
 name: dnation-kubernetes-monitoring-stack
-version: 3.6.0
+version: 3.6.1
 appVersion: 2.7.0 # dnation-kubernetes-monitoring
 description: An umbrella helm chart for Kubernetes monitoring based on kube-prometheus-stack, thanos, loki, promtail and dnation-kubernetes-monitoring
 keywords:
```

docs/README.md

Lines changed: 22 additions & 0 deletions
# Documentation Index

## Quick start

- [Getting started](quickstart.md)

## Topics

- [Multi cluster](multicluster.md)
- [OpenShift](openshift.md)
- [IaaS](iaas.md)
- [K3s](k3s.md)
- [Loki](loki.md)
- [OAUTH](oauth.md)
- [Alerts2Matrix](alertmanager2matrix.md)
- [Blackbox exporter](blackbox_exporter.md)
- [SSL exporter](ssl_exporter.md)
- [Thanos tracing](tracing.md)
- [Thanos tuning](tuning.md)

## Development

- [Development guide](../helpers/README.md)

docs/alertmanager2matrix.md

Lines changed: 33 additions & 0 deletions
# Alertmanager notifications to the Matrix chat

This page describes how to enable Alertmanager-to-Matrix chat notifications in the monitoring solution.

The [matrix-alertmanager-receiver](https://github.com/metio/matrix-alertmanager-receiver) project is used to forward alerts to a Matrix room.

To use it, fill in your Matrix credentials in the `helpers/matrix-alertmanager/matrix-alertmanager-receiver.yaml` ConfigMap and deploy it:
```bash
kubectl apply -f helpers/matrix-alertmanager/matrix-alertmanager-receiver.yaml
```

You can modify other settings in the ConfigMap according to the project [docs](https://github.com/metio/matrix-alertmanager-receiver).

Adjust the configuration snippet below and incorporate it into the monitoring Helm values:
```yaml
kube-prometheus-stack:
  alertmanager:
    config:
      route:
        receiver: 'matrix-notifications'
        group_by: ['alertname', 'job', 'severity']
        repeat_interval: 24h
        routes:
          - receiver: 'null'
            match:
              alertname: Watchdog
      receivers:
        - name: 'null'
        - name: 'matrix-notifications'
          webhook_configs:
            - url: "http://matrix-alertmanager-receiver:3000/alerts/alert-room"
```
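To verify the wiring end to end, you can send a hand-crafted webhook payload to the receiver yourself. A minimal sketch, assuming the service name, port, and room path from the webhook URL above; the payload uses a small subset of the Alertmanager webhook format:

```shell
# Minimal Alertmanager-style webhook payload (schema subset, version 4)
PAYLOAD='{"version":"4","status":"firing","alerts":[{"status":"firing","labels":{"alertname":"TestAlert","severity":"info"},"annotations":{"summary":"Smoke test alert"}}]}'

# sanity-check the payload locally before sending
echo "$PAYLOAD" | python3 -m json.tool > /dev/null && echo "payload ok"

# forward the receiver service locally and deliver the payload
# kubectl port-forward svc/matrix-alertmanager-receiver 3000:3000 &
# curl -s -X POST -H 'Content-Type: application/json' \
#   -d "$PAYLOAD" http://localhost:3000/alerts/alert-room
```

If everything is configured correctly, a test message should appear in the configured Matrix room shortly after the POST.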

docs/blackbox_exporter.md

Lines changed: 21 additions & 0 deletions
# Prometheus Blackbox Exporter

Our monitoring stack contains a helm chart for [prometheus-blackbox-exporter](https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus-blackbox-exporter) as an optional component.

## Configuration

Enable prometheus-blackbox-exporter by adding the `--set prometheus-blackbox-exporter.enabled=true` flag to the `helm` command, or enable it in a values file.
You can further configure prometheus-blackbox-exporter via the values file, e.g.:

```yaml
prometheus-blackbox-exporter:
  enabled: true
  serviceMonitor:
    targets:
      - name: dnation-cloud
        url: https://dnation.cloud/
# enable also the dashboards
dnation-kubernetes-monitoring:
  blackboxMonitoring:
    enabled: true
```
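Beyond the dashboards, you may also want to alert on failing probes. A hypothetical sketch using the exporter's standard `probe_success` metric, wrapped in the `additionalPrometheusRulesMap` values key used elsewhere in this stack (rule name and thresholds are illustrative, not part of the shipped configuration):

```yaml
kube-prometheus-stack:
  additionalPrometheusRulesMap:
    blackbox-rules:
      groups:
        - name: blackbox
          rules:
            # probe_success is 1 when the probe succeeds, 0 otherwise
            - alert: BlackboxProbeFailed
              expr: probe_success == 0
              for: 5m
              labels:
                severity: warning
              annotations:
                summary: "Blackbox probe failed for {{ $labels.instance }}"
```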

docs/iaas.md

Lines changed: 128 additions & 0 deletions
# IaaS monitoring

This component is marked as **experimental**.

The IaaS monitoring module currently integrates and is able to observe the following targets:
- [OpenStack](#openstack)
- [Ceph](#ceph)

## Prerequisites

To test the monitoring of the IaaS layer, we expect a running Kubernetes cluster that already contains
the Kubernetes monitoring platform.

### Local environment use case - KinD/K3s cluster deployed locally

#### KinD

Install the Kubernetes monitoring solution into the KinD Kubernetes cluster following the instructions provided in
the [quickstart guide](quickstart.md).

#### K3s

Install the Kubernetes monitoring solution into the K3s Kubernetes cluster following the instructions provided in
the [k3s guide](k3s.md).

## Deploy IaaS monitoring components

### OpenStack

#### Prometheus metrics and alerts

The [OpenStack exporter for Prometheus](https://github.com/openstack-exporter) can be deployed using the [openstack-exporter-helm-chart](https://github.com/SovereignCloudStack/openstack-exporter-helm-charts).
This exporter ships with a set of [Prometheus alerts and rules](https://github.com/SovereignCloudStack/openstack-exporter-helm-charts/blob/master/charts/prometheus-openstack-exporter/templates/prometheusrule.yaml)
that are deployed together with the exporter.
Visit the `helpers/iaas/openstack-exporter-values.yaml` file to validate the Helm configuration options.
Ensure valid OpenStack API credentials are set under the `clouds_yaml_config` section. This **MUST** be overridden!

```bash
helm upgrade --install prometheus-openstack-exporter oci://registry.scs.community/openstack-exporter/prometheus-openstack-exporter \
  --version 0.4.5 \
  -f helpers/iaas/openstack-exporter-values.yaml # --set "endpoint_type=public" --set "serviceMonitor.scrapeTimeout=1m"
```
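The `clouds_yaml_config` section mentioned above follows the standard OpenStack `clouds.yaml` layout. A hypothetical sketch, where the cloud name, endpoint, and credentials are all placeholders; check the exact key layout against `helpers/iaas/openstack-exporter-values.yaml`:

```yaml
clouds_yaml_config: |
  clouds:
    my-cloud:                # placeholder cloud name
      auth:
        auth_url: https://keystone.example.com:5000/v3   # placeholder endpoint
        username: monitoring-user
        password: change-me
        project_name: monitoring
        user_domain_name: Default
        project_domain_name: Default
      region_name: RegionOne
```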
Tip: If you want to test the exporter's basic functionality with the **public** OpenStack API, configure `endpoint_type`
as `public` (`--set "endpoint_type=public"`). Note that configuring `endpoint_type` as `public` will result in
incomplete functionality for the Grafana dashboard.

Tip: Requesting and collecting metrics from the OpenStack API can be time-consuming, especially if the API is not
performing well. In such cases, you may observe timeouts on the Prometheus server when it tries to fetch OpenStack
metrics. To mitigate this, consider increasing the scrape timeout to e.g. 1 minute (`--set "serviceMonitor.scrapeTimeout=1m"`).

#### Grafana dashboards

The Grafana dashboard designed to visualize metrics collected from an OpenStack cloud through the OpenStack exporter
is publicly available at https://grafana.com/grafana/dashboards/21085. Its source code is located [here](https://github.com/SovereignCloudStack/k8s-observability/tree/main/iaas/dashboards).
Feel free to import it into Grafana via its source or ID.
For automatic integration into the Kubernetes monitoring solution, proceed to the next step.

#### Update the Kubernetes monitoring deployment

This step deploys the Grafana dashboards and instructs the monitoring stack to add the OpenStack exporter target to the Prometheus configuration:

```bash
helm upgrade kubernetes-monitoring dnationcloud/dnation-kubernetes-monitoring-stack --reset-then-reuse-values -f helpers/iaas/values-observer-iaas.yaml
```

#### Access the OpenStack dashboard

At this point, you should be able to access the Grafana UI and the OpenStack dashboard.
Log in to the Grafana UI and find the OpenStack dashboard in the IaaS directory.

### Ceph

This guide covers monitoring of Ceph clusters deployed by [ceph-ansible](https://github.com/ceph/ceph-ansible) and the [rook operator](https://github.com/rook/rook).
While both expose the same metrics via the same endpoints, there are some differences in the Prometheus configuration and alerts.

#### Prometheus metrics and alerts

Ceph contains two built-in sources of metrics, a.k.a. exporters.
The Ceph exporter (introduced in the Reef release of Ceph) is the main source of Ceph performance metrics. It runs as a
dedicated daemon on every Ceph cluster host and exposes a metrics endpoint where all the performance
counters exposed by the Ceph daemons running on that host are published in the form of Prometheus metrics.

The second source of metrics is the Prometheus manager module. It exposes metrics related to the whole cluster,
essentially metrics that are not produced by individual Ceph daemons.

Read the related Ceph [docs](https://docs.ceph.com/en/reef/monitoring/#ceph-metrics).
Since these exporters are integrated with Ceph, deploying a third-party Ceph exporter is unnecessary.

**Prometheus alerts**

Both Ceph deployment strategies use the ceph-mixins project as a source of alerts. The ceph-ansible and rook projects
each maintain a rendered version of these alerts, but the rook repository contains some differences, primarily because
rook does not use the cephadm tool as a backend.
Therefore, apply one of the following commands to create a custom observer rules values file for either the
ceph-ansible or ceph-rook deployment (the [yq](https://github.com/mikefarah/yq/#install) tool is required):

```bash
# ceph-ansible
curl -s https://raw.githubusercontent.com/ceph/ceph/main/monitoring/ceph-mixin/prometheus_alerts.yml | \
  yq '{"kube-prometheus-stack": {"additionalPrometheusRulesMap": {"ceph-ansible-rules": (. + {"additionalLabels": {"prometheus_rule": "1"}})}}}' > helpers/iaas/values-observer-ceph-rules.yaml

# rook
curl -s https://raw.githubusercontent.com/rook/rook/master/deploy/charts/rook-ceph-cluster/prometheus/localrules.yaml | \
  yq '{"kube-prometheus-stack": {"additionalPrometheusRulesMap": {"ceph-rook-rules": (. + {"additionalLabels": {"prometheus_rule": "1"}})}}}' > helpers/iaas/values-observer-ceph-rules.yaml
```
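Either command wraps the upstream alert rules under the chart's `additionalPrometheusRulesMap` key, so the generated values file has roughly this shape (rule contents elided; a sketch derived from the yq expression above):

```yaml
kube-prometheus-stack:
  additionalPrometheusRulesMap:
    ceph-rook-rules:          # or ceph-ansible-rules
      groups:
        # ... upstream ceph-mixin alert groups ...
      additionalLabels:
        prometheus_rule: "1"
```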
#### Grafana dashboards

We've tested and can recommend two sources of Grafana dashboards that are suitable for both Ceph deployment strategies (ansible and rook):
- [dashboards linked in the rook docs](https://rook.io/docs/rook/latest-release/Storage-Configuration/Monitoring/ceph-monitoring/?h=gra#grafana-dashboards)
- [ceph-mixins dashboards](https://github.com/ceph/ceph-mixins/tree/master/dashboards)
  - A built version of the ceph-mixins dashboards can be found e.g. [here](https://github.com/ceph/ceph/tree/main/monitoring/ceph-mixin/dashboards_out)

We consider the dashboards created within the Rook project a solid starting point for Ceph metrics visualization.
If you want more detailed dashboards, uncomment and use the ceph-mixin dashboards in the `helpers/iaas/values-observer-ceph-rook.yaml`
or `helpers/iaas/values-observer-ceph-ansible.yaml` file. You can use both.

#### Update the Kubernetes monitoring deployment

This step deploys the Grafana dashboards and Prometheus rules and instructs the monitoring stack to add the Ceph exporter targets to the Prometheus configuration.
Ensure that you add the monitoring targets' IPs and ports to `helpers/iaas/values-observer-ceph-ansible.yaml` for a ceph-ansible deployment.

```bash
helm upgrade kubernetes-monitoring dnationcloud/dnation-kubernetes-monitoring-stack --reset-then-reuse-values \
  -f helpers/iaas/values-observer-ceph-rules.yaml \
  -f helpers/iaas/values-observer-ceph-[rook|ansible].yaml # use the values file for either the ceph-ansible or ceph-rook deployment
```

docs/images/jaeger.png

148 KB · File renamed without changes.

docs/k3s.md

Lines changed: 80 additions & 0 deletions
# K3s support

K3s is a certified Kubernetes distribution optimized for production environments, particularly in remote locations
or resource-constrained environments.

This page describes how to develop and/or test the Kubernetes monitoring solution against a K3s
cluster. It guides the user through creating an HA K3s cluster via k3d (a wrapper to run K3s in Docker) and bootstrapping
it with the Kubernetes monitoring solution.

Note that the following tutorial deploys an HA K3s cluster consisting of 3 control plane nodes (servers)
and one worker node (agent). The reason is that the HA K3s cluster utilizes an embedded etcd cluster as cluster storage
(refer to https://docs.k3s.io/datastore/ha-embedded).
Using a single-node K3s cluster with the default SQLite database requires additional tweaks of the monitoring values,
which are not covered in this guide.

## Prerequisites

- [K3d](https://k3d.io/#installation)
- [helm](https://helm.sh/)
- [kubectl](https://kubernetes.io/docs/reference/kubectl/)

## Prepare K3s Kubernetes cluster via K3d

```bash
k3d cluster create --config helpers/k3s/k3s-config.yaml --image rancher/k3s:v1.28.8-k3s1 observer
```

If you opt not to use k3d with the custom config provided here and prefer another K3s cluster,
ensure that the metric endpoints of the various control plane components are properly exposed.
Refer to the [docs](https://dnationcloud.github.io/kubernetes-monitoring/helpers/FAQ/#kubernetes-monitoring-shows-or-0-state-for-some-control-plane-components-are-control-plane-components-working-correctly).

## Deploy Observer monitoring solution

K3s consolidates all Kubernetes control plane components into a single process, which means that the metrics for these
control plane components are exposed on the K3d hosts rather than through individual Kubernetes Services/Pods.
To customize the monitoring values for K3s, refer to the custom Helm values file `helpers/k3s/values-observer-k3s.yaml`.
This file contains the configurations and adjustments needed to monitor K3s.
Note that the list of control plane node IPs (endpoints) must be overridden.

Get and store the K3d control plane node IPs:
```bash
NODE_IPS=$(kubectl get nodes -l node-role.kubernetes.io/control-plane=true -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}' | tr ' ' ',' | sed 's/^/{&/;s/$/}/')
```
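The `tr`/`sed` pipeline above turns kubectl's space-separated list of InternalIPs into the brace-wrapped, comma-separated form that the `--set "...endpoints=$NODE_IPS"` flags expect. A quick sketch with hypothetical IPs standing in for the kubectl output:

```shell
# kubectl prints InternalIPs space-separated; emulate that with echo
echo "172.18.0.2 172.18.0.3 172.18.0.4" | tr ' ' ',' | sed 's/^/{&/;s/$/}/'
# → {172.18.0.2,172.18.0.3,172.18.0.4}
```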
Install the monitoring stack and set the control plane component endpoints:
```bash
helm repo add dnationcloud https://dnationcloud.github.io/helm-hub/
helm repo update dnationcloud
helm upgrade --install kubernetes-monitoring dnationcloud/dnation-kubernetes-monitoring-stack -f helpers/k3s/values-observer-k3s.yaml \
  --set "kube-prometheus-stack.kubeEtcd.endpoints=$NODE_IPS" \
  --set "kube-prometheus-stack.kubeProxy.endpoints=$NODE_IPS" \
  --set "kube-prometheus-stack.kubeControllerManager.endpoints=$NODE_IPS" \
  --set "kube-prometheus-stack.kubeScheduler.endpoints=$NODE_IPS"
```

## Access the Observer monitoring UIs

At this point, you should be able to access the Grafana, Alertmanager and Prometheus UIs
within the Observer monitoring cluster.

- Grafana UI
  ```bash
  http://localhost:30000
  ```
  - Use the following credentials:
    - username: `admin`
    - password: `pass`
  - Visit the Layer 0 dashboard, `infrastructure-services-monitoring`, and drill down to explore cluster metrics
    - http://localhost:30000/d/monitoring/infrastructure-services-monitoring

- Alertmanager UI
  ```bash
  http://localhost:30001
  ```

- Prometheus UI
  ```bash
  http://localhost:30002
  ```

docs/loki.md

Lines changed: 12 additions & 0 deletions
# Loki

## loki-distributed

The loki-distributed chart is deprecated and replaced by the [loki](https://github.com/grafana/loki/tree/main/production/helm/loki) helm chart.
Find the deprecated values in `helpers/loki/values-loki-distributed.yaml`.

The loki helm chart is the only helm chart you should use for Loki deployments. It supports Loki deployment in monolithic, scalable
and even [distributed mode](https://grafana.com/docs/loki/next/setup/install/helm/install-microservices/).

We recommend using the loki helm chart for all fresh installations. If you already use the loki-distributed helm chart, check
the migration [guide](https://grafana.com/docs/loki/latest/setup/migrate/migrate-from-distributed/).
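For a fresh installation, a minimal monolithic (single-binary) deployment of the loki chart might look like the values sketch below. This is an illustrative assumption, not part of this repository's helpers; verify the key names and storage settings against the chart's own documentation:

```yaml
# hypothetical minimal values for the grafana/loki chart, monolithic mode
deploymentMode: SingleBinary
singleBinary:
  replicas: 1
loki:
  commonConfig:
    replication_factor: 1   # single instance, no replication
  storage:
    type: filesystem        # local storage; use object storage in production
```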
