|
| 1 | +# IaaS monitoring |
| 2 | + |
| 3 | +This component is marked as **experimental**. |
| 4 | + |
| 5 | +The IaaS monitoring module currently supports observing the following targets: |
| 6 | +- [OpenStack](#openstack) |
| 7 | +- [Ceph](#ceph) |
| 8 | + |
| 9 | +## Prerequisites |
| 10 | + |
| 11 | +To test monitoring of the IaaS layer, you need a running Kubernetes cluster that already contains |
| 12 | +the Kubernetes monitoring platform. |
| 13 | + |
| 14 | +### Local environment use case - KinD/K3s cluster deployed locally |
| 15 | + |
| 16 | +#### KinD |
| 17 | + |
| 18 | +Install the Kubernetes monitoring solution into the KinD Kubernetes cluster following the instructions provided in |
| 19 | +the [quickstart guide](quickstart.md). |
| 20 | + |
| 21 | +#### K3s |
| 22 | + |
| 23 | +Install the Kubernetes monitoring solution into the K3s Kubernetes cluster following the instructions provided in |
| 24 | +the [k3s guide](k3s.md). |
| 25 | + |
| 26 | +## Deploy IaaS monitoring components |
| 27 | + |
| 28 | +### OpenStack |
| 29 | + |
| 30 | +#### Prometheus metrics and alerts |
| 31 | + |
| 32 | +The [OpenStack exporter for Prometheus](https://github.com/openstack-exporter) can be deployed using the [openstack-exporter-helm-chart](https://github.com/SovereignCloudStack/openstack-exporter-helm-charts). |
| 33 | +The chart includes a set of [Prometheus alerts and rules](https://github.com/SovereignCloudStack/openstack-exporter-helm-charts/blob/master/charts/prometheus-openstack-exporter/templates/prometheusrule.yaml) |
| 34 | +that are deployed together with the exporter. |
| 35 | +Review the `helpers/iaas/openstack-exporter-values.yaml` file to check the Helm configuration options. |
| 36 | +Ensure that valid OpenStack API credentials are set under the `clouds_yaml_config` section. This **MUST** be overridden! |
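
A minimal sketch of supplying the credentials through a separate override file instead of editing the shared values file. The file name `my-openstack-credentials.yaml`, the cloud name `default`, and every credential value are placeholders. The layout assumes that `clouds_yaml_config` takes a standard `clouds.yaml` document as a multi-line string, so compare it with `helpers/iaas/openstack-exporter-values.yaml` before relying on it.

```bash
# Hypothetical override file; every value below is a placeholder.
# Assumption: clouds_yaml_config holds a clouds.yaml document as a block string.
cat > my-openstack-credentials.yaml <<'EOF'
clouds_yaml_config: |
  clouds:
    default:
      region_name: RegionOne
      auth:
        auth_url: https://keystone.example.com:5000/v3
        username: monitoring
        password: change-me
        project_name: monitoring
        user_domain_name: Default
        project_domain_name: Default
EOF
```

Passing this file with an additional `-f my-openstack-credentials.yaml` after the default values file lets Helm apply it on top, because later values files take precedence over earlier ones.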
| 37 | + |
| 38 | +```bash |
| 39 | +helm upgrade --install prometheus-openstack-exporter oci://registry.scs.community/openstack-exporter/prometheus-openstack-exporter \ |
| 40 | + --version 0.4.5 \ |
| 41 | + -f helpers/iaas/openstack-exporter-values.yaml # --set "endpoint_type=public" --set "serviceMonitor.scrapeTimeout=1m" |
| 42 | +``` |
| 43 | + |
| 44 | +Tip: If you want to test the exporter's basic functionality against the **public** OpenStack API, configure `endpoint_type` |
| 45 | +to `public` (`--set "endpoint_type=public"`). Note that configuring `endpoint_type` as `public` will result in |
| 46 | +incomplete functionality for the Grafana dashboard. |
| 47 | + |
| 48 | +Tip: Requesting and collecting metrics from the OpenStack API can be time-consuming, especially if the API is not |
| 49 | +performing well. In such cases, you may observe timeouts on the Prometheus server when it tries to fetch OpenStack |
| 50 | +metrics. To mitigate this, consider increasing the scrape timeout to e.g. 1 minute (`--set "serviceMonitor.scrapeTimeout=1m"`). Keep in mind that Prometheus does not accept a scrape timeout longer than the scrape interval. |
| 51 | + |
| 52 | +#### Grafana dashboards |
| 53 | + |
| 54 | +The Grafana dashboard designed to visualize metrics collected from an OpenStack cloud through the OpenStack exporter |
| 55 | +is publicly available at https://grafana.com/grafana/dashboards/21085. Its source code is located in the [k8s-observability repository](https://github.com/SovereignCloudStack/k8s-observability/tree/main/iaas/dashboards). |
| 56 | +You can import it into Grafana from its source or by its ID. |
| 57 | +For automatic integration into the Kubernetes monitoring solution, proceed to the next step. |
| 58 | + |
| 59 | +#### Update the Kubernetes monitoring deployment |
| 60 | + |
| 61 | +This step deploys the Grafana dashboards and instructs the monitoring stack to add the OpenStack exporter target into the Prometheus configuration: |
| 62 | + |
| 63 | +```bash |
| 64 | +helm upgrade kubernetes-monitoring dnationcloud/dnation-kubernetes-monitoring-stack --reset-then-reuse-values -f helpers/iaas/values-observer-iaas.yaml |
| 65 | +``` |
| 66 | + |
| 67 | +#### Access the OpenStack dashboard |
| 68 | + |
| 69 | +At this point, you should be able to access the Grafana UI and the OpenStack dashboard. |
| 70 | +Log in to the Grafana UI and find the OpenStack dashboard in the IaaS directory. |
| 71 | + |
| 72 | +### Ceph |
| 73 | + |
| 74 | +This guide covers Ceph monitoring for Ceph clusters deployed by [ceph-ansible](https://github.com/ceph/ceph-ansible) and the [rook operator](https://github.com/rook/rook). |
| 75 | +While both expose the same metrics via the same endpoints, there are some differences in the Prometheus configuration and alerts. |
| 76 | + |
| 77 | +#### Prometheus metrics and alerts |
| 78 | + |
| 79 | +Ceph contains two built-in sources of metrics, also known as exporters. |
| 80 | +The Ceph exporter (introduced in the Reef release of Ceph) is the main source of Ceph performance metrics. It runs as a |
| 81 | +dedicated daemon on every Ceph cluster host and exposes a metrics endpoint where all the performance |
| 82 | +counters exposed by the Ceph daemons running on that host are published in the form of Prometheus metrics. |
| 83 | + |
| 84 | +The second source of metrics is the Prometheus manager module. It exposes metrics related to the whole cluster, |
| 85 | +that is, metrics that are not produced by individual Ceph daemons. |
| 86 | + |
| 87 | +Read the related Ceph [docs](https://docs.ceph.com/en/reef/monitoring/#ceph-metrics). |
| 88 | +Since these exporters are integrated with Ceph, deploying a third-party Ceph exporter is unnecessary. |
| 89 | + |
| 90 | +**Prometheus alerts** |
| 91 | + |
| 92 | +Both Ceph deployment strategies use the ceph-mixins project as a source of alerts. The ceph-ansible and rook projects |
| 93 | +each maintain a rendered version of these alerts, but the rook version differs in places, primarily because |
| 94 | +rook does not use the cephadm tool as a backend. |
| 95 | +Therefore, run whichever of the following commands matches your deployment to create a custom observer rules values file for either the |
| 96 | +ceph-ansible or ceph-rook deployment (the [yq](https://github.com/mikefarah/yq/#install) tool is required): |
| 97 | + |
| 98 | +```bash |
| 99 | +# ceph-ansible |
| 100 | +curl -s https://raw.githubusercontent.com/ceph/ceph/main/monitoring/ceph-mixin/prometheus_alerts.yml | \ |
| 101 | + yq '{"kube-prometheus-stack": {"additionalPrometheusRulesMap": {"ceph-ansible-rules": (. + {"additionalLabels": {"prometheus_rule": "1"}})}}}' > helpers/iaas/values-observer-ceph-rules.yaml |
| 102 | + |
| 103 | +# rook |
| 104 | +curl -s https://raw.githubusercontent.com/rook/rook/master/deploy/charts/rook-ceph-cluster/prometheus/localrules.yaml | \ |
| 105 | + yq '{"kube-prometheus-stack": {"additionalPrometheusRulesMap": {"ceph-rook-rules": (. + {"additionalLabels": {"prometheus_rule": "1"}})}}}' > helpers/iaas/values-observer-ceph-rules.yaml |
| 106 | +``` |
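
Because `curl -s` stays silent when a download fails, it can help to sanity-check the generated file before passing it to Helm. The following is a minimal sketch, not part of the project: the function name `check_rules_file` is hypothetical, and the check for `groups:` assumes that the upstream file is a regular Prometheus rule file with a top-level `groups` list.

```bash
# Sanity-check a generated rules values file (hypothetical helper).
check_rules_file() {
  local file="$1"
  # Fail if nothing was written at all.
  [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }
  # The wrapper keys produced by the yq expression above must be present.
  grep -Eq '^"?kube-prometheus-stack"?:' "$file" \
    || { echo "kube-prometheus-stack key not found in $file" >&2; return 1; }
  grep -q 'additionalPrometheusRulesMap' "$file" \
    || { echo "additionalPrometheusRulesMap key not found in $file" >&2; return 1; }
  # Assumption: the downloaded rule file contains a `groups` list.
  grep -q 'groups:' "$file" \
    || { echo "no rule groups found in $file" >&2; return 1; }
}
```

For example, `check_rules_file helpers/iaas/values-observer-ceph-rules.yaml` returns a non-zero status if the download produced an error page instead of rules.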
| 107 | + |
| 108 | +#### Grafana dashboards |
| 109 | + |
| 110 | +We have tested and can recommend two sources of Grafana dashboards that are suitable for both Ceph deployment strategies (ceph-ansible and rook): |
| 111 | +- [dashboards linked in rook docs](https://rook.io/docs/rook/latest-release/Storage-Configuration/Monitoring/ceph-monitoring/?h=gra#grafana-dashboards) |
| 112 | +- [ceph-mixins dashboards](https://github.com/ceph/ceph-mixins/tree/master/dashboards) |
| 113 | + - A built version of the ceph-mixins dashboards can be found, for example, in the [Ceph repository](https://github.com/ceph/ceph/tree/main/monitoring/ceph-mixin/dashboards_out) |
| 114 | + |
| 115 | +We consider the dashboards created within the Rook project as a solid starting point for Ceph metrics visualization. |
| 116 | +If you want to see more detailed dashboards, uncomment and use the ceph-mixin dashboards in the `helpers/iaas/values-observer-ceph-rook.yaml` |
| 117 | +or `helpers/iaas/values-observer-ceph-ansible.yaml` file. You can use both. |
| 118 | + |
| 119 | +#### Update the Kubernetes monitoring deployment |
| 120 | + |
| 121 | +This step deploys the Grafana dashboards and Prometheus rules, and instructs the monitoring stack to add the Ceph exporter targets to the Prometheus configuration. |
| 122 | +For a ceph-ansible deployment, ensure that you add the monitoring targets' IPs and ports to `helpers/iaas/values-observer-ceph-ansible.yaml`. |
| 123 | + |
| 124 | +```bash |
| 125 | +helm upgrade kubernetes-monitoring dnationcloud/dnation-kubernetes-monitoring-stack --reset-then-reuse-values \ |
| 126 | + -f helpers/iaas/values-observer-ceph-rules.yaml \ |
| 127 | + -f "helpers/iaas/values-observer-ceph-${CEPH_DEPLOYMENT:-rook}.yaml" # set CEPH_DEPLOYMENT to "rook" (assumed default here) or "ansible" to pick the matching values file |
| 128 | +``` |