This repository was archived by the owner on Oct 8, 2025. It is now read-only.

Commit fd4d25c
Author: Jason Schmidt
feat: transition from standalone prometheus to kube-prometheus-stack (#70)

* feat: comment out standalone grafana in start/stop scripts
* feat: update test-forwards utility script for prometheus operator use
* feat: convert prometheus to kube-prometheus-stack
* feat: Update utility script to use new services from prometheus operator
* feat: add extras script to fix permissions on kube-proxy metrics
* feat: modifications to NGINX IC to allow prometheus service monitor to pull metrics
* feat: added service monitor for ledgerdb and accountdb postgres
* feat: update README to reflect current configuration
* feat: update documentation to clarify Grafana Password
* fix: adjust depends_on for prometheus deployment
* feat: remove grafana standalone in favor of prometheus kube stack
* chore: upgrade pulumi version
1 parent b960c5b commit fd4d25c

File tree: 15 files changed (+275, −194 lines)


pulumi/aws/README.md

Lines changed: 39 additions & 10 deletions
````diff
@@ -28,9 +28,8 @@ vpc - defines and installs the VPC and subnets to use with EKS
 └─logagent - deploys a logging agent (filebeat) to the EKS cluster
 └─certmgr - deploys the open source cert-manager.io helm chart to the EKS cluster
 └─prometheus - deploys prometheus server, node exporter, and statsd collector for metrics
-└─grafana - deploys the grafana visualization platform
-└─observability - deploys the OTEL operator and instantiates a simple collector
-└─sirius - deploys the Bank of Sirius application to the EKS cluster
+└─observability - deploys the OTEL operator and instantiates a simple collector
+└─sirius - deploys the Bank of Sirius application to the EKS cluster
 
 ```
 
````

```diff
@@ -146,15 +145,40 @@ deployment.
 ### Prometheus
 
 Prometheus is deployed and configured to enable the collection of metrics for all components that have
-properties `prometheus.io:scrape: true` set in the annotations
-(along with any other connection information). This includes the prometheus `node-exporter`
-daemonset which is deployed in this step as well.
+a defined service monitor. At installation time, the deployment will instantiate:
+- Node Exporters
+- Kubernetes Service Monitors
+- Grafana preloaded with dashboards and datasources for Kubernetes management
+- The NGINX Ingress Controller
+- Statsd receiver
+
+The former behavior of using the `prometheus.io:scrape: true` property set in the annotations
+indicating pods where metrics should be scraped has been deprecated, and these annotations will
+be removed in the near future.
+
+Also, the standalone Grafana deployment has been removed from the standard deployment scripts, but has been left as
+a project in the event someone wishes to run it standalone.
+
+Finally, this namespace will hold service monitors created by other projects; for example, the Bank of Sirius
+deployment currently deploys a service monitor for each of the postgres monitors that are deployed.
+
+Notes:
+1. The NGINX IC needs to be configured to expose prometheus metrics; this is currently done by default.
+2. The default address binding of the `kube-proxy` component is set to `127.0.0.1` and as such will cause errors when the
+canned prometheus scrape configurations are run. The fix is to set this address to `0.0.0.0`. An example manifest
+has been provided in [prometheus/extras](./prometheus/extras) that can be applied against your installation with
+`kubectl apply -f ./filename`. Please only apply this change once you have verified that it will work with your
+version of Kubernetes.
+3. The _grafana_ namespace has been maintained in the configuration file to be used by the prometheus operator deployed
+version of Grafana. This version only accepts a password; you can still specify a username for the admin account but it
+will be silently ignored.
 
-This also pulls data from the NGINX KIC, provided the KIC is configured to allow prometheus access (which is enabled by
-default).
 
 ### Grafana
 
+**NOTE:** This deployment has been deprecated, but the project has been left as an example of how to deploy Grafana in this
+architecture.
+
 Grafana is deployed and configured with a connection to the prometheus datasource installed above. At the time of this
 writing, the NGINX Plus KIC dashboard is installed as part of the initial setup. Additional datasources and dashboards
 can be added by the user either in the code, or via the standard Grafana tooling.
```
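The Grafana password note above can be sketched as the Helm values fragment this deployment effectively passes to kube-prometheus-stack. The helper name and placeholder password below are illustrative assumptions, not the project's actual code; the point is that only `grafana.adminPassword` is wired through, which is why a configured admin username is silently ignored.

```python
def build_grafana_values(admin_password: str) -> dict:
    # Hypothetical helper: only the admin password reaches the
    # kube-prometheus-stack chart's grafana subchart in this flow,
    # so any configured admin username never makes it into the values.
    return {
        "grafana": {
            "adminPassword": admin_password,
        }
    }


values = build_grafana_values("strongpass")
```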
```diff
@@ -188,7 +212,10 @@ As part of the Bank of Sirius deployment, we deploy a cluster-wide
 [self-signed](https://cert-manager.io/docs/configuration/selfsigned/)
 issuer using the cert-manager deployed above. This is then used by the Ingress object created to enable TLS access to
 the application. Note that this Issuer can be changed out by the user, for example to use the
-[ACME](https://cert-manager.io/docs/configuration/acme/) issuer.
+[ACME](https://cert-manager.io/docs/configuration/acme/) issuer. The use of the ACME issuer has been tested and works
+without issues, provided the FQDN meets the length requirements. As of this writing, the AWS ELB hostname is too long
+to work with the ACME server. Additional work in this area will be undertaken to provide dynamic DNS record creation
+as part of this process so legitimate certificates can be issued.
 
 In order to provide visibility into the Postgres databases that are running as part of the application, the Prometheus
 Postgres data exporter will be deployed into the same namespace as the application and will be configured to be scraped
```
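One plausible reason the auto-generated ELB hostname fails the ACME length requirement is the 64-character X.509 common-name cap. The check below is a hedged sketch under that assumption (the helper and sample hostname are hypothetical, not part of the repo):

```python
def fqdn_fits_common_name(fqdn: str, limit: int = 64) -> bool:
    # X.509 subject common names are capped at 64 characters, and some
    # ACME flows reject FQDNs longer than that; auto-generated AWS ELB
    # hostnames frequently exceed the limit.
    return 0 < len(fqdn) <= limit


# Illustrative ELB-style hostname, well over 64 characters.
elb_host = ("my-ingress-1234567890abcdef1234567890abcdef-"
            "0123456789.us-west-2.elb.amazonaws.com")
```

A CNAME or dynamic DNS record pointing a short FQDN at the ELB, as the paragraph above anticipates, sidesteps the limit.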
```diff
@@ -204,4 +231,6 @@ provides better tools for hierarchical configuration files.
 
 In order to help enable simple load testing, a script has been provided that uses the
 `kubectl` command to port-forward monitoring and management connections to the local workstation. This command
-is [`test-forward.sh`](./extras/test-forward.sh) and is located in the [`extras`](./extras) directory.
+is [`test-forward.sh`](./extras/test-forward.sh) and is located in the [`extras`](./extras) directory.
+
+**NOTE:** This script has been modified to use the new Prometheus Operator based deployment.
```

pulumi/aws/config/Pulumi.stackname.yaml.example

Lines changed: 1 addition & 11 deletions
```diff
@@ -178,16 +178,6 @@ config:
   ############################################################################
 
   # Grafana Configuration
-  grafana:chart_name: grafana
-  # Chart name for the helm chart for grafana
-  grafana:chart_version: 6.13.7
-  # Chart version for the helm chart for grafana
-  grafana:helm_repo_name: grafana
-  # Name of the repo to pull the grafana chart from
-  grafana:helm_repo_url: https://grafana.github.io/helm-charts
-  # URL of the chart repo to pull grafana from
-  grafana:adminuser: admin
-  # The username for the grafana installation
   grafana:adminpass: strongpass
   # The password for the grafana installation; note that this is not exposed to the internet
   # and requires kubeproxy to access. However, this should be encrypted which is dependent on
@@ -197,7 +187,7 @@ config:
   ############################################################################
 
   # Prometheus Configuration
-  prometheus:chart_name: prometheus
+  prometheus:chart_name: kube-prometheus-stack
   # Chart name for the helm chart for prometheus
   prometheus:chart_version: 14.6.0
   # Chart version for the helm chart for prometheus
```
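The keys in the stack file above follow Pulumi's `<namespace>:<name>` convention (e.g. `prometheus:chart_name`). A tiny illustrative helper (not from the repo) makes the split explicit:

```python
def split_config_key(key: str) -> tuple[str, str]:
    # Pulumi config keys are namespaced as '<project-or-namespace>:<name>';
    # partition splits on the first ':' only, which matches how Pulumi
    # resolves the namespace.
    namespace, _, name = key.partition(":")
    return namespace, name
```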

pulumi/aws/destroy.sh

Lines changed: 1 addition & 1 deletion
```diff
@@ -91,7 +91,7 @@ if command -v aws > /dev/null; then
   validate_aws_credentials
 fi
 
-k8s_projects=(sirius observability grafana prometheus certmgr logagent logstore kic-helm-chart)
+k8s_projects=(sirius observability prometheus certmgr logagent logstore kic-helm-chart)
 
 # Test to see if EKS has been destroyed AND there are still Kubernetes resources
 # that are being managed by Pulumi. If so, we have to destroy the stack for
```

pulumi/aws/extras/scripts/test-forward.sh

Lines changed: 2 additions & 2 deletions
```diff
@@ -51,15 +51,15 @@ kubectl port-forward service/elastic-kibana --namespace logstore 5601:5601 &
 echo $! > $PID01
 
 ## Grafana Tunnel
-kubectl port-forward service/grafana --namespace grafana 3000:80 &
+kubectl port-forward service/prometheus-grafana --namespace prometheus 3000:80 &
 echo $! > $PID02
 
 ## Loadgenerator Tunnel
 kubectl port-forward service/loadgenerator --namespace bos 8089:8089 &
 echo $! > $PID03
 
 ## Prometheus Tunnel
-kubectl port-forward service/prometheus-server --namespace prometheus 9090:80 &
+kubectl port-forward service/prometheus-kube-prometheus-prometheus --namespace prometheus 9090:9090 &
 echo $! > $PID04
 
 ## Elasticsearch Tunnel
```
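The tunnel commands above all share one shape; the service names and namespaces are what the operator migration changed. A sketch of that shape (the helper name is illustrative, not part of the repo):

```python
def port_forward_cmd(service: str, namespace: str,
                     local_port: int, remote_port: int) -> str:
    # Builds the kubectl invocation used by the tunnel script; after the
    # kube-prometheus-stack migration, Grafana lives in the 'prometheus'
    # namespace as 'prometheus-grafana'.
    return (f"kubectl port-forward service/{service} "
            f"--namespace {namespace} {local_port}:{remote_port}")


grafana = port_forward_cmd("prometheus-grafana", "prometheus", 3000, 80)
```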

pulumi/aws/grafana/Pulumi.yaml

Lines changed: 0 additions & 7 deletions
This file was deleted.

pulumi/aws/grafana/__main__.py

Lines changed: 0 additions & 139 deletions
This file was deleted.

pulumi/aws/kic-helm-chart/__main__.py

Lines changed: 27 additions & 3 deletions
```diff
@@ -51,7 +51,25 @@ def build_chart_values(repository: dict) -> helm.ChartOpts:
         'service': {
             'annotations': {
                 'co.elastic.logs/module': 'nginx'
-            }
+            },
+            "extraLabels": {
+                "app": "kic-nginx-ingress"
+            },
+            "customPorts": [
+                {
+                    "name": "dashboard",
+                    "targetPort": 8080,
+                    "protocol": "TCP",
+                    "port": 8080
+                },
+                {
+                    "name": "prometheus",
+                    "targetPort": 9113,
+                    "protocol": "TCP",
+                    "port": 9113
+                }
+            ]
+
         },
         'pod': {
             'annotations': {
@@ -62,7 +80,10 @@ def build_chart_values(repository: dict) -> helm.ChartOpts:
         'prometheus': {
             'create': True,
             'port': 9113
-        }
+        },
+        "opentracing-tracer": "/usr/local/lib/libjaegertracing_plugin.so",
+        "opentracing-tracer-config": "{\n  \"service_name\": \"nginx-ingress\",\n  \"propagation_format\": \"w3c\",\n  \"sampler\": {\n    \"type\": \"const\",\n    \"param\": 1\n  },\n  \"reporter\": {\n    \"localAgentHostPort\": \"simplest-collector.observability.svc.cluster.local:9978\"\n  }\n}\n",
+        "opentracing": True
     }
 
 has_image_tag = 'image_tag' in repository or 'image_tag_alias' in repository
@@ -109,7 +130,10 @@ def build_chart_values(repository: dict) -> helm.ChartOpts:
                           kubeconfig=kubeconfig)
 
 ns = k8s.core.v1.Namespace(resource_name='nginx-ingress',
-                           metadata={'name': 'nginx-ingress'},
+                           metadata={'name': 'nginx-ingress',
+                                     'labels': {
+                                         'prometheus': 'scrape'}
+                                     },
                            opts=pulumi.ResourceOptions(provider=k8s_provider))
 
 chart_values = ecr_repository.apply(build_chart_values)
```
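The `opentracing-tracer-config` value above is a hand-escaped JSON string, which is easy to get wrong. One way to keep it readable is to build a dict and serialize it; this is a sketch of an alternative, not the project's code:

```python
import json

# The chart expects "opentracing-tracer-config" as a JSON *string*; building
# it from a dict and serializing with json.dumps keeps quoting and escaping
# correct without a hand-written literal.
tracer_config = {
    "service_name": "nginx-ingress",
    "propagation_format": "w3c",
    "sampler": {"type": "const", "param": 1},
    "reporter": {
        "localAgentHostPort":
            "simplest-collector.observability.svc.cluster.local:9978"
    },
}
tracer_config_json = json.dumps(tracer_config, indent=2)
```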
