Skip to content

Commit 56fb1ac

Browse files
committed
fix: fix keda with activator issue.
Signed-off-by: X1aoZEOuO <[email protected]>
1 parent e1a95d3 commit 56fb1ac

File tree

1 file changed

+61
-6
lines changed

1 file changed

+61
-6
lines changed

site/content/en/docs/features/serverless.md

Lines changed: 61 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -7,11 +7,33 @@ weight: 4
77

88
This comprehensive guide provides enterprise-grade configuration patterns for serverless environments on Kubernetes, focusing on advanced integrations between Prometheus monitoring and KEDA autoscaling. The architecture delivers optimal resource efficiency through event-driven scaling while maintaining observability and resilience for AI/ML workloads and other latency-sensitive applications.
99

10+
### Relationship Between Activator and KEDA
11+
12+
The serverless architecture combines two complementary components:
13+
14+
- **KEDA (Kubernetes Event-driven Autoscaling)**: Handles dynamic scaling based on metrics from Prometheus or other event sources. KEDA monitors application metrics (such as request queue length, processing time, or custom metrics) and automatically adjusts the number of replicas between `minReplicaCount` and `maxReplicaCount` to meet demand.
15+
16+
- **Activator**: Serves as a request interceptor and buffer for scale-from-zero scenarios. When KEDA scales a workload down to zero replicas (to save resources during idle periods), the activator intercepts incoming requests, triggers KEDA to scale up the workload, buffers the requests during the cold start period, and forwards them once the workload is ready.
17+
18+
Together, these components enable true serverless behavior: workloads can scale to zero when idle (minimizing costs) and automatically scale up on-demand when requests arrive (maintaining responsiveness). KEDA provides the scaling mechanism, while the activator ensures no requests are lost during the scale-from-zero process.
19+
1020
## Concepts
1121

1222
### Prometheus Configuration
1323

14-
Prometheus is utilized for monitoring and alerting purposes. To enable cross-namespace ServiceMonitor discovery, configure the `namespaceSelector`. In Prometheus, define the `serviceMonitorSelector` to associate with ServiceMonitors.
24+
Prometheus is utilized for monitoring and alerting purposes in the serverless architecture. It collects and stores metrics from your workloads, which KEDA can then query to make scaling decisions.
25+
26+
#### ServiceMonitor Explained
27+
28+
A ServiceMonitor is a Kubernetes custom resource that tells Prometheus which services to monitor and how to scrape metrics from them. The key configuration aspects are:
29+
30+
- **Cross-namespace Discovery**: The `namespaceSelector` field allows Prometheus to discover and monitor services across different namespaces. Setting `any: true` enables monitoring of services in any namespace, not just the namespace where the ServiceMonitor is deployed.
31+
32+
- **Label Matching**: The `selector.matchLabels` field defines which services to monitor based on their labels. In this example, it monitors all services with the label `llmaz.io/model-name: qwen2-0--5b`.
33+
34+
- **Metrics Endpoint**: The `endpoints` section specifies how to scrape metrics - which port to use, the HTTP path where metrics are exposed, and the protocol (HTTP/HTTPS).
35+
36+
Here's an example ServiceMonitor configuration:
1537

1638
```yaml
1739
apiVersion: monitoring.coreos.com/v1
@@ -27,19 +49,40 @@ spec:
2749
any: true
2850
selector:
2951
matchLabels:
52+
# Service label, you can change this to match your service label
3053
llmaz.io/model-name: qwen2-0--5b
3154
endpoints:
3255
- port: http
3356
path: /metrics
3457
scheme: http
3558
```
3659
37-
- Ensure the `namespaceSelector` is configured to allow cross-namespace monitoring.
38-
- Appropriately label your services to be discovered by Prometheus.
60+
**Important Configuration Notes:**
61+
62+
- Ensure the `namespaceSelector` is configured to allow cross-namespace monitoring if your services are in different namespaces than your ServiceMonitor.
63+
- Your Kubernetes Services must have matching labels (as specified in `selector.matchLabels`) for Prometheus to discover them.
64+
- The service must expose a `/metrics` endpoint (or the path you specify) that returns metrics in Prometheus format.
65+
- Verify that your Prometheus instance is configured with a `serviceMonitorSelector` that matches the labels on this ServiceMonitor (in this example, `control-plane: controller-manager` and `app.kubernetes.io/name: servicemonitor`).
3966

4067
### KEDA Configuration
4168

42-
KEDA (Kubernetes Event-driven Autoscaling) is employed for scaling applications based on custom metrics. It can be integrated with Prometheus to trigger scaling actions.
69+
KEDA (Kubernetes Event-driven Autoscaling) is employed for scaling applications based on custom metrics. It extends Kubernetes' native autoscaling capabilities by allowing you to scale based on external event sources like Prometheus metrics, message queues, or cloud events.
70+
71+
#### ScaledObject Explained
72+
73+
A ScaledObject is the core KEDA resource that defines how to scale your workload. The key configuration aspects are:
74+
75+
- **Scale Target**: The `scaleTargetRef` field specifies which Kubernetes resource to scale. In this example, it targets a custom resource of type `Playground` from the `inference.llmaz.io` API group. KEDA can scale Deployments, StatefulSets, or any custom resource that implements the scale subresource.
76+
77+
- **Replica Boundaries**: The `minReplicaCount` and `maxReplicaCount` fields define the scaling limits. Setting `minReplicaCount: 0` enables scale-to-zero functionality, which requires the activator component to handle cold starts.
78+
79+
- **Timing Parameters**:
80+
- `pollingInterval` (30s in this example): How frequently KEDA checks the metrics from the trigger source
81+
- `cooldownPeriod` (50s in this example): The waiting period after the last trigger activation before scaling back down to prevent flapping
82+
83+
- **Prometheus Trigger**: The `triggers` section defines what metrics to monitor and when to scale. The Prometheus trigger queries a specific metric and compares it against a threshold. When the metric value exceeds the threshold, KEDA scales up; when it falls below, KEDA scales down (after the cooldown period).
84+
85+
Here's an example ScaledObject configuration:
4386

4487
```yaml
4588
apiVersion: keda.sh/v1alpha1
@@ -65,8 +108,20 @@ spec:
65108
threshold: "0.2"
66109
```
67110

68-
- Ensure the `serverAddress` correctly points to the Prometheus service.
69-
- Adjust `pollingInterval` and `cooldownPeriod` to optimize scaling behavior and prevent conflicts with other scaling mechanisms.
111+
**Important Configuration Notes:**
112+
113+
- **Prometheus Connection**: Ensure the `serverAddress` correctly points to your Prometheus service. The format is typically `http://<service-name>.<namespace>.svc.cluster.local:<port>`.
114+
115+
- **Metric Query**: The `query` field should be a valid PromQL (Prometheus Query Language) expression. In this example, `sum(llamacpp:requests_processing)` aggregates all processing requests across instances.
116+
117+
- **Threshold Tuning**: The `threshold` value determines when scaling occurs. For example, `"0.2"` means KEDA will scale up when the metric value exceeds 0.2. Adjust this based on your workload characteristics and desired responsiveness.
118+
119+
- **Timing Optimization**:
120+
- Set `pollingInterval` based on how quickly you need to respond to load changes (shorter intervals = faster response but more API calls)
121+
- Set `cooldownPeriod` long enough to avoid scaling down too quickly during temporary traffic drops, but short enough to save resources during idle periods
122+
- Ensure these values don't conflict with HPA (Horizontal Pod Autoscaler) if you're using both
123+
124+
- **Scale-to-Zero Requirements**: When using `minReplicaCount: 0`, you must deploy the activator component to handle requests during cold starts and trigger the scale-up process.
70125

71126
### Integration with Activator
72127

0 commit comments

Comments
 (0)