@lionelvillard lionelvillard commented Oct 24, 2025

This PR introduces a new component, the activator, which scales the Kubernetes object attached to an InferencePool up from zero and back down to zero.

Overview

The code consists of:

  • charts/activator-filter: a helm chart for inserting the activator into the gateway HTTP filter chain (Istio only). Needs to be applied only once.
  • charts/activator: a helm chart configuring the activator for a given HTTPRoute and InferencePool
  • cmd/activator: the activator main
  • pkg/activator:
    • an ext-proc server that handles request headers and ensures there is at least one replica in the InferencePool.
    • a monitor that scales the object attached to the InferencePool down to zero after a configurable idle period.

Control-knobs

Here are the control-knobs attached to InferencePool objects as annotations.

  • activator.llm-d.ai/target-apiversion: APIVersion of the target object to scale
  • activator.llm-d.ai/target-kind: Kind of the target object to scale
  • activator.llm-d.ai/target-name: Name of the target object to scale
  • activator.llm-d.ai/scale-from-zero-grace-period: time (in seconds) to wait for the scale-from-zero decision to complete before aborting.
  • activator.llm-d.ai/scale-down-delay: the minimum time (in seconds) to wait before the system scales the last replica down to zero

Here are the control-knobs specified on the command line:

  • --enable-scale-to-zero: Enable scaling down InferencePool to zero replicas after a period of idleness (Default: false)
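For illustration, a boolean flag like this could be wired up with Go's standard `flag` package as sketched below. The flag name and its default of false come from the description above; the `parseFlags` helper and the use of a dedicated `FlagSet` are assumptions made so the example is testable, and the activator's actual main may register its flags differently.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseFlags is a hypothetical helper that parses the activator's
// command-line arguments and returns whether scale-to-zero is enabled.
func parseFlags(args []string) (bool, error) {
	fs := flag.NewFlagSet("activator", flag.ContinueOnError)
	enable := fs.Bool("enable-scale-to-zero", false,
		"Enable scaling down InferencePool to zero replicas after a period of idleness")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *enable, nil
}

func main() {
	enabled, err := parseFlags(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Println("scale-to-zero enabled:", enabled)
}
```

Because the default is false, scale-to-zero stays opt-in: running the binary without the flag leaves the idle monitor disabled.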

Testing

  • e2e testing is done manually following these recipes.
    • recipe 1: multiple models using url-rewriting method
    • recipe 2: multiple models using BBR method
  • @dumb0002 recipes

Future Work

  • better documentation other than helm charts
  • load testing
  • more control-knobs

Co-authors

@dumb0002 @nilig @elevran @kfswain @shmuelk

@lionelvillard
Author

@shmuelk @pierDipi All comments have been addressed.

I would like to rebase this PR after #416 is merged, as currently it's not possible to build the image when using make.

@lionelvillard
Author

rebase done @shmuelk

@elevran elevran requested a review from shmuelk November 4, 2025 15:43
@lionelvillard lionelvillard force-pushed the activator branch 2 times, most recently from d57412b to ed0f30b Compare November 4, 2025 19:34
@elevran elevran self-requested a review November 5, 2025 14:30
filter:
name: "envoy.filters.network.http_connection_manager"
patch:
operation: INSERT_FIRST # TODO: insert before EPP

Collaborator

OOC: what will happen if this is deployed together with BBR - which ext proc will be invoked first?

Author

It depends on the order in which the filters are applied. It does not really matter which one runs first, though it may be more optimal to run BBR before the activator, since BBR can reject invalid requests.

Collaborator

Exactly, that is the point I was trying to highlight.
It might be worth adding this information to a section of the deployment instructions.
We should recommend the order in which to deploy them (and I agree, as you mentioned, that BBR should run first).

Author

I added a note in the readme.


## Install

To install an activator-filter named `activator-filter`, you can run the following command:

Collaborator

Whoever uses the activator isn't necessarily familiar with what a filter is. We can just call it activator.

Author

there is already a chart named activator. What about activator-istio?

Collaborator

is there a use case for deploying one chart without the other?

Author

both are needed. The only difference is one is applied once per namespace, the other one once per HTTPRoute.

@elevran
Collaborator

elevran commented Nov 6, 2025

/approve
will let @shmuelk or @nirrozenbaum add the LGTM

@elevran elevran requested a review from nirrozenbaum November 6, 2025 18:34
elevran previously approved these changes Nov 6, 2025
@elevran
Collaborator

elevran commented Nov 6, 2025

/hold
@lionelvillard please sign commits - missing verified signature

@github-actions github-actions bot added the hold label Nov 6, 2025
@lionelvillard lionelvillard force-pushed the activator branch 2 times, most recently from da1abca to 2540e93 Compare November 6, 2025 21:00
@lionelvillard
Author

@elevran done.

@elevran
Collaborator

elevran commented Nov 7, 2025

/unhold

ENV GOARCH=$TARGETARCH

# Dependencies
WORKDIR /src

Collaborator

Change this line to:
WORKDIR /workspace

# Sources
COPY cmd/activator ./cmd/activator
COPY pkg/activator ./pkg/activator
WORKDIR /src/cmd/activator

Collaborator

Remove this line

COPY cmd/activator ./cmd/activator
COPY pkg/activator ./pkg/activator
WORKDIR /src/cmd/activator
RUN go build -ldflags="-X sigs.k8s.io/gateway-api-inference-extension/version.CommitSHA=${COMMIT_SHA} -X sigs.k8s.io/gateway-api-inference-extension/version.BuildRef=${BUILD_REF}" -o /activator

Collaborator

Please make this Dockerfile more like the others in this repo:

Change: -o /activator to -o bin/activator cmd/activator/main.go

FROM registry.access.redhat.com/ubi9/ubi-minimal:latest

WORKDIR /
COPY --from=builder /activator /activator

Collaborator

Change line to:

COPY --from=builder /workspace/bin/activator /app/activator

WORKDIR /
COPY --from=builder /activator /activator

ENTRYPOINT ["/activator"]

Collaborator

Change this line to:

ENTRYPOINT ["/app/activator"]

// --- Setup Metrics Server ---
metricsServerOptions := metricsserver.Options{
BindAddress: fmt.Sprintf(":%d", *metricsPort),
FilterProvider: filters.WithAuthenticationAndAuthorization,

Collaborator

See my above comment related to security on the README.md file

@lionelvillard lionelvillard force-pushed the activator branch 2 times, most recently from cbdad4b to 5fc9749 Compare November 10, 2025 15:03
@lionelvillard
Author

@shmuelk I resolved your recent comments

lionelvillard and others added 13 commits November 13, 2025 08:57
Co-authored-by: Braulio Dumba
Signed-off-by: Lionel Villard
Signed-off-by: Braulio Dumba
Co-authored-by: Pierangelo Di Pilato
Co-authored-by: Shmuel Kallner
Projects

Status: In review

6 participants