-
Notifications
You must be signed in to change notification settings - Fork 96
Initial drop of the activator code #387
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
138e9a4 to
d929811
Compare
c8e531c to
227547b
Compare
1bf386c to
6bf3240
Compare
6bf3240 to
fd9c10d
Compare
|
rebase done @shmuelk |
fd9c10d to
6108eae
Compare
d57412b to
ed0f30b
Compare
| filter: | ||
| name: "envoy.filters.network.http_connection_manager" | ||
| patch: | ||
| operation: INSERT_FIRST # TODO: insert before EPP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OOC: what will happen if this is deployed together with BBR - which ext proc will be invoked first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it depends on the order of which filters are applied. It really does not matter which one run first, maybe more optimal to have BBR before the activator as the BBR can reject invalid requests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exactly. you are touching exactly the point I was trying to highlight.
It might be worth adding this information so some section in the deployment instructions.
we should recommend in which order to deploy them (and I agree as you mentioned that BBR should run first).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a note in the readme.
|
|
||
| ## Install | ||
|
|
||
| To install an activator-filter named `activator-filter`, you can run the following command: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whoever is going to use activator isn't necessarily familiar with what is filter. we can call it just activator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is already a chart named activator. What about activator-istio?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a use case for deploying one chart without the other?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both are needed. The only difference is one is applied once per namespace, the other one once per HTTPRoute.
479ea3f to
125d7b1
Compare
20714ba to
773ceec
Compare
|
/approve |
|
/hold |
da1abca to
2540e93
Compare
|
@elevran done. |
|
/unhold |
Dockerfile.activator
Outdated
| ENV GOARCH=$TARGETARCH | ||
|
|
||
| # Dependencies | ||
| WORKDIR /src |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change this line to:
WORKDIR /workspace
Dockerfile.activator
Outdated
| # Sources | ||
| COPY cmd/activator ./cmd/activator | ||
| COPY pkg/activator ./pkg/activator | ||
| WORKDIR /src/cmd/activator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this line
Dockerfile.activator
Outdated
| COPY cmd/activator ./cmd/activator | ||
| COPY pkg/activator ./pkg/activator | ||
| WORKDIR /src/cmd/activator | ||
| RUN go build -ldflags="-X sigs.k8s.io/gateway-api-inference-extension/version.CommitSHA=${COMMIT_SHA} -X sigs.k8s.io/gateway-api-inference-extension/version.BuildRef=${BUILD_REF}" -o /activator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make this Dockerfile more like the others in this repo:
Change: -o /activator to -o bin/activator cmd/activator/main.go
Dockerfile.activator
Outdated
| FROM registry.access.redhat.com/ubi9/ubi-minimal:latest | ||
|
|
||
| WORKDIR / | ||
| COPY --from=builder /activator /activator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change line to:
`COPY --from=builder /workspace/bin/activator /app/activator
Dockerfile.activator
Outdated
| WORKDIR / | ||
| COPY --from=builder /activator /activator | ||
|
|
||
| ENTRYPOINT ["/activator"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change this line to:
ENTRYPOINT ["/app/activator"]
| // --- Setup Metrics Server --- | ||
| metricsServerOptions := metricsserver.Options{ | ||
| BindAddress: fmt.Sprintf(":%d", *metricsPort), | ||
| FilterProvider: filters.WithAuthenticationAndAuthorization, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my above comment related to security on the README.md file
4900a46
cbdad4b to
5fc9749
Compare
|
@shmuelk I resolved your recent comments |
Co-authored-by: Braulio Dumba <[email protected]> Signed-off-by: Lionel Villard <[email protected]>
Signed-off-by: Braulio Dumba <[email protected]>
Signed-off-by: Braulio Dumba <[email protected]>
Co-authored-by: Pierangelo Di Pilato <[email protected]> Signed-off-by: Lionel Villard <[email protected]>
Signed-off-by: Lionel Villard <[email protected]>
Signed-off-by: Lionel Villard <[email protected]>
Signed-off-by: Lionel Villard <[email protected]>
Signed-off-by: Lionel Villard <[email protected]>
Signed-off-by: Lionel Villard <[email protected]>
Co-authored-by: Shmuel Kallner <[email protected]> Signed-off-by: Lionel Villard <[email protected]>
Signed-off-by: Lionel Villard <[email protected]>
Signed-off-by: Lionel Villard <[email protected]>
Signed-off-by: Lionel Villard <[email protected]>
5fc9749 to
47bbf69
Compare
This PR introduces a new component, called the activator, scaling from zero and to zero Kubernetes object attached to InferencePool.
Overview
The code consists of:
charts/activator-filter: a helm chart for inserting the activator into the gateway HTTP filter chain (Istio only). Needs to be applied only once.charts/activator: a helm chart configuring the activator for a given HTTPRoute and InferencePoolcmd/activator: the activator mainpkg/activator:Control-knobs
Here are the control-knobs attached to InferencePool objects as annotations.
activator.llm-d.ai/target-apiversion: APIVersion of the target object to scaleactivator.llm-d.ai/target-kind: Kind of the target object to scaleactivator.llm-d.ai/target-name: Name of the target object to scaleactivator.llm-d.ai/scale-from-zero-grace-period: time (in seconds) to wait for scale-from-zero decision to complete before aborting.activator.llm-d.ai/scale-down-delay: the minimum of time (in seconds) before the system decides to scale down the last replica to zeroHere are the control-knobs specified on the command line:
--enable-scale-to-zero: Enable scaling down InferencePool to zero replicas after a period of idleness (Default: false)Testing
Future Work
Co-authors
@dumb0002 @nilig @elevran @kfswain @shmuelk