LLM Router & Dynamo Integration #16
base: main
- Introduced `dynamo-llm-config.yaml` for full production deployment with multiple models.
- Added `dynamo-single-llm-config.yaml` for minimal testing with a single model.
- Created `llm-router-values-override.yaml` for Helm deployment customization.
- Added routing configuration files: `router-config.yaml` for full deployment and `router-config-single.yaml` for single model verification.
- Updated `README.md` with comprehensive deployment instructions and configuration options.

- Added license information to `dynamo-llm-config.yaml`, `dynamo-single-llm-config.yaml`, `llm-router-values-override.yaml`, `router-config-single.yaml`, and `router-config.yaml`.
- Ensured compliance with Apache License 2.0 for all relevant configuration files.

- Added `deploy-dynamo-integration.sh` script for automated deployment of NVIDIA Dynamo Cloud Platform and LLM Router.
- Introduced `dynamo-cloud-deployment.yaml` for configuring the Dynamo deployment.
- Created `dynamo-llm-deployment.yaml` for deploying LLM inference graphs.
- Added `router-config-dynamo.yaml` for routing configuration specific to Dynamo integration.
- Removed outdated configuration files: `dynamo-single-llm-config.yaml` and `router-config-single.yaml`.
- Updated `README.md` to reflect new deployment instructions and configuration details.

- Updated `llm-router-values-override.yaml` to enable ingress with NGINX configuration.
- Added detailed ingress setup instructions to `README.md`, including host configuration and testing methods.
- Provided examples for testing routing via ingress and port-forwarding.

…ntegration
- Deleted `deploy-dynamo-integration.sh` and `dynamo-cloud-deployment.yaml` as they are no longer needed.
- Updated `README.md` to reflect the removal of these files and adjusted deployment instructions accordingly.
Pull Request Overview
This PR adds NVIDIA Dynamo Cloud Platform integration by introducing new router configurations, Helm value overrides, and a sample DynamoGraphDeployment.
- Defines LLM routing policies for task and complexity routers pointing to Dynamo endpoints.
- Provides Helm values to configure router controller/server, ingress, RBAC, and resource settings.
- Includes an example DynamoGraphDeployment and corresponding Service manifest for LLM inference.
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| customizations/LLM Router/router-config-dynamo.yaml | New YAML defining `task_router` and `complexity_router` policies for Dynamo endpoints. |
| customizations/LLM Router/llm-router-values-override.yaml | Helm values override configuring images, ingress, security, and RBAC for Dynamo integration. |
| customizations/LLM Router/dynamo-llm-deployment.yaml | Sample DynamoGraphDeployment and Service manifest for deploying LLMs on Dynamo. |
Comments suppressed due to low confidence (3)
customizations/LLM Router/llm-router-values-override.yaml:25
- [nitpick] Pin specific image tags instead of using `latest` to ensure reproducible and auditable deployments.
  `tag: latest`
customizations/LLM Router/llm-router-values-override.yaml:56
- [nitpick] Pin specific image tags instead of using `latest` to ensure reproducible and auditable deployments.
  `tag: latest`
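
As a rough illustration of the nitpick above, a pinned image entry in a Helm values file might look like the sketch below. The key layout (`image.repository`, `image.tag`, `image.pullPolicy`), the registry path, and the version string are assumptions for illustration only; they are not taken from this PR's chart.

```yaml
# Hypothetical values fragment: replace the repository and tag with the
# ones your chart and registry actually use.
image:
  repository: registry.example.com/llm-router/router-server  # hypothetical path
  tag: "1.2.3"              # hypothetical fixed version instead of "latest"
  pullPolicy: IfNotPresent  # a fixed tag makes cached pulls safe to reuse
```

Pinning by tag (or by image digest, where the registry supports it) means a redeploy pulls the same image that was reviewed.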
customizations/LLM Router/dynamo-llm-deployment.yaml:147
- The selector value has a trailing space (`Frontend `), which may prevent it from matching the Pod label; remove the extra space.
  `dynamo-component: Frontend`
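
To make the selector concern concrete, here is a minimal sketch of a Service whose selector matches the Pod label exactly. The Service name, namespace, and port are inferred from the `api_base` URL quoted elsewhere in this review; `targetPort` and the port name are assumptions.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: dynamo-llm-service
  namespace: dynamo-cloud
spec:
  selector:
    # Label values are compared as exact strings, so "Frontend " with a
    # trailing space would not select Pods labeled "Frontend".
    dynamo-component: Frontend
  ports:
    - name: http          # hypothetical port name
      port: 8080
      targetPort: 8080    # assumption: the container listens on 8080
```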
className: "nginx" # Use your cluster's ingress class
annotations:
  nginx.ingress.kubernetes.io/rewrite-target: /
  nginx.ingress.kubernetes.io/ssl-redirect: "false"
Disabling SSL redirect exposes the service over plain HTTP; consider enabling TLS and redirecting HTTPS to protect data in transit.
Suggested change:
- nginx.ingress.kubernetes.io/ssl-redirect: "false"
+ nginx.ingress.kubernetes.io/ssl-redirect: "true"
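
Enabling the redirect only helps if the ingress also terminates TLS. A minimal sketch of a plain Kubernetes Ingress with both pieces is shown below; the resource name, host, TLS Secret name, backend Service name, and port are all hypothetical placeholders, and the Helm chart in this PR may expose these settings under different value keys.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: llm-router                      # hypothetical name
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - llm-router.example.com        # hypothetical host
      secretName: llm-router-tls        # hypothetical kubernetes.io/tls Secret
  rules:
    - host: llm-router.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: llm-router        # hypothetical Service name
                port:
                  number: 8084          # hypothetical port
```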
llms:
  - name: Brainstorming
    api_base: http://dynamo-llm-service.dynamo-cloud.svc.cluster.local:8080/v1
    api_key: ""
Storing an empty `api_key` means there’s no authentication; use Kubernetes Secrets or environment variables to securely inject real credentials.
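
One way this suggestion could be realized is sketched below: a Secret holds the key, and the router container reads it through an environment variable. The variable name `DYNAMO_API_KEY` matches the one mentioned in a later commit of this PR; the Secret name, namespace, and placeholder value are hypothetical.

```yaml
# Hypothetical Secret holding the API key.
apiVersion: v1
kind: Secret
metadata:
  name: dynamo-api-secret        # hypothetical name
  namespace: llm-router          # hypothetical namespace
type: Opaque
stringData:
  DYNAMO_API_KEY: "replace-me"   # placeholder, never commit a real key
---
# Fragment of a container spec (not a complete resource): expose the
# Secret value to the router process as an environment variable.
env:
  - name: DYNAMO_API_KEY
    valueFrom:
      secretKeyRef:
        name: dynamo-api-secret
        key: DYNAMO_API_KEY
```

Whether the router config file itself can interpolate environment variables depends on the router version, so that part should be checked against its documentation.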
- Updated `README.md` to reflect the new directory name and provide an overview of NVIDIA Dynamo Customizations.
- Added detailed sections on available customizations, including LLM Router integration and its benefits.
- Modified `dynamo-llm-deployment.yaml` to clarify component reference instructions.
- Enhanced `llm-router-values-override.yaml` with additional configuration options for API base and namespace.
- Updated `router-config-dynamo.yaml` to utilize environment variables for API key management and service endpoints.
- Expanded `README.md` in the LLM Router directory with comprehensive deployment instructions and configuration validation steps.
Left some comments about changes that need to be made given that Dynamo Deploy now has a vastly simplified UX. Ping me with any questions.
customizations/LLM Router/README.md
Outdated

- **Kubernetes cluster** (1.24+) with kubectl configured
- **Helm 3.x** for managing deployments
- **Earthly** for building Dynamo components ([Install Guide](https://earthly.dev/get-earthly))
Earthly is no longer needed now that pre-published images are available.
Again, the Quickstart mentions Earthly here
customizations/LLM Router/README.md
Outdated

You'll need to configure these environment variables before deployment:

| Variable | Description | Example |
Many of these environment variables are no longer needed to deploy the Dynamo Cloud platform.
customizations/LLM Router/README.md
Outdated
docker compose -f deploy/metrics/docker-compose.yml up -d
```

### Step 2: Deploy Dynamo Cloud Platform (For Kubernetes)
Lines 258-309 need refactoring based on the new simplified workflow captured in our new quickstart here
The `dynamo build` and `dynamo deploy` commands are now gone, and the new workflow should be a lot more concise.
The quickstart mentions building images with Earthly here
# Based on: https://docs.nvidia.com/dynamo/latest/guides/dynamo_deploy/dynamo_operator.html

---
apiVersion: nvidia.com/v1alpha1
This CR needs to be updated. `dynamoComponent` and related fields are no longer supported as part of the new workflow. Each component needs to bring its own image. Please use the following CRs as reference.
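
For orientation only, a CR in which each component carries its own image might be shaped roughly like the sketch below. The `apiVersion` and the `DynamoGraphDeployment` kind come from this PR; every field under `spec` is an assumption recalled from Dynamo example manifests and must be verified against the CRD installed in your cluster and the reference CRs the reviewer points to. Names, images, and the GPU count are hypothetical placeholders.

```yaml
apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: example-graph                 # hypothetical name
spec:
  services:                           # assumption: per-component map
    Frontend:
      componentType: frontend         # assumption
      replicas: 1
      extraPodSpec:                   # assumption: per-component pod overrides
        mainContainer:
          image: registry.example.com/dynamo-runtime:1.2.3   # hypothetical image
    Worker:                           # hypothetical component name
      componentType: worker           # assumption
      replicas: 1
      resources:
        limits:
          gpu: "1"                    # assumption: GPU limit syntax
      extraPodSpec:
        mainContainer:
          image: registry.example.com/dynamo-runtime:1.2.3   # hypothetical image
```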
The CR link seems to be broken. Can you give me the correct link?
- Added `disagg.yaml` for deploying the vLLM backend with disaggregated serving capabilities.
- Removed the deprecated `dynamo-llm-deployment.yaml` file to streamline the configuration.
- Updated `README.md` to include detailed instructions for the new disaggregated deployment setup.
- Revised `router-config-dynamo.yaml` to reflect the models currently deployed and their configurations.
- Enhanced documentation to clarify the integration of multiple models and their routing strategies.

- Revised sections for NVIDIA Dynamo and LLM Router to improve clarity and detail.
- Updated descriptions to reflect disaggregated serving capabilities and multi-model deployment.
- Enhanced task classification and complexity analysis descriptions for better understanding.
- Improved overall documentation for optimal performance insights.

- Streamlined descriptions for NVIDIA Dynamo and LLM Router to enhance clarity.
- Consolidated features into concise bullet points for better readability.
- Updated task classification and complexity analysis sections for improved understanding.
- Removed unnecessary table formatting to simplify the layout.

…Router documentation
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
- Introduced `agg.yaml` for deploying vLLM with a single GPU setup.
- Updated `disagg.yaml` to use environment variables for model names and unified image references.
- Enhanced `README.md` to include new environment variables and deployment instructions for both aggregated and disaggregated configurations.

- Removed frontend service definitions from `agg.yaml` and `disagg.yaml`, consolidating them into a new `frontend.yaml` for a shared API service.
- Updated `llm-router-values-override.yaml` to adjust API base and repository configurations for improved deployment flexibility.
- Enhanced `README.md` to reflect the new shared frontend architecture, detailing its benefits and deployment instructions.
- Revised routing strategies and model configurations to streamline multi-model deployments and reduce resource overhead.

…n changes
- Revised `README.md` to reflect updated worker templates and multi-model support, enhancing clarity on deployment configurations.
- Changed model reference in `router-config-dynamo.yaml` from `meta-llama/Llama-3.1-70B-Instruct` to `mistralai/Mixtral-8x22B-Instruct-v0.1`, aligning with the new deployment strategy.
- Improved health check configuration details in `README.md` to ensure proper service-level setup.

…M Router deployment
- Revised environment variable section to include additional variables such as `DYNAMO_IMAGE`, `DYNAMO_API_BASE`, and `DYNAMO_API_KEY`, clarifying their usage in deployment.
- Updated model recommendations to reflect current configurations and performance insights.
- Improved instructions for setting up environment variables, ensuring clarity for users during deployment.

…loyment instructions
- Added image pull secret configuration in `llm-router-values-override.yaml` to support private registry access.
- Revised `README.md` to enhance clarity on environment variable usage and deployment steps, including updates to model deployment instructions and Kubernetes secret references.
- Streamlined instructions for creating ConfigMaps and verifying deployments to ensure a smoother user experience.
No description provided.