Migrate vLLM Ray Serve Container #5463
Conversation
Signed-off-by: Junpu Fan <[email protected]>
sirutBuasai left a comment
We'll most likely need the CUDA compat script as well; not sure how vLLM has been working without it thus far. https://github.com/aws/deep-learning-containers/blob/eb524f7c0737b007cf06d4fd36f67de246cc8d8f/sglang/build_artifacts/start_cuda_compat.sh
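For reference, the linked start_cuda_compat.sh boils down to a driver-version check: forward-compat libraries are needed only when the container's CUDA is newer than what the host driver supports. This standalone sketch illustrates just that comparison; the function name and interface are assumptions, not the actual script, which also manages /usr/local/cuda/compat.

```shell
#!/bin/bash
# Sketch (not the real start_cuda_compat.sh): decide whether CUDA
# forward-compat libraries are needed by comparing the container's CUDA
# version against the CUDA version the host driver natively supports.
compat_needed() {
  local container_cuda=$1 driver_cuda=$2
  # sort -V orders version strings; the last line is the higher version.
  local highest
  highest=$(printf '%s\n%s\n' "$container_cuda" "$driver_cuda" | sort -V | tail -n 1)
  # Compat libs are required iff container CUDA is strictly newer than the driver's.
  if [ "$highest" = "$container_cuda" ] && [ "$container_cuda" != "$driver_cuda" ]; then
    echo "yes"
  else
    echo "no"
  fi
}
```

The real script would feed this check with the driver's supported CUDA version (e.g. parsed from `nvidia-smi`) and, when needed, put the compat libraries on the loader path.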
Not running any SageMaker tests yet
```yaml
- name: Download image URI artifact
  uses: actions/download-artifact@v4
  with:
    name: vllm-rayserve-ec2-image-uri

- name: Resolve image URI for test
  run: |
    IMAGE_URI=$(cat image_uri.txt)
    echo "Resolved image URI: $IMAGE_URI"
    echo "IMAGE_URI=$IMAGE_URI" >> $GITHUB_ENV

- name: Pull image
  run: |
    docker pull $IMAGE_URI

- name: Checkout vLLM Tests
  uses: actions/checkout@v5
  with:
    repository: vllm-project/vllm
    ref: v0.10.2
    path: vllm_source
```
I wonder if there's a way to DRY these steps. These are going to be used repeatedly across multiple stages
later we can refactor the common patterns into reusable (callable) workflows or composite actions
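One way to DRY these steps would be a `workflow_call` reusable workflow; the file name, input name, and job layout below are illustrative only, not part of this PR:

```yaml
# .github/workflows/pull-image.yml (hypothetical name)
name: Resolve and pull image
on:
  workflow_call:
    inputs:
      artifact-name:   # e.g. vllm-rayserve-ec2-image-uri
        required: true
        type: string
jobs:
  pull:
    runs-on: ubuntu-latest
    steps:
      - name: Download image URI artifact
        uses: actions/download-artifact@v4
        with:
          name: ${{ inputs.artifact-name }}
      - name: Resolve and pull image
        run: |
          IMAGE_URI=$(cat image_uri.txt)
          echo "IMAGE_URI=$IMAGE_URI" >> $GITHUB_ENV
          docker pull $IMAGE_URI
```

Each stage could then call this with `uses: ./.github/workflows/pull-image.yml` and its own `artifact-name`, instead of repeating the three steps.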
```python
parser = argparse.ArgumentParser()
parser.add_argument(
    "--framework",
    choices=["tensorflow", "mxnet", "pytorch", "base", "vllm"],
```
We can probably fix this list to ["tensorflow", "pytorch", "base", "vllm", "sglang"]
actually, this telemetry piece doesn't work, because the template replacement logic doesn't exist; it will need to be fixed separately
```python
)
parser.add_argument(
    "--container-type",
    choices=["training", "inference", "general"],
```
Also, a side note unrelated to this PR: currently vllm and sglang are classified as general. We should change these to inference.
the overall telemetry integration needs to be fixed.
a sample PR build and test workflow