Migrate vLLM Ray Serve Container #5463
Conversation
Signed-off-by: Junpu Fan <[email protected]>
sirutBuasai left a comment
We'll most likely need the CUDA compat script as well; not sure how vLLM has been working without it thus far. https://github.com/aws/deep-learning-containers/blob/eb524f7c0737b007cf06d4fd36f67de246cc8d8f/sglang/build_artifacts/start_cuda_compat.sh
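For reference, the linked start_cuda_compat.sh boils down to a driver-version check: forward-compat libraries are needed only when the container's CUDA is newer than what the host driver supports. This standalone sketch illustrates just that comparison; the function name and interface are assumptions, not the actual script, which also manages /usr/local/cuda/compat.

```shell
#!/bin/bash
# Sketch (not the real start_cuda_compat.sh): decide whether CUDA
# forward-compat libraries are needed by comparing the container's CUDA
# version against the CUDA version the host driver natively supports.
compat_needed() {
  local container_cuda=$1 driver_cuda=$2
  # sort -V orders version strings; the last line is the higher version.
  local highest
  highest=$(printf '%s\n%s\n' "$container_cuda" "$driver_cuda" | sort -V | tail -n 1)
  # Compat libs are required iff container CUDA is strictly newer than the driver's.
  if [ "$highest" = "$container_cuda" ] && [ "$container_cuda" != "$driver_cuda" ]; then
    echo "yes"
  else
    echo "no"
  fi
}
```

The real script would feed this check with the driver's supported CUDA version (e.g. parsed from `nvidia-smi`) and, when needed, put the compat libraries on the loader path.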
Not running any SageMaker tests yet
```yaml
- name: Download image URI artifact
  uses: actions/download-artifact@v4
  with:
    name: vllm-rayserve-ec2-image-uri

- name: Resolve image URI for test
  run: |
    IMAGE_URI=$(cat image_uri.txt)
    echo "Resolved image URI: $IMAGE_URI"
    echo "IMAGE_URI=$IMAGE_URI" >> $GITHUB_ENV

- name: Pull image
  run: |
    docker pull $IMAGE_URI

- name: Checkout vLLM Tests
  uses: actions/checkout@v5
  with:
    repository: vllm-project/vllm
    ref: v0.10.2
    path: vllm_source
```
I wonder if there's a way to DRY these steps. These are going to be used repeatedly across multiple stages
later we can refactor the common patterns into reusable (callable) workflows or composite actions
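One way to DRY these steps would be a `workflow_call` reusable workflow; the file name, input name, and job layout below are illustrative only, not part of this PR:

```yaml
# .github/workflows/pull-image.yml (hypothetical name)
name: Resolve and pull image
on:
  workflow_call:
    inputs:
      artifact-name:   # e.g. vllm-rayserve-ec2-image-uri
        required: true
        type: string
jobs:
  pull:
    runs-on: ubuntu-latest
    steps:
      - name: Download image URI artifact
        uses: actions/download-artifact@v4
        with:
          name: ${{ inputs.artifact-name }}
      - name: Resolve and pull image
        run: |
          IMAGE_URI=$(cat image_uri.txt)
          echo "IMAGE_URI=$IMAGE_URI" >> $GITHUB_ENV
          docker pull $IMAGE_URI
```

Each stage could then call this with `uses: ./.github/workflows/pull-image.yml` and its own `artifact-name`, instead of repeating the three steps.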
```python
parser = argparse.ArgumentParser()
parser.add_argument(
    "--framework",
    choices=["tensorflow", "mxnet", "pytorch", "base", "vllm"],
```
We can probably fix this list to ["tensorflow", "pytorch", "base", "vllm", "sglang"]
actually, this telemetry piece doesn't work, because the template replacement logic doesn't exist; it will need to be fixed separately
```python
)
parser.add_argument(
    "--container-type",
    choices=["training", "inference", "general"],
```
Also, a side note unrelated to this PR: currently vllm and sglang are classified as general. We should change these to inference.
the overall telemetry integration needs to be fixed.
a sample PR build and test workflow