
@junpuf
Contributor

@junpuf commented Nov 7, 2025

A sample PR build-and-test workflow.

Signed-off-by: Junpu Fan <[email protected]>
@aws-deep-learning-containers-ci bot added the authorized and Size:XL labels Nov 7, 2025
junpuf added 27 commits November 7, 2025 15:06
Member

@sirutBuasai left a comment

We'll most likely need the CUDA compat script as well; I'm not sure how vLLM has been working thus far. https://github.com/aws/deep-learning-containers/blob/eb524f7c0737b007cf06d4fd36f67de246cc8d8f/sglang/build_artifacts/start_cuda_compat.sh

@junpuf
Contributor Author

junpuf commented Nov 11, 2025

> We'll most likely need cuda compat script as well. not sure how vllm has been working thus far. https://github.com/aws/deep-learning-containers/blob/eb524f7c0737b007cf06d4fd36f67de246cc8d8f/sglang/build_artifacts/start_cuda_compat.sh

We're not running any SageMaker tests yet.
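For reference, CUDA compat start scripts of this kind usually decide at container start whether to put the CUDA forward-compatibility libraries on `LD_LIBRARY_PATH`, enabling them only when the host driver is older than the driver version the compat package ships. The sketch below assumes that rule; it is not a port of the linked `start_cuda_compat.sh`, the function names are hypothetical, and `/usr/local/cuda/compat` as the compat directory is an assumption. How the two version strings are obtained (for example from the host driver and the compat `libcuda.so.*` file name) is left out.

```python
import re


def parse_driver_version(version: str) -> tuple:
    """Turn a driver version string such as '535.104.05' into (535, 104, 5)."""
    parts = re.findall(r"\d+", version)
    if not parts:
        raise ValueError(f"no version numbers found in {version!r}")
    return tuple(int(p) for p in parts)


def needs_cuda_compat(host_driver: str, compat_driver: str) -> bool:
    """Assumed rule: use the compat libraries only when the host driver is
    strictly older than the driver bundled with the CUDA compat package."""
    return parse_driver_version(host_driver) < parse_driver_version(compat_driver)


def with_compat_path(ld_library_path: str, compat_dir: str = "/usr/local/cuda/compat") -> str:
    """Prepend the (assumed) compat directory to an LD_LIBRARY_PATH value."""
    return f"{compat_dir}:{ld_library_path}" if ld_library_path else compat_dir


# An older host driver triggers the compat path; a newer one does not.
print(needs_cuda_compat("535.104.05", "550.54.15"))  # True
print(needs_cuda_compat("560.35.03", "550.54.15"))  # False
```

Comparing parsed integer tuples rather than raw strings avoids lexicographic mistakes such as treating "535.104" as newer than "550.54".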

Comment on lines +94 to +114
```yaml
- name: Download image URI artifact
  uses: actions/download-artifact@v4
  with:
    name: vllm-rayserve-ec2-image-uri

- name: Resolve image URI for test
  run: |
    IMAGE_URI=$(cat image_uri.txt)
    echo "Resolved image URI: $IMAGE_URI"
    echo "IMAGE_URI=$IMAGE_URI" >> $GITHUB_ENV

- name: Pull image
  run: |
    docker pull $IMAGE_URI

- name: Checkout vLLM Tests
  uses: actions/checkout@v5
  with:
    repository: vllm-project/vllm
    ref: v0.10.2
    path: vllm_source
```
Member

I wonder if there's a way to DRY up these steps. They're going to be used repeatedly across multiple stages.

Contributor Author

Later we can refactor the common patterns into callable (reusable) workflows or something similar.
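As one illustration of pulling a repeated step out of the workflow YAML, the "Resolve image URI for test" logic could live in a small shared Python helper. This is a minimal sketch, not code from this PR: the function names are hypothetical, while `image_uri.txt`, the `IMAGE_URI` variable, and appending `NAME=value` lines to the file named by `$GITHUB_ENV` come from the workflow step above.

```python
import os
from pathlib import Path
from typing import Optional


def read_image_uri(path: str = "image_uri.txt") -> str:
    """Read the image URI the build job saved as an artifact."""
    uri = Path(path).read_text(encoding="utf-8").strip()
    if not uri:
        raise ValueError(f"{path} is empty")
    return uri


def export_to_github_env(name: str, value: str, env_file: Optional[str] = None) -> None:
    """Append NAME=value to the file named by $GITHUB_ENV so that later
    steps in the same job see it as an environment variable."""
    if "\n" in value:
        # Multi-line values need GitHub's delimiter syntax; not handled here.
        raise ValueError("multi-line values are not supported by this sketch")
    target = env_file or os.environ["GITHUB_ENV"]
    with open(target, "a", encoding="utf-8") as fh:
        fh.write(f"{name}={value}\n")


def resolve_image_uri_step(uri_file: str = "image_uri.txt") -> str:
    """Same effect as the 'Resolve image URI for test' shell step."""
    uri = read_image_uri(uri_file)
    print(f"Resolved image URI: {uri}")
    export_to_github_env("IMAGE_URI", uri)
    return uri
```

Each stage could then call `resolve_image_uri_step()` from a one-line `run:` step (the script location would be up to the repo). The helper only removes duplicated shell; sharing the surrounding download, pull, and checkout steps would still call for a reusable workflow or similar.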

```python
parser = argparse.ArgumentParser()
parser.add_argument(
    "--framework",
    choices=["tensorflow", "mxnet", "pytorch", "base", "vllm"],
```
Member

We can probably change this list to ["tensorflow", "pytorch", "base", "vllm", "sglang"].

Contributor Author

Actually, this telemetry code doesn't work, because the template replacement it relies on doesn't exist. I'll need to fix it separately.

```python
)
parser.add_argument(
    "--container-type",
    choices=["training", "inference", "general"],
```
Member

Also, a side note unrelated to this PR: vLLM and SGLang are currently classified as general. We should change them to inference.

Contributor Author

The overall telemetry integration needs to be fixed.
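A sketch of the argument parsing with both review suggestions applied: the framework list proposed above, plus a default that classifies vLLM and SGLang as inference. This is illustrative only and not the repository's script; `build_parser`, `parse_args`, `DEFAULT_CONTAINER_TYPE`, and the rule that other frameworks must pass `--container-type` explicitly are hypothetical, and per the author the telemetry integration itself still needs a separate fix.

```python
import argparse
from typing import List, Optional

# Framework list proposed in review: mxnet dropped, sglang added.
FRAMEWORKS = ["tensorflow", "pytorch", "base", "vllm", "sglang"]
CONTAINER_TYPES = ["training", "inference", "general"]

# Hypothetical defaults: classify vLLM and SGLang images as inference
# instead of general, as suggested in review.
DEFAULT_CONTAINER_TYPE = {"vllm": "inference", "sglang": "inference"}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--framework", choices=FRAMEWORKS)
    parser.add_argument("--container-type", choices=CONTAINER_TYPES)
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.container_type is None:
        default = DEFAULT_CONTAINER_TYPE.get(args.framework)
        if default is None:
            # parser.error prints usage and exits with status 2.
            parser.error(f"--container-type is required for framework {args.framework!r}")
        args.container_type = default
    return args
```

Keeping both lists and the default mapping in module-level constants means the allowed values and the vLLM/SGLang classification change in one place; an unknown framework such as mxnet is rejected by argparse itself.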

@junpuf merged commit 57003ef into main Nov 12, 2025
9 checks passed
@junpuf deleted the try-build branch November 12, 2025 00:05