Create Qwen3-Next-AMD.md #93
base: main
Conversation
Summary of Changes

Hello @haic0, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces new documentation that outlines the process for deploying and benchmarking the Qwen3-Next-80B-A3B-Instruct large language model on AMD MI300X GPUs. The guide provides clear, step-by-step instructions covering Docker setup, Hugging Face authentication, vLLM model serving, and performance evaluation, aiming to facilitate the use of this model on AMD hardware.
Code Review
This pull request adds a new guide for running Qwen3-Next on AMD GPUs. The guide is a good starting point, but I've identified several areas for improvement. My feedback includes a critical security concern regarding Docker volume mounts, as well as several suggestions to improve the document's structure, clarity, and formatting for consistency with other guides in the repository. These changes will make the guide more secure, readable, and easier for users to follow.
Launch the ROCm vLLM docker:
```shell
docker run -d -it --ipc=host --network=host --privileged --cap-add=CAP_SYS_ADMIN --device=/dev/kfd --device=/dev/dri --device=/dev/mem --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /:/work -e SHELL=/bin/bash --name Qwen3-next rocm/vllm-dev:nightly
```
Mounting the host's root directory (`/`) into the container with `-v /:/work` is a significant security risk. It grants the container unrestricted access to the entire host filesystem. It is strongly recommended to mount a specific working directory instead, for example, the current directory.
Suggested change:
```diff
-docker run -d -it --ipc=host --network=host --privileged --cap-add=CAP_SYS_ADMIN --device=/dev/kfd --device=/dev/dri --device=/dev/mem --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /:/work -e SHELL=/bin/bash --name Qwen3-next rocm/vllm-dev:nightly
+docker run -d -it --ipc=host --network=host --privileged --cap-add=CAP_SYS_ADMIN --device=/dev/kfd --device=/dev/dri --device=/dev/mem --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $(pwd):/work -e SHELL=/bin/bash --name Qwen3-next rocm/vllm-dev:nightly
```
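Going a step further than the suggestion above, here is a hedged sketch of a less privileged launch, assuming the guide's workflow only needs GPU access and not `--privileged` or the extra capabilities; many ROCm containers run with just the KFD/DRI devices and the video group. Validate against your own environment before adopting it.
```shell
# A minimal-privilege sketch (assumption: only GPU access is required).
# /dev/kfd and /dev/dri expose the ROCm GPUs; --group-add video grants device permissions.
docker run -d -it \
  --ipc=host --network=host \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video \
  -v "$(pwd)":/work \
  -e SHELL=/bin/bash \
  --name Qwen3-next rocm/vllm-dev:nightly
```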
#### Step by Step Guide
Please follow the steps here to install and run the Qwen3-Next-80B-A3B-Instruct model on AMD MI300X GPUs.
#### Step 1
```shell
docker pull rocm/vllm-dev:nightly
```
Launch the ROCm vLLM docker:
```shell
docker run -d -it --ipc=host --network=host --privileged --cap-add=CAP_SYS_ADMIN --device=/dev/kfd --device=/dev/dri --device=/dev/mem --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /:/work -e SHELL=/bin/bash --name Qwen3-next rocm/vllm-dev:nightly
```
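As a quick sanity check, not part of the guide itself, you can confirm the container actually sees the MI300X GPUs before continuing; this assumes `rocm-smi` is available in the image, as it is in standard ROCm images:
```shell
# List the GPUs visible inside the running container.
docker exec Qwen3-next rocm-smi
```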
#### Step 2
Hugging Face login:
```shell
huggingface-cli login
```
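For non-interactive environments (CI, remote shells), a hedged alternative is to pass the token directly; `--token` is a documented option of `huggingface-cli login`, and recent versions of huggingface_hub also honor the `HF_TOKEN` environment variable:
```shell
# Non-interactive login; $HF_TOKEN is a placeholder for your own access token.
huggingface-cli login --token "$HF_TOKEN"
# Alternatively, export the token and let the hub client pick it up:
export HF_TOKEN=hf_xxx   # placeholder value
```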
#### Step 3
##### FP8
Run the vLLM online serving. Sample command:
```shell
VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 vllm serve Qwen/Qwen3-Next-80B-A3B-Instruct --tensor-parallel-size 4 --max-model-len 32768 --no-enable-prefix-caching
```
There is an extra space in the command that should be removed for correctness.
Suggested change (with the extra space removed, the command reads):
```shell
VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 vllm serve Qwen/Qwen3-Next-80B-A3B-Instruct --tensor-parallel-size 4 --max-model-len 32768 --no-enable-prefix-caching
```
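Once the server reports it is ready, a minimal smoke test against the OpenAI-compatible API can confirm serving works end to end. This is a sketch assuming vLLM's default port 8000 and an illustrative payload:
```shell
# Minimal completion request against vLLM's OpenAI-compatible endpoint (default port 8000).
curl -s http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen3-Next-80B-A3B-Instruct", "prompt": "Hello, my name is", "max_tokens": 16}'
```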
#### Step 4
Open a new terminal, enter the running docker container, and run the following benchmark script:
```shell
docker exec -it Qwen3-next /bin/bash
python3 /vllm-workspace/benchmarks/benchmark_serving.py --model Qwen/Qwen3-Next-80B-A3B-Instruct --dataset-name random --ignore-eos --num-prompts 500 --max-concurrency 128 --random-input-len 3200 --random-output-len 800 --percentile-metrics ttft,tpot,itl,e2el
```
The instructions for running the benchmark can be simplified. Instead of opening an interactive shell and then running the script, you can execute the script directly with `docker exec`. This makes it a single, non-interactive command and is less prone to user error. This also fixes extra whitespace in the command.
Suggested change:
Open a new terminal and run the following command to execute the benchmark script inside the container.
```shell
docker exec -it Qwen3-next python3 /vllm-workspace/benchmarks/benchmark_serving.py --model Qwen/Qwen3-Next-80B-A3B-Instruct --dataset-name random --ignore-eos --num-prompts 500 --max-concurrency 128 --random-input-len 3200 --random-output-len 800 --percentile-metrics ttft,tpot,itl,e2el
```
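Building on the suggested one-liner, here is a hedged sketch for sweeping concurrency levels; the values below are illustrative, not from the guide, and all other flags match the original command:
```shell
# Illustrative sweep over max-concurrency values; other flags match the guide's command.
for c in 32 64 128; do
  docker exec Qwen3-next python3 /vllm-workspace/benchmarks/benchmark_serving.py \
    --model Qwen/Qwen3-Next-80B-A3B-Instruct --dataset-name random --ignore-eos \
    --num-prompts 500 --max-concurrency "$c" \
    --random-input-len 3200 --random-output-len 800 \
    --percentile-metrics ttft,tpot,itl,e2el
done
```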
Please add it to the existing Qwen3-Next document.
Add Qwen3-Next support for AMD GPUs.