Commit eb1e929

Merge branch 'develop' of https://github.com/Center-for-AI-Innovation/llm-inference into hf_download

2 parents 4de3563 + 6ec69cf

27 files changed: +4547 −2624 lines

.github/workflows/code_checks.yml

Lines changed: 2 additions & 2 deletions

@@ -30,7 +30,7 @@ jobs:
     steps:
       - uses: actions/[email protected]
       - name: Install uv
-        uses: astral-sh/setup-uv@v6
+        uses: astral-sh/setup-uv@v7
         with:
           # Install a specific version of uv.
           version: "0.5.21"
@@ -40,7 +40,7 @@ jobs:
         with:
          python-version-file: ".python-version"
       - name: Install the project
-        run: uv sync --dev
+        run: uv sync --dev --prerelease=allow
       - name: Install dependencies and check code
         run: |
           source .venv/bin/activate

.github/workflows/docker.yml

Lines changed: 7 additions & 2 deletions

@@ -21,7 +21,9 @@ on:
 jobs:
   push_to_registry:
     name: Push Docker image to Docker Hub
-    runs-on: ubuntu-latest
+    runs-on:
+      - self-hosted
+      - docker
     steps:
       - name: Checkout repository
         uses: actions/[email protected]
@@ -32,6 +34,9 @@ jobs:
           VERSION=$(grep -A 1 'name = "vllm"' uv.lock | grep version | cut -d '"' -f 2)
           echo "version=$VERSION" >> $GITHUB_OUTPUT

+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3
+
       - name: Log in to Docker Hub
         uses: docker/login-action@5e57cd118135c172c3672efd75eb46360885c0ef
         with:
@@ -40,7 +45,7 @@ jobs:

       - name: Extract metadata (tags, labels) for Docker
         id: meta
-        uses: docker/metadata-action@c1e51972afc2121e065aed6d45c65596fe445f3f
+        uses: docker/metadata-action@318604b99e75e41977312d83839a89be02ca4893
         with:
           images: vectorinstitute/vector-inference
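The version step above scrapes the pinned vLLM version out of `uv.lock` with a grep/cut pipeline. A minimal sketch of that extraction against a fabricated lock-file fragment (the sample path and entry below are illustrative, not the repository's real lock file):

```shell
# Fabricated uv.lock fragment -- a real lock file carries many more fields.
cat > /tmp/uv.lock.sample <<'EOF'
[[package]]
name = "vllm"
version = "0.11.0"
source = { registry = "https://pypi.org/simple" }
EOF

# Same pipeline as the workflow: take the line after the package-name match,
# keep the version line, and cut out the value between the quotes.
VERSION=$(grep -A 1 'name = "vllm"' /tmp/uv.lock.sample | grep version | cut -d '"' -f 2)
echo "$VERSION"   # prints 0.11.0
```

Note the pipeline assumes `version` sits on the line immediately after `name`; a reordered lock entry would silently break it.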

.github/workflows/docs.yml

Lines changed: 7 additions & 7 deletions

@@ -56,7 +56,7 @@ jobs:
           fetch-depth: 0 # Fetch all history for proper versioning

       - name: Install uv
-        uses: astral-sh/setup-uv@v6
+        uses: astral-sh/setup-uv@v7
         with:
           version: "0.5.21"
           enable-cache: true
@@ -67,16 +67,16 @@ jobs:
           python-version-file: ".python-version"

       - name: Install the project
-        run: uv sync --all-extras --group docs
+        run: uv sync --all-extras --group docs --prerelease=allow

       - name: Build docs
-        run: uv run mkdocs build
+        run: uv run --frozen mkdocs build

       - name: Create .nojekyll file
         run: touch site/.nojekyll

       - name: Upload artifact
-        uses: actions/upload-artifact@v4
+        uses: actions/upload-artifact@v5
         with:
           name: docs-site
           path: site/
@@ -93,7 +93,7 @@ jobs:
           fetch-depth: 0 # Fetch all history for proper versioning

       - name: Install uv
-        uses: astral-sh/setup-uv@v6
+        uses: astral-sh/setup-uv@v7
         with:
           version: "0.5.21"
           enable-cache: true
@@ -104,15 +104,15 @@ jobs:
           python-version-file: ".python-version"

       - name: Install the project
-        run: uv sync --all-extras --group docs
+        run: uv sync --all-extras --group docs --frozen

       - name: Configure Git Credentials
         run: |
           git config user.name github-actions[bot]
           git config user.email 41898282+github-actions[bot]@users.noreply.github.com

       - name: Download artifact
-        uses: actions/download-artifact@v5
+        uses: actions/download-artifact@v6
         with:
           name: docs-site
           path: site

.github/workflows/publish.yml

Lines changed: 1 addition & 1 deletion

@@ -16,7 +16,7 @@ jobs:
       - uses: actions/[email protected]

       - name: Install uv
-        uses: astral-sh/setup-uv@v6
+        uses: astral-sh/setup-uv@v7
         with:
           version: "0.6.6"
           enable-cache: true

.github/workflows/unit_tests.yml

Lines changed: 5 additions & 5 deletions

@@ -46,7 +46,7 @@ jobs:
       - uses: actions/[email protected]

       - name: Install uv
-        uses: astral-sh/setup-uv@v6
+        uses: astral-sh/setup-uv@v7
         with:
           # Install a specific version of uv.
           version: "0.5.21"
@@ -58,18 +58,18 @@ jobs:
           python-version: ${{ matrix.python-version }}

       - name: Install the project
-        run: uv sync --dev
+        run: uv sync --dev --prerelease=allow

       - name: Install dependencies and check code
         run: |
-          uv run pytest -m "not integration_test" --cov vec_inf --cov-report=xml tests
+          uv run --frozen pytest -m "not integration_test" --cov vec_inf --cov-report=xml tests

       - name: Install the core package only
         run: uv sync --no-dev

       - name: Run package import tests
         run: |
-          uv run pytest tests/test_imports.py
+          uv run --frozen pytest tests/test_imports.py

       - name: Import Codecov GPG public key
         run: |
@@ -79,7 +79,7 @@ jobs:
         uses: codecov/[email protected]
         with:
           token: ${{ secrets.CODECOV_TOKEN }}
-          file: ./coverage.xml
+          files: ./coverage.xml
           name: codecov-umbrella
           fail_ci_if_error: true
           verbose: true
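The test step selects tests with `pytest -m "not integration_test"`, a boolean expression evaluated over each test's markers. A toy model of that selection logic (hypothetical helper, not pytest's implementation):

```python
import re


def matches(marker_expr: str, markers: set[str]) -> bool:
    """Toy model of pytest's -m marker selection (illustrative only).

    Each marker name in the expression evaluates to whether the test
    carries that marker; `and`, `or`, and `not` keep their Python meaning.
    """
    names = re.findall(r"\b(?!and\b|or\b|not\b)\w+\b", marker_expr)
    env = {name: name in markers for name in names}
    return bool(eval(marker_expr, {"__builtins__": {}}, env))


# A plain unit test is selected; an integration test is filtered out.
matches("not integration_test", {"unit"})              # True
matches("not integration_test", {"integration_test"})  # False
```

Real pytest also handles parenthesised expressions and keyword arguments on markers; this sketch only covers the bare-name case used in the workflow.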

.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion

@@ -17,7 +17,7 @@ repos:
       - id: check-toml

   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: 'v0.13.2'
+    rev: 'v0.14.4'
     hooks:
       - id: ruff
         args: [--fix, --exit-non-zero-on-fix]

Dockerfile

Lines changed: 12 additions & 8 deletions

@@ -35,29 +35,33 @@ RUN wget https://bootstrap.pypa.io/get-pip.py && \
     rm get-pip.py && \
     python3.10 -m pip install --upgrade pip setuptools wheel uv

-# Install Infiniband/RDMA support
+# Install RDMA support
 RUN apt-get update && apt-get install -y \
     libibverbs1 libibverbs-dev ibverbs-utils \
     librdmacm1 librdmacm-dev rdmacm-utils \
+    rdma-core ibverbs-providers infiniband-diags perftest \
     && rm -rf /var/lib/apt/lists/*

 # Set up RDMA environment (these will persist in the final container)
 ENV LD_LIBRARY_PATH="/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH"
-ENV UCX_NET_DEVICES=all
 ENV NCCL_IB_DISABLE=0
+ENV NCCL_SOCKET_IFNAME="^lo,docker0"
+ENV NCCL_NET_GDR_LEVEL=PHB
+ENV NCCL_IB_TIMEOUT=22
+ENV NCCL_IB_RETRY_CNT=7
+ENV NCCL_DEBUG=INFO

 # Set up project
 WORKDIR /vec-inf
 COPY . /vec-inf

 # Install project dependencies with build requirements
-RUN PIP_INDEX_URL="https://download.pytorch.org/whl/cu128" uv pip install --system -e .[dev]
+RUN uv pip install --system -e .[dev] --prerelease=allow

-# Final configuration
-RUN mkdir -p /vec-inf/nccl && \
-    mv /root/.config/vllm/nccl/cu12/libnccl.so.2.18.1 /vec-inf/nccl/libnccl.so.2.18.1
-ENV VLLM_NCCL_SO_PATH=/vec-inf/nccl/libnccl.so.2.18.1
-ENV NCCL_DEBUG=INFO
+# Install a single, system NCCL (from NVIDIA CUDA repo in base image)
+RUN apt-get update && apt-get install -y --allow-change-held-packages \
+    libnccl2 libnccl-dev \
+    && rm -rf /var/lib/apt/lists/*

 # Set the default command to start an interactive shell
 CMD ["bash"]
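The new `NCCL_SOCKET_IFNAME="^lo,docker0"` line keeps NCCL's bootstrap traffic off the loopback and default Docker bridge interfaces: a leading `^` turns the list into an exclusion of interface-name prefixes. A sketch of that matching behaviour (hypothetical helper; NCCL's actual matching lives in its C sources):

```python
def usable_ifaces(setting: str, ifaces: list[str]) -> list[str]:
    """Toy model of NCCL_SOCKET_IFNAME interface filtering.

    With a leading '^', interfaces whose names start with any listed
    prefix are excluded; without it, the list selects by prefix.
    """
    exclude = setting.startswith("^")
    prefixes = setting.lstrip("^").split(",")

    def hit(name: str) -> bool:
        return any(name.startswith(p) for p in prefixes)

    return [i for i in ifaces if hit(i) != exclude]


# The Dockerfile's setting skips loopback and the docker bridge while
# leaving real NICs (hypothetical names here) eligible.
usable_ifaces("^lo,docker0", ["lo", "docker0", "eth0", "ib0"])  # ['eth0', 'ib0']
```

Excluding `docker0` matters in containers because NCCL picking the bridge interface for rendezvous traffic can stall multi-node startup.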

MODEL_TRACKING.md

Lines changed: 8 additions & 2 deletions

@@ -40,6 +40,7 @@ This document tracks all model weights available in the `/model-weights` directo
 | `gemma-2b-it` ||
 | `gemma-7b` ||
 | `gemma-7b-it` ||
+| `gemma-2-2b-it` ||
 | `gemma-2-9b` ||
 | `gemma-2-9b-it` ||
 | `gemma-2-27b` ||
@@ -165,8 +166,8 @@ This document tracks all model weights available in the `/model-weights` directo
 | Model | Configuration |
 |:------|:-------------|
 | `Qwen3-14B` ||
-| `Qwen3-8B` | |
-| `Qwen3-32B` | |
+| `Qwen3-8B` | |
+| `Qwen3-32B` | |
 | `Qwen3-235B-A22B` ||
 | `Qwen3-Embedding-8B` ||

@@ -186,6 +187,11 @@ This document tracks all model weights available in the `/model-weights` directo
 | `DeepSeek-Coder-V2-Lite-Instruct` ||
 | `deepseek-math-7b-instruct` ||

+### OpenAI: GPT-OSS
+| Model | Configuration |
+|:------|:-------------|
+| `gpt-oss-120b` ||
+
 ### Other LLM Models
 | Model | Configuration |
 |:------|:-------------|

README.md

Lines changed: 3 additions & 3 deletions

@@ -7,7 +7,7 @@
 [![code checks](https://github.com/VectorInstitute/vector-inference/actions/workflows/code_checks.yml/badge.svg)](https://github.com/VectorInstitute/vector-inference/actions/workflows/code_checks.yml)
 [![docs](https://github.com/VectorInstitute/vector-inference/actions/workflows/docs.yml/badge.svg)](https://github.com/VectorInstitute/vector-inference/actions/workflows/docs.yml)
 [![codecov](https://codecov.io/github/VectorInstitute/vector-inference/branch/main/graph/badge.svg?token=NI88QSIGAC)](https://app.codecov.io/github/VectorInstitute/vector-inference/tree/main)
-[![vLLM](https://img.shields.io/badge/vLLM-0.10.1.1-blue)](https://docs.vllm.ai/en/v0.10.1.1/)
+[![vLLM](https://img.shields.io/badge/vLLM-0.11.0-blue)](https://docs.vllm.ai/en/v0.11.0/)
 ![GitHub License](https://img.shields.io/github/license/VectorInstitute/vector-inference)

 This repository provides an easy-to-use solution to run inference servers on [Slurm](https://slurm.schedmd.com/overview.html)-managed computing clusters using [vLLM](https://docs.vllm.ai/en/latest/). **This package runs natively on the Vector Institute cluster environments**. To adapt to other environments, follow the instructions in [Installation](#installation).
@@ -20,7 +20,7 @@ If you are using the Vector cluster environment, and you don't need any customiz
 ```bash
 pip install vec-inf
 ```
-Otherwise, we recommend using the provided [`Dockerfile`](Dockerfile) to set up your own environment with the package. The latest image has `vLLM` version `0.10.1.1`.
+Otherwise, we recommend using the provided [`Dockerfile`](Dockerfile) to set up your own environment with the package. The latest image has `vLLM` version `0.11.0`.

 If you'd like to use `vec-inf` on your own Slurm cluster, you would need to update the configuration files, there are 3 ways to do it:
 * Clone the repository and update the `environment.yaml` and the `models.yaml` file in [`vec_inf/config`](vec_inf/config/), then install from source by running `pip install .`.
@@ -53,7 +53,7 @@ Models that are already supported by `vec-inf` would be launched using the cache
 #### Other commands

 * `batch-launch`: Launch multiple model inference servers at once, currently ONLY single node models supported,
-* `status`: Check the model status by providing its Slurm job ID.
+* `status`: Check the status of all `vec-inf` jobs, or a specific job by providing its job ID.
 * `metrics`: Streams performance metrics to the console.
 * `shutdown`: Shutdown a model by providing its Slurm job ID.
 * `list`: List all available model names, or view the default/cached configuration of a specific model.

docs/index.md

Lines changed: 1 addition & 1 deletion

@@ -12,7 +12,7 @@ If you are using the Vector cluster environment, and you don't need any customiz
 pip install vec-inf
 ```

-Otherwise, we recommend using the provided [`Dockerfile`](https://github.com/VectorInstitute/vector-inference/blob/main/Dockerfile) to set up your own environment with the package. The latest image has `vLLM` version `0.10.1.1`.
+Otherwise, we recommend using the provided [`Dockerfile`](https://github.com/VectorInstitute/vector-inference/blob/main/Dockerfile) to set up your own environment with the package. The latest image has `vLLM` version `0.11.0`.

 If you'd like to use `vec-inf` on your own Slurm cluster, you would need to update the configuration files, there are 3 ways to do it:
 * Clone the repository and update the `environment.yaml` and the `models.yaml` file in [`vec_inf/config`](https://github.com/VectorInstitute/vector-inference/blob/main/vec_inf/config), then install from source by running `pip install .`.
