
Releases: jd-opensource/xllm

v0.6.0

15 Sep 14:31

Highlights

Model Support

  • Support DeepSeek-V3/R1.
  • Support DeepSeek-R1-Distill-Qwen.
  • Support Kimi-K2.
  • Support Llama2/3.
  • Support Qwen2/2.5/QwQ.
  • Support Qwen3/Qwen3-MoE.
  • Support MiniCPM-V.
  • Support MiMo-VL.
  • Support Qwen2.5-VL.

Features

  • Support KV cache store.
  • Support Expert Parallelism Load Balancing.
  • Support a multi-priority scheduler for online/offline requests.
  • Support latency-aware scheduler.
  • Support early stopping during serving.
  • Optimize ppmatmul kernel.
  • Support image url input for VLM.
  • Support disaggregated prefill and decoding.
  • Support large-scale EP parallelism.
  • Support Hash-based PrefixCache matching.
  • Support Multi-Token Prediction for DeepSeek.
  • Support asynchronous scheduling, allowing the scheduling and computation pipelines to execute in parallel.
  • Support EP, DP, and TP model parallelism.
  • Support multi-process and multi-node deployment.
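As a sketch of the image-URL input for VLMs, the snippet below builds a chat request in the widely used OpenAI-compatible message format. The endpoint path, model name, and exact field layout are assumptions for illustration, not confirmed xllm API details.

```python
import json

# Hypothetical request body for a VLM chat completion with an image URL.
# The model name and the message schema follow the OpenAI-compatible
# convention; xllm's actual endpoint and field names may differ.
payload = {
    "model": "Qwen2.5-VL",  # assumed model identifier
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/cat.png"},
                },
            ],
        }
    ],
}

# Serialize to JSON for POSTing to the serving endpoint.
body = json.dumps(payload)
```

The mixed text/image `content` list lets a single user turn carry both a prompt and one or more image references, which is how image-URL input is typically expressed in this format.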

Docs

  • Add getting started docs.
  • Add features docs.

Release Images

x86 image

quay.io/jd_xllm/xllm-ai:xllm-0.6.0-release-hb-rc2-py3.11-oe24.03-lts-x86

ARM a2 device image

quay.io/jd_xllm/xllm-ai:xllm-0.6.0-release-hb-rc2-py3.11-oe24.03-lts-arm

ARM a3 device image

quay.io/jd_xllm/xllm-ai:xllm-0.6.0-release-hc-rc2-py3.11-oe24.03-lts-arm
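A minimal sketch for selecting the matching v0.6.0 image tag by host architecture (the `docker pull` line is left as a comment so the sketch runs without a container runtime; ARM a3 devices should use the `hc` tag listed above instead of the `hb` one chosen here):

```shell
# Pick the release image tag for the current architecture.
case "$(uname -m)" in
  x86_64)  TAG="xllm-0.6.0-release-hb-rc2-py3.11-oe24.03-lts-x86" ;;
  aarch64) TAG="xllm-0.6.0-release-hb-rc2-py3.11-oe24.03-lts-arm" ;;
  *)       echo "unsupported architecture" >&2; exit 1 ;;
esac

IMAGE="quay.io/jd_xllm/xllm-ai:${TAG}"
echo "${IMAGE}"
# docker pull "${IMAGE}"
```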