
Releases: jd-opensource/xllm

v0.6.0

15 Sep 14:31

Highlights

Model Support

  • Support DeepSeek-V3/R1.
  • Support DeepSeek-R1-Distill-Qwen.
  • Support Kimi-K2.
  • Support Llama2/3.
  • Support Qwen2/2.5/QwQ.
  • Support Qwen3/Qwen3-MoE.
  • Support MiniCPM-V.
  • Support MiMo-VL.
  • Support Qwen2.5-VL.

Features

  • Support KV cache store.
  • Support Expert Parallelism Load Balancing.
  • Support a multi-priority scheduler for online/offline requests.
  • Support latency-aware scheduler.
  • Support early stopping during serving.
  • Optimize ppmatmul kernel.
  • Support image url input for VLM.
  • Support disaggregated prefill and decoding.
  • Support large-scale EP parallelism.
  • Support Hash-based PrefixCache matching.
  • Support Multi-Token Prediction for DeepSeek.
  • Support asynchronous scheduling, allowing the scheduling and computation pipelines to execute in parallel.
  • Support EP, DP, and TP model parallelism.
  • Support multi-process and multi-node deployment.
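As a sketch of the image-URL input for VLMs, the snippet below builds a chat request in the widely used OpenAI-compatible message format. The endpoint path, model name, and exact field layout are assumptions for illustration, not confirmed xllm API details.

```python
import json

# Hypothetical request body for a VLM chat completion with an image URL.
# The model name and the message schema follow the OpenAI-compatible
# convention; xllm's actual endpoint and field names may differ.
payload = {
    "model": "Qwen2.5-VL",  # assumed model identifier
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/cat.png"},
                },
            ],
        }
    ],
}

# Serialize to JSON for POSTing to the serving endpoint.
body = json.dumps(payload)
```

The mixed text/image `content` list lets a single user turn carry both a prompt and one or more image references, which is how image-URL input is typically expressed in this format.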

Docs

  • Add getting started docs.
  • Add features docs.

Release Images

x86 image

quay.io/jd_xllm/xllm-ai:xllm-0.6.0-release-hb-rc2-py3.11-oe24.03-lts-x86

ARM a2 device image

quay.io/jd_xllm/xllm-ai:xllm-0.6.0-release-hb-rc2-py3.11-oe24.03-lts-arm

ARM a3 device image

quay.io/jd_xllm/xllm-ai:xllm-0.6.0-release-hc-rc2-py3.11-oe24.03-lts-arm
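A minimal sketch for selecting the matching v0.6.0 image tag by host architecture (the `docker pull` line is left as a comment so the sketch runs without a container runtime; ARM a3 devices should use the `hc` tag listed above instead of the `hb` one chosen here):

```shell
# Pick the release image tag for the current architecture.
case "$(uname -m)" in
  x86_64)  TAG="xllm-0.6.0-release-hb-rc2-py3.11-oe24.03-lts-x86" ;;
  aarch64) TAG="xllm-0.6.0-release-hb-rc2-py3.11-oe24.03-lts-arm" ;;
  *)       echo "unsupported architecture" >&2; exit 1 ;;
esac

IMAGE="quay.io/jd_xllm/xllm-ai:${TAG}"
echo "${IMAGE}"
# docker pull "${IMAGE}"
```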