Closed
Labels: release (Related to new version release)
Description
ETA: Feb 29th - Mar 1st
Major changes
- StarCoder2 support
- Performance optimization and LoRA support for Gemma
- Performance optimization for MoE kernel
- 2/3/8-bit GPTQ support
- [Experimental] AWS Inferentia2 support
PRs to be merged before the release
- Add Support for 2/3/8-bit GPTQ Quantization Models #2330; Add GPTQ quantization kernels for 2, 3, 8-bit use cases #2223
- GPTQ & AWQ Fused MOE #2761
- Add guided decoding for OpenAI API server #2819
- Use of `logits_processors` has become very slow in v0.3.2 #3087; [Fix] Don't deep-copy LogitsProcessors when copying SamplingParams #3099
- Support starcoder2 architecture #3089