v0.0.6
What's Changed
- Metal: add end-to-end benchmarks by @Maratyszcza in #161
- Metal: simplify and optimize Responses API adapter by @Maratyszcza in #162
- Metal: fix KV-cache invalidation after reset+append by @Maratyszcza in #163
- Increase max output tokens in Responses API to 131K by @Maratyszcza in #165
- Remove requirement on maximum Python version by @Maratyszcza in #167
- Move Lemonade to AMD section of awesome-gpt-oss by @danielholanda in #164
- Added VLLM Offline Serve working code. by @hrithiksagar-tih in #150
- Metal: indicate threadgroup is a multiple of simdgroup by @Maratyszcza in #168
- Metal: mlock model weights in memory by @Maratyszcza in #170
- Add You.com as tool for browser by @bojanbabic in #171
New Contributors
- @hrithiksagar-tih made their first contribution in #150
- @bojanbabic made their first contribution in #171
Full Changelog: v0.0.5...v0.0.6