Releases: EAddario/llama.cpp

b6323 (30 Aug 10:13, commit 696fccf)
vulkan: Skip syncing for prealloc_y when it is reused (#15544)

b6294 (26 Aug 21:31, commit bcbddcd)
tests : fix test-opt with GGML_BACKEND_DL (#15599)

b6275 (25 Aug 18:36, commit 4d917cd)
vulkan: fix min subgroup 16 condition for mmid subgroup optimization …

b6264 (24 Aug 20:32, commit 043fb27)
vulkan: apply MUL_MAT_ID subgroup optimization to non-coopmat devices…

b6239 (21 Aug 19:07, commit cd36b5e)
llama : remove deprecated llama_kv_self API (#15472)

b6209 (19 Aug 23:37, commit fb22dd0)
opencl: mark `argsort` unsupported if cols exceed workgroup limit (#1…

b6190 (18 Aug 07:40, commit ae532ea)
vulkan: disable spirv-opt for bfloat16 shaders (#15352)

b6178 (15 Aug 20:25, commit 5e6229a)
common : fix double bos, use common_chat_templates for add_bos and ad…

b6123 (10 Aug 12:33, commit 79c1160)
cuda: refactored ssm_scan and use CUB (#13291)

* cuda: refactored ssm_scan to use CUB

* fixed compilation error when not using CUB

* assign L to constant and use size_t instead of int

* deduplicated functions

* change min blocks per mp to 1

* Use cub load and store warp transpose

* suppress clang warning

b6121 (09 Aug 01:28, commit e54d41b)
gguf-py : add Numpy MXFP4 de/quantization support (#15111)

* gguf-py : add MXFP4 de/quantization support

* ggml-quants : handle zero amax for MXFP4
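
The MXFP4 entry above can be illustrated with a short Numpy sketch. This is a hedged illustration of the OCP Microscaling FP4 block layout (32 E2M1 elements sharing one E8M0 scale byte), not the actual gguf-py code from #15111; the E2M1 value table follows the MX specification, and the nibble ordering (low nibbles holding the first 16 elements of a block) is assumed from ggml's block conventions.

```python
import numpy as np

# FP4 E2M1 code points (sign, 2-bit exponent, 1-bit mantissa) per the
# OCP Microscaling spec; index = 4-bit code, value = decoded float.
FP4_E2M1 = np.array(
    [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
     -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0],
    dtype=np.float32,
)

def dequant_mxfp4(scales: np.ndarray, packed: np.ndarray) -> np.ndarray:
    """Dequantize MXFP4 blocks.

    scales: (n_blocks,) uint8, one E8M0 scale byte per block.
    packed: (n_blocks, 16) uint8, 32 packed 4-bit codes per block.
    Returns (n_blocks, 32) float32.
    Assumed layout: low nibbles -> elements 0..15, high -> 16..31.
    """
    lo = packed & 0x0F
    hi = packed >> 4
    codes = np.concatenate([lo, hi], axis=-1)   # (n_blocks, 32)
    vals = FP4_E2M1[codes]
    # E8M0 scale: unsigned 8-bit exponent, bias 127 -> 2**(e - 127)
    scale = np.ldexp(np.float32(1.0), scales.astype(np.int32) - 127)
    return vals * scale[:, None]
```

A lookup table is the natural vectorized decode here, since FP4 has only 16 code points; the quantization direction would search the same table for the nearest representable value after choosing a block scale.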