Releases · EAddario/llama.cpp
b6323
vulkan: Skip syncing for prealloc_y when it is reused (#15544)
b6294
tests : fix test-opt with GGML_BACKEND_DL (#15599)
b6275
vulkan: fix min subgroup 16 condition for mmid subgroup optimization …
b6264
vulkan: apply MUL_MAT_ID subgroup optimization to non-coopmat devices…
b6239
llama : remove deprecated llama_kv_self API (#15472)
b6209
opencl: mark `argsort` unsupported if cols exceed workgroup limit (#1…
b6190
vulkan: disable spirv-opt for bfloat16 shaders (#15352)
b6178
common : fix double bos, use common_chat_templates for add_bos and ad…
b6123
cuda: refactored ssm_scan and use CUB (#13291)
* cuda: refactored ssm_scan to use CUB
* fixed compilation error when not using CUB
* assign L to constant and use size_t instead of int
* deduplicated functions
* change min blocks per mp to 1
* Use cub load and store warp transpose
* suppress clang warning
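For context on what the refactored kernel computes, here is a minimal NumPy reference of the selective-scan recurrence, assuming the standard Mamba-style formulation; `ssm_scan_ref` and its shapes are illustrative, not the ggml API. The CUDA kernel evaluates this same sequential recurrence per channel, which is the hot loop the CUB load/store (warp transpose) refactor targets.

```python
# Minimal NumPy sketch of a selective-state-space (SSM) scan.
# Assumed, simplified formulation: h_t = exp(dt_t * A) * h_{t-1} + dt_t * B_t * x_t,
# y_t = <h_t, C_t>. Function name and shapes are hypothetical.
import numpy as np

def ssm_scan_ref(x, dt, A, B, C):
    """x, dt: (L, D); A: (D, N); B, C: (L, N). Returns y: (L, D)."""
    L, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))                                   # recurrent state
    y = np.empty((L, D))
    for t in range(L):
        dA = np.exp(dt[t][:, None] * A)                    # (D, N) per-step decay
        dBx = dt[t][:, None] * B[t][None, :] * x[t][:, None]  # (D, N) input term
        h = dA * h + dBx                                   # state update
        y[t] = h @ C[t]                                    # (D,) readout
    return y
```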
b6121
gguf-py : add Numpy MXFP4 de/quantization support (#15111)
* gguf-py : add MXFP4 de/quantization support
* ggml-quants : handle zero amax for MXFP4
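The "zero amax" fix points at a classic quantization edge case: if a block is all zeros, its absolute maximum is zero and a log2-derived scale exponent is undefined. Below is a minimal NumPy sketch of MXFP4-style block quantization with that guard; the rounding rule, scale choice, and nibble packing are simplified assumptions, not the actual gguf-py implementation.

```python
# Sketch of MXFP4-style quantization: blocks share a power-of-two scale,
# each element is stored as a 4-bit sign + E2M1 magnitude code.
import numpy as np

FP4_VALUES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # E2M1 magnitudes

def quantize_block_mxfp4(block: np.ndarray):
    """Quantize one block of floats to (scale exponent, 4-bit codes)."""
    amax = np.max(np.abs(block))
    if amax == 0.0:
        # Zero amax guard: log2(0) is undefined, so emit an all-zero block
        # with a neutral exponent instead of dividing by zero.
        return 0, np.zeros(block.shape, dtype=np.uint8)
    # Power-of-two scale so the largest magnitude lands near 6.0 (max E2M1).
    e = int(np.floor(np.log2(amax))) - 2
    scaled = block / (2.0 ** e)
    # Round each element to the nearest representable E2M1 magnitude.
    mags = np.abs(scaled)[:, None]                          # (n, 1)
    idx = np.argmin(np.abs(mags - FP4_VALUES[None, :]), axis=1).astype(np.uint8)
    signs = (block < 0).astype(np.uint8) << 3               # sign bit of the nibble
    return e, signs | idx

def dequantize_block_mxfp4(e: int, codes: np.ndarray) -> np.ndarray:
    signs = np.where(codes & 0x8, -1.0, 1.0)
    return signs * FP4_VALUES[codes & 0x7] * (2.0 ** e)
```

With the guard in place, an all-zero input round-trips to all zeros rather than raising a divide-by-zero or producing NaN codes.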