Releases · EAddario/llama.cpp
b6323
vulkan: Skip syncing for prealloc_y when it is reused (#15544)
b6294
tests : fix test-opt with GGML_BACKEND_DL (#15599)
b6275
vulkan: fix min subgroup 16 condition for mmid subgroup optimization …
b6264
vulkan: apply MUL_MAT_ID subgroup optimization to non-coopmat devices…
b6239
llama : remove deprecated llama_kv_self API (#15472)
b6209
opencl: mark `argsort` unsupported if cols exceed workgroup limit (#1…
b6190
vulkan: disable spirv-opt for bfloat16 shaders (#15352)
b6178
common : fix double bos, use common_chat_templates for add_bos and ad…
b6123
cuda: refactored ssm_scan and use CUB (#13291)
* cuda: refactored ssm_scan to use CUB
* fixed compilation error when not using CUB
* assign L to constant and use size_t instead of int
* deduplicated functions
* change min blocks per mp to 1
* Use cub load and store warp transpose
* suppress clang warning
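For context on what the refactored kernel computes, here is a minimal NumPy reference of the selective-scan recurrence, assuming the standard Mamba-style formulation; `ssm_scan_ref` and its shapes are illustrative, not the ggml API. The CUDA kernel evaluates this same sequential recurrence per channel, which is the hot loop the CUB load/store (warp transpose) refactor targets.

```python
# Minimal NumPy sketch of a selective-state-space (SSM) scan.
# Assumed, simplified formulation: h_t = exp(dt_t * A) * h_{t-1} + dt_t * B_t * x_t,
# y_t = <h_t, C_t>. Function name and shapes are hypothetical.
import numpy as np

def ssm_scan_ref(x, dt, A, B, C):
    """x, dt: (L, D); A: (D, N); B, C: (L, N). Returns y: (L, D)."""
    L, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))                                   # recurrent state
    y = np.empty((L, D))
    for t in range(L):
        dA = np.exp(dt[t][:, None] * A)                    # (D, N) per-step decay
        dBx = dt[t][:, None] * B[t][None, :] * x[t][:, None]  # (D, N) input term
        h = dA * h + dBx                                   # state update
        y[t] = h @ C[t]                                    # (D,) readout
    return y
```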
b6121
gguf-py : add Numpy MXFP4 de/quantization support (#15111)
* gguf-py : add MXFP4 de/quantization support
* ggml-quants : handle zero amax for MXFP4
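The "zero amax" fix points at a classic quantization edge case: if a block is all zeros, its absolute maximum is zero and a log2-derived scale exponent is undefined. Below is a minimal NumPy sketch of MXFP4-style block quantization with that guard; the rounding rule, scale choice, and nibble packing are simplified assumptions, not the actual gguf-py implementation.

```python
# Sketch of MXFP4-style quantization: blocks share a power-of-two scale,
# each element is stored as a 4-bit sign + E2M1 magnitude code.
import numpy as np

FP4_VALUES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # E2M1 magnitudes

def quantize_block_mxfp4(block: np.ndarray):
    """Quantize one block of floats to (scale exponent, 4-bit codes)."""
    amax = np.max(np.abs(block))
    if amax == 0.0:
        # Zero amax guard: log2(0) is undefined, so emit an all-zero block
        # with a neutral exponent instead of dividing by zero.
        return 0, np.zeros(block.shape, dtype=np.uint8)
    # Power-of-two scale so the largest magnitude lands near 6.0 (max E2M1).
    e = int(np.floor(np.log2(amax))) - 2
    scaled = block / (2.0 ** e)
    # Round each element to the nearest representable E2M1 magnitude.
    mags = np.abs(scaled)[:, None]                          # (n, 1)
    idx = np.argmin(np.abs(mags - FP4_VALUES[None, :]), axis=1).astype(np.uint8)
    signs = (block < 0).astype(np.uint8) << 3               # sign bit of the nibble
    return e, signs | idx

def dequantize_block_mxfp4(e: int, codes: np.ndarray) -> np.ndarray:
    signs = np.where(codes & 0x8, -1.0, 1.0)
    return signs * FP4_VALUES[codes & 0x7] * (2.0 ** e)
```

With the guard in place, an all-zero input round-trips to all zeros rather than raising a divide-by-zero or producing NaN codes.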