Releases: EAddario/llama.cpp

b5139

15 Apr 11:09
84778e9
CUDA/HIP: Share the same unified memory allocation logic. (#12934)

Replace compile-time `GGML_HIP_UMA` with environment variable `GGML_CUDA_ENABLE_UNIFIED_MEMORY`. This unifies the usage on NVIDIA and AMD GPUs, and allows a single binary to be shared between integrated and dedicated GPUs.
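
As a rough illustration of what moving from a compile-time flag to a runtime switch can look like, here is a minimal sketch. The helper names `unified_memory_requested` and `pick_allocator` are hypothetical and are not ggml functions. The sketch also assumes that the variable only needs to be present in the environment and that its value is not inspected; the release note does not say this.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of ggml): reports whether unified memory was
// requested through the environment variable named in the release note.
// Assumption: only the presence of the variable matters, not its value.
static bool unified_memory_requested() {
    return std::getenv("GGML_CUDA_ENABLE_UNIFIED_MEMORY") != nullptr;
}

// Hypothetical helper: names the allocation strategy a backend might choose.
// A real backend would call its managed or device allocation routine here
// instead of returning a label.
static std::string pick_allocator() {
    return unified_memory_requested() ? "managed" : "device";
}
```

Because the check happens at run time, the same binary could be started as, for example, `GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 ./llama-cli -m model.gguf` on an integrated GPU and without the variable on a dedicated one. The model path here is a placeholder.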

b5137

15 Apr 08:02
daa4228
llama : DeepSeek V2/V3 MLA implementation (#12801)

* Merged using squash to remove all noise commit messages

* Force flash attention off for `LLM_ARCH_DEEPSEEK2` - embedding too large

* Removed 3 conts (2x RoPE and 1x RMS-norm)

* Changed to use `<cmath>` instead of `<math.h>`

* Reverted removal of the 3 conts

* Used `reshape` in `llm_graph_context::build_attn_mha()`

* Use `k_pe = ggml_reshape`

* Removed the 3 conts again

* Removed the 3D views of `wk_b` and `wv_b`, and just save as 3D in GGUF

* Removed MQA optimisation from `build_attn_mha()` as no gains now

* Simplified `is_mla` branch in `llm_build_deepseek2()`

* Removed `build_attn_mla` and added `nullptr` to all `build_attn` calls

* Fixed call to `build_attn` in `llm_build_t5_enc`

b5133

14 Apr 22:10
d6d2c2a
Add performance print for gemma3 in example (#12929)

b5129

14 Apr 07:29
sync : ggml

ggml-ci

b5126

13 Apr 22:30
307bfa2
ggml: disable CUDA graphs for unsupported DUP and CONT node types (#1…

b5072

07 Apr 19:27
4ccea21
hellaswag: display estimated score confidence interval (#12797)