Releases · EAddario/llama.cpp

24 May 22:21

17fc817

b5476

releases : enable openmp in windows cpu backend build (#13756)

Assets 18

14 May 08:06

github-actions

b5373

be1d4a1

b5373

scripts : fix compare-llama-bench.py show parameter (#13514)

Assets 20

11 May 08:45

github-actions

b5343

62d4250

b5343

docs : Fix typo in InternVL3 model name (#13440)

Assets 20

03 May 07:07

github-actions

b5269

1d36b36

b5269

llama : move end-user examples to tools directory (#13249)

* llama : move end-user examples to tools directory

---------

Co-authored-by: Xuan Son Nguyen

Assets 26

29 Apr 07:15

github-actions

b5215

5f5e39e

b5215

model : Nomic Embed Text V2 with Mixture-of-Experts (MoE) architectur…

Assets 26

27 Apr 22:56

github-actions

b5200

c0a97b7

b5200

llama-bench : Add `--override-tensors` arg (#12922)

* Add --override-tensors option to llama-bench

* Correct llama-bench --override-tensors to --override-tensor

* llama-bench: Update --override-tensors parsing to match --tensor-split, appear in test matrix.

* Make new llama-bench util functions static to fix Ubuntu CI

* llama-bench: Correct -ot corner cases (No -ot calls, leading and trailing empty -ot spans, etc.)

Assets 26

25 Apr 21:46

github-actions

b5191

295354e

b5191

llama : fix K-shift with quantized K and BLAS backend (#13113)

Assets 26

19 Apr 11:34

github-actions

b5156

37b9f0d

b5156

clip : refactor, add `image_manipulation` and `llava_uhd` classes (#1…

Assets 26

17 Apr 09:42

github-actions

b5146

971f245

b5146

llama : recognize IBM Granite 3.3 FIM tokens (#12988)

Granite's FIM tokens are very similar to Qwen's, except that they use an
underscore instead of a dash: for example, <fim_middle> rather than
<fim-middle>.

The tokenizer_config.json in ibm-granite/granite-3.3-8b-base lists:

```
    "<fim_prefix>",
    "<fim_middle>",
    "<fim_suffix>",
    "<fim_pad>",
    ...
    "<reponame>",
```
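
As a rough illustration of how the underscore-style tokens listed above might be used, the sketch below assembles a fill-in-the-middle prompt. The helper `build_fim_prompt` and the prefix-suffix-middle ordering are assumptions made for illustration only; they are not taken from this change or from llama.cpp, so check the model card for the expected layout.

```python
# Hypothetical helper, not part of llama.cpp: assembles a FIM prompt
# from the Granite 3.3 token strings shown in tokenizer_config.json.
GRANITE_FIM = {
    "prefix": "<fim_prefix>",
    "suffix": "<fim_suffix>",
    "middle": "<fim_middle>",
}


def build_fim_prompt(prefix: str, suffix: str, tokens: dict = GRANITE_FIM) -> str:
    """Return a prompt asking the model to fill in text between prefix and suffix.

    Assumption: prefix-suffix-middle ordering, which is a common FIM layout
    but is not confirmed by the release notes.
    """
    return f"{tokens['prefix']}{prefix}{tokens['suffix']}{suffix}{tokens['middle']}"


if __name__ == "__main__":
    # The model would be expected to generate the missing expression.
    print(build_fim_prompt("def add(a, b):\n    return ", "\n"))
```

Passing a different `tokens` mapping (for example, dash-style names) would let the same helper serve another tokenizer, which is the only difference the commit message describes.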

Assets 26

16 Apr 07:43

github-actions

b5142

80f19b4

b5142

opencl: split `ggml-opencl.cl` into multiple files and cleanup (#12886)

* opencl: refactor - split the kernel files

* opencl: split more kernels into separate files

* opencl: specify subgroup size instead of querying it

* opencl: refine Adreno cl compiler version parsing

* opencl: skip some kernels not used by Adreno on old compilers

* opencl: refine logic for selecting Adreno kernels

* opencl: refine Adreno cl compiler version

* opencl: cleanup preprocessor for kernels

* opencl: consider Adreno CL compiler on Windows

* opencl: add final newline for `mul_mv_f16_f16.cl`

---------

Co-authored-by: Shangqing Gu

Assets 26
