Releases: EAddario/llama.cpp

b6686

03 Oct 22:05
128d522
chat : support Magistral thinking (#16413)

* feat: add a dedicated Magistral chat format that preserves [THINK] spans and parses reasoning before tool calls

* feat: add a new flow for Magistral to the chat template test suite

b6683

03 Oct 15:00
946f71e
llama : fix shapes for bert/mpt q/k norm (#16409)

b6679

03 Oct 11:05
0e1f838
vulkan: Fix FA coopmat1 invalid array indexing (#16365)

When computing sinks, the cm1 shader was looping r from 0 to Br rather than
to rows_per_thread. I must have copied this from the scalar path (where it is
correct), and somehow it wasn't causing failures on current drivers.

b6660

01 Oct 18:28
4201dea
common: introduce http.h for httplib-based client (#16373)

* common: introduce http.h for httplib-based client

This change moves cpp-httplib based URL parsing and client setup into
a new header `common/http.h`, and integrates it in `arg.cpp` and `run.cpp`.

This is a step toward removing libcurl, while intentionally minimizing changes to the existing code so that behavior stays identical when `LLAMA_CURL` is used.

Signed-off-by: Adrien Gallouët <[email protected]>

* tools : add missing WIN32_LEAN_AND_MEAN

Signed-off-by: Adrien Gallouët <[email protected]>


b6658

01 Oct 17:05
2a9b633
Improve code block color theming (#16325)

* feat: Improve code block theming

* chore: update webui build output

* chore: Update webui static build

b6527

20 Sep 21:43
sync : ggml

b6519

19 Sep 08:09
4b8560a
chat : fix build on arm64 (#16101)

b6475

15 Sep 07:39
b8e09f0
model : add grok-2 support (#15539)

* add grok-2 support

* type fix

* type fix

* type fix

* "fix" vocab for invalid sequences

* fix expert tensor mapping and spaces in vocab

* add chat template

* fix norm tensor mapping

* rename layer_out_norm to ffn_post_norm

* ensure ffn_post_norm is mapped

* fix experts merging

* remove erroneous FFN_GATE entry

* concatenate split tensors and add more metadata

* process all expert layers and try cat instead of hstack

* add support for community BPE vocab

* fix expert feed forward length and ffn_down concat

* commit this too

* add ffn_up/gate/down, unsure if sequence is right

* add ffn_gate/down/up to tensor names

* correct residual moe (still not working)

* mess--

* fix embedding scale being applied twice

* add built in chat template

* change beta fast for grok if default value

* remove spm vocab in favor of community bpe vocab

* change attention temp length metadata type to integer

* update attention temp length metadata

* remove comment

* replace M_SQRT2 with std::sqrt(2)

* add yarn metadata, move defaults to hparams

b6445

10 Sep 21:13
00681df
CUDA: Add `fastdiv` to `k_bin_bcast*`, giving 1-3% E2E performance (#…

b6399

06 Sep 12:53
61bdfd5
server : implement prompt processing progress report in stream mode (…