Releases · EAddario/llama.cpp
b5995
musa: fix build warnings (unused variable) (#14869)

Signed-off-by: Xiaodong Ye <[email protected]>
b5994
ggml-cpu : disable GGML_NNPA by default due to instability (#14880)

* docs: update s390x document for sentencepiece (cherry picked from commit e086c5e3a7ab3463d8e0906efcfa39352db0a48d)
* docs: update huggingface links + reword (cherry picked from commit 8410b085ea8c46e22be38266147a1e94757ef108)
* ggml-cpu: disable ggml-nnpa compile flag by default, fixes #14877 (cherry picked from commit 412f4c7c88894b8f55846b4719c76892a23cfe09)
* docs: update s390x build docs to reflect nnpa disable (cherry picked from commit c1eeae1d0c2edc74ab9fbeff2707b0d357cf0b4d)

Signed-off-by: Aaron Teo <[email protected]>
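For s390x users who still want the accelerator despite the instability, the compile flag named in this release can be turned back on at configure time. A minimal sketch, assuming a standard CMake build of llama.cpp:

```shell
# GGML_NNPA is now OFF by default; re-enable it explicitly if you accept the risk
cmake -B build -DGGML_NNPA=ON
cmake --build build --config Release
```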
b5971
ci : correct label refactor->refactoring (#14832)
b5964
ggml : model card yaml tab->2xspace (#14819)
b5942
imatrix : use GGUF to store importance matrices (#9400)

* imatrix : allow processing multiple chunks per batch
* perplexity : simplify filling the batch
* imatrix : fix segfault when using a single chunk per batch
* imatrix : use GGUF to store imatrix data
* imatrix : fix conversion problems
* imatrix : use FMA and sort tensor names
* py : add requirements for legacy imatrix convert script
* perplexity : revert changes
* py : include imatrix converter requirements in toplevel requirements
* imatrix : avoid using designated initializers in C++
* imatrix : remove unused n_entries
* imatrix : allow loading mis-ordered tensors (sums and counts tensors no longer need to be consecutive)
* imatrix : more sanity checks when loading multiple imatrix files
* imatrix : use ggml_format_name instead of std::string concatenation
* quantize : use unused imatrix chunk_size with LLAMA_TRACE
* common : use GGUF for imatrix output by default
* imatrix : two-way conversion between old format and GGUF
* convert : remove imatrix-to-gguf python script
* imatrix : use the function name in more error messages
* imatrix : don't use FMA explicitly (this matches the behavior of the previous version, which should make comparisons between the formats easier)
* imatrix : avoid returning from void function save_imatrix
* imatrix : support 3d tensors with MUL_MAT
* quantize : fix dataset name loading from gguf imatrix
* common : move string_remove_suffix from quantize and imatrix
* imatrix : add warning when legacy format is written
* imatrix : warn when writing partial data, to help guess dataset coverage (the legacy format now also stores partial data, using neutral values for missing entries; this matches what is done at read time for the new format, so the old format should give the same quality if it is still used)
* imatrix : avoid loading model to convert or combine imatrix
* imatrix : avoid using imatrix.dat in README

Co-authored-by: Xuan Son Nguyen <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>
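With this release, the imatrix tool writes its output in GGUF by default. A minimal sketch of generating an importance matrix; the model and calibration file names are placeholders, and exact flags may differ between builds:

```shell
# Compute an importance matrix over a calibration text file;
# the output file is now GGUF-formatted by default
./llama-imatrix -m model.gguf -f calibration.txt -o imatrix.gguf
```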
b5939
Documentation: Update build.md's Vulkan section (#14736)

* Documentation: Rewrote and updated the "Without docker" portion of the Vulkan backend build documentation.
* Documentation: Reorganize build.md's Vulkan section.
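As a quick reference for the non-docker path the updated docs describe, a typical Vulkan build looks roughly like this (assuming the Vulkan SDK and drivers are already installed):

```shell
# Build llama.cpp with the Vulkan backend enabled
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release
```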
b5921
llama : fix parallel processing for lfm2 (#14705)
b5906
convert : only check for tokenizer folder if we need it (#14704)
b5890
quantize : fix minor logic flaw in --tensor-type (#14572)
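For context, `--tensor-type` lets llama-quantize override the quantization type for tensors matching a name pattern. A hedged sketch; the file names and the chosen types are illustrative, and the pattern syntax may vary between versions:

```shell
# Quantize to Q4_K_M overall, but keep attention V tensors at Q8_0
./llama-quantize --tensor-type attn_v=q8_0 model-f16.gguf model-q4_k_m.gguf q4_k_m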
b5873
model : support LiquidAI LFM2 hybrid family (#14620)

**Important:** LFM2 was [merged](https://github.com/huggingface/transformers/pull/39340) into transformers, but has not yet been released. To convert it to GGUF, install transformers from source:

```shell
pip install "transformers @ git+https://github.com/huggingface/transformers.git@main"
```
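After installing transformers from source, conversion follows the usual convert_hf_to_gguf.py path. A sketch, with the checkpoint directory and output file name as placeholders:

```shell
# Convert a downloaded LFM2 checkpoint to GGUF
python convert_hf_to_gguf.py ./LFM2-checkpoint --outfile lfm2.gguf
```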