Add support for loading model from in-memory buffer #7560
Replies: 2 comments 1 reply
If I'm not mistaken, there should be an option to make a memory buffer "appear" as a virtual file. If you can do that, then you should be able to reuse …
Ran into this discussion after trying my hand at writing some functions. The ggml code base is pretty massive, and Rider doesn't want to give me IntelliSense for it, so I might have made some duplicate functions. My C++ knowledge mainly comes from Unreal Engine, so this isn't the easiest transition, lol, and I've only touched C a handful of times. From my testing, my changes to llama_load_model_from_file don't seem to affect performance or break anything.

On the note of all this: I think being able to run inference directly from a buffer would be nice in systems where accessing the file system is a hassle, frowned upon, or simply not possible. With a virtual file, the previously mentioned issues still arise.
LlamaCpp currently supports loading a model file from an absolute path.
Are there any plans to add an API for loading a llama model from a buffer pointer that already resides in memory? With such support, we could pass a pointer obtained via memory-mapped I/O or other means, which would cover more usage scenarios.