Hi,
When using Ollama and passing "keep_alive" in "language_model_params", the setting has no effect: the model is still loaded with the default keep_alive of 5 minutes.
import os

import langextract as lx

# input_text, prompt, and examples are defined elsewhere.
result = lx.extract(
    text_or_documents=input_text,
    prompt_description=prompt,
    examples=examples,
    language_model_type=lx.inference.OllamaLanguageModel,
    model_id="qwen2.5:14b",
    model_url=os.getenv("OLLAMA_HOST", "http://localhost:11434"),
    temperature=0.3,
    fence_output=False,
    use_schema_constraints=False,
    max_char_buffer=5000,
    language_model_params={
        "num_ctx": 8192,
        "keep_alive": 10 * 60,  # 10 minutes
        "timeout": 10 * 60,  # 10 minutes
    },
)
To verify, run the following after the call (assuming the model wasn't already in memory). It shows the model loaded for 5 minutes rather than the requested 10.
ollama ps
In the Ollama.py file, it looks like keep_alive is placed under the "options" parameter, but the Ollama API documentation lists it as a top-level parameter, so the payload should be:
payload: dict[str, Any] = {
    'model': model,
    'prompt': prompt,
    'system': system,
    'stream': False,
    'raw': raw,
    'keep_alive': keep_alive,
    'options': options,
}
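As a rough illustration of the fix, here is a minimal sketch of how user-supplied parameters could be split between the top level of the request body and "options". The helper name `build_generate_payload` and the `TOP_LEVEL_KEYS` set are hypothetical and are not taken from the library's code. The sketch assumes every key outside that set is a model option such as "num_ctx".

```python
from typing import Any

# Hypothetical set: keys the Ollama /api/generate endpoint reads from the top
# level of the request body instead of from "options". Not exhaustive.
TOP_LEVEL_KEYS = {"keep_alive", "system", "raw", "format"}


def build_generate_payload(
    model: str, prompt: str, params: dict[str, Any]
) -> dict[str, Any]:
    """Build a request body, routing each parameter to the right level."""
    payload: dict[str, Any] = {"model": model, "prompt": prompt, "stream": False}
    options: dict[str, Any] = {}
    for key, value in params.items():
        if key in TOP_LEVEL_KEYS:
            payload[key] = value  # e.g. keep_alive belongs here
        else:
            options[key] = value  # e.g. num_ctx, temperature
    if options:
        payload["options"] = options
    return payload


payload = build_generate_payload(
    "qwen2.5:14b",
    "Hello",
    {"num_ctx": 8192, "keep_alive": 10 * 60, "temperature": 0.3},
)
```

With this split, "keep_alive" ends up next to "model" and "prompt", while "num_ctx" and "temperature" stay under "options".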
chinmaynadgir