I have installed an older BigDL build, and ollama serve hung after about 5 minutes of running; I was running the llama2:7b model.
BigDL versions on the machine:
bigdl-core-xe-21==2.5.0b20240321
bigdl-core-xe-esimd-21==2.5.0b20240321
bigdl-llm==2.5.0b20240321
intel-extension-for-pytorch==2.1.10+xpu
So I created a systemd service file to restart ollama serve every 5 minutes, which is bad for latency.
By default, Ollama unloads the model from GPU memory after 5 minutes of inactivity.
Solution
For the latest version of Ollama, you can set export OLLAMA_KEEP_ALIVE=-1 to keep the model loaded in memory.
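As a minimal sketch (assuming a Linux shell and that the ollama binary is on PATH), the environment variable is set in the same shell before starting the server:

```shell
# Keep models loaded indefinitely: -1 disables the default
# 5-minute idle unload timer for all requests to this server.
export OLLAMA_KEEP_ALIVE=-1
ollama serve
```

If you run the server from a systemd unit instead, the equivalent is an Environment=OLLAMA_KEEP_ALIVE=-1 line in the [Service] section, so no periodic restart is needed.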
For older versions of Ollama, please install the latest version of langchain_community and pass keep_alive=-1 to ChatOllama, for example: llm = ChatOllama(model="llama2:latest", keep_alive=-1)
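The same setting can also be sent per request through Ollama's REST API. This stdlib-only sketch just builds the JSON body for the /api/generate endpoint (the keep_alive field is from Ollama's API; the model name and prompt here are placeholders):

```python
import json

# Request body for Ollama's /api/generate endpoint.
# keep_alive=-1 asks the server to keep this model loaded indefinitely
# instead of unloading it after the default 5 minutes.
payload = {
    "model": "llama2:7b",          # placeholder model name
    "prompt": "Why is the sky blue?",  # placeholder prompt
    "keep_alive": -1,
}

body = json.dumps(payload)
print(body)
```

POSTing this body to http://localhost:11434/api/generate (the default host/port) keeps llama2:7b resident after the response completes.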