ipex-llm: older ollama serve hangs after 5 minutes on Intel Arc GPU 770 #10800

shailesh837 opened this issue Apr 18, 2024 · 1 comment

@shailesh837

I installed an older BigDL build, and ollama serve hangs after about 5 minutes of running. I was serving the llama2:7b model.

BigDL versions on the machine:
bigdl-core-xe-21==2.5.0b20240321
bigdl-core-xe-esimd-21==2.5.0b20240321
bigdl-llm==2.5.0b20240321
intel-extension-for-pytorch==2.1.10+xpu

As a workaround, I created a systemd service file that restarts ollama serve every 5 minutes, which is bad for latency; a sketch of such a unit is below.
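For reference, such a unit might look like the following sketch. The unit name and binary path are assumptions, and `RuntimeMaxSec=300` is what forces the 5-minute restart:

```ini
# /etc/systemd/system/ollama.service (hypothetical path and name)
[Unit]
Description=ollama serve, force-restarted every 5 minutes as a workaround

[Service]
# Path to the ollama binary is an assumption; adjust for your install.
ExecStart=/usr/local/bin/ollama serve
Restart=always
# systemd stops and restarts the service after 300 seconds of runtime.
RuntimeMaxSec=300

[Install]
WantedBy=multi-user.target
```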

@sgwhat (Contributor) commented Apr 19, 2024

Reason

By default, ollama unloads the model from GPU memory after 5 minutes of inactivity.

Solution

  1. For the latest version of ollama, you can set `export OLLAMA_KEEP_ALIVE=-1` to keep the model loaded in memory.
  2. For older versions of ollama, install the latest langchain-community package and pass `keep_alive=-1` to `ChatOllama`, for example: `llm = ChatOllama(model="llama2:latest", keep_alive=-1)`. A fuller sketch follows below.
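To make option 2 concrete, here is a minimal sketch of the ChatOllama usage; it assumes langchain-community is installed, an ollama server is running locally, and the llama2 model has been pulled. The prompt is a placeholder:

```python
# A minimal sketch, assuming `pip install langchain-community` and a local
# ollama server with the llama2 model already pulled.
from langchain_community.chat_models import ChatOllama

# keep_alive=-1 asks ollama to keep the model resident in GPU memory
# indefinitely instead of unloading it after the default 5 minutes.
llm = ChatOllama(model="llama2:latest", keep_alive=-1)

response = llm.invoke("Why is the sky blue?")
print(response.content)
```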
