ipex-llm: older ollama serve hangs after 5 minutes on Intel Arc GPU 770 #10800

shailesh837 opened this issue Apr 18, 2024 · 1 comment

@shailesh837

I installed an older BigDL build, and ollama serve hangs after about 5 minutes of running. I was serving the llama2:7b model.

BigDL versions on the machine:
bigdl-core-xe-21==2.5.0b20240321
bigdl-core-xe-esimd-21==2.5.0b20240321
bigdl-llm==2.5.0b20240321
intel-extension-for-pytorch==2.1.10+xpu

As a workaround, I created a systemd service file that restarts ollama serve every 5 minutes, which is bad for latency; a sketch of such a unit is below.
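For reference, such a unit might look like the following sketch. The unit name and binary path are assumptions, and `RuntimeMaxSec=300` is what forces the 5-minute restart:

```ini
# /etc/systemd/system/ollama.service (hypothetical path and name)
[Unit]
Description=ollama serve, force-restarted every 5 minutes as a workaround

[Service]
# Path to the ollama binary is an assumption; adjust for your install.
ExecStart=/usr/local/bin/ollama serve
Restart=always
# systemd stops and restarts the service after 300 seconds of runtime.
RuntimeMaxSec=300

[Install]
WantedBy=multi-user.target
```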

@sgwhat (Contributor) commented Apr 19, 2024

Reason

By default, ollama unloads the model from GPU memory after 5 minutes of inactivity.

Solution

  1. For the latest version of ollama, you can set `export OLLAMA_KEEP_ALIVE=-1` to keep the model loaded in memory.
  2. For older versions of ollama, install the latest langchain-community package and pass `keep_alive=-1` to `ChatOllama`, for example: `llm = ChatOllama(model="llama2:latest", keep_alive=-1)`. A fuller sketch follows below.
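To make option 2 concrete, here is a minimal sketch of the ChatOllama usage; it assumes langchain-community is installed, an ollama server is running locally, and the llama2 model has been pulled. The prompt is a placeholder:

```python
# A minimal sketch, assuming `pip install langchain-community` and a local
# ollama server with the llama2 model already pulled.
from langchain_community.chat_models import ChatOllama

# keep_alive=-1 asks ollama to keep the model resident in GPU memory
# indefinitely instead of unloading it after the default 5 minutes.
llm = ChatOllama(model="llama2:latest", keep_alive=-1)

response = llm.invoke("Why is the sky blue?")
print(response.content)
```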
