-
In general, if you have only one GPU, you can only load one model into it at a time. The main exception is a GPU that supports partitioning (for example NVIDIA's Multi-Instance GPU, MIG), which lets you run more than one model on a single card. So if you run prompts against different models on a single GPU, Ollama processes them serially, first in, first out, loading whichever model each prompt requests. If you want to run prompts on two models concurrently, you will need a second GPU. Also note that Ollama's queue is strictly FIFO: if one user issues prompts with a very large context, everyone else's response time is gated by that long-running prompt.
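To make that scheduling behavior concrete, here is a minimal sketch of the "first in, first out" pattern described above: a single worker drains a queue one request at a time, so a request for a second model cannot start until the current one finishes. The model names and the `handle_request` function are hypothetical placeholders for illustration, not Ollama's actual internals.

```python
import queue
import threading
import time

# A single-GPU server behaves like one worker draining a FIFO queue:
# only one model can occupy the GPU, so requests run in arrival order.
request_queue: "queue.Queue[tuple[str, str]]" = queue.Queue()

def handle_request(model: str, prompt: str) -> None:
    # Hypothetical stand-in for "load model (if needed) and generate".
    print(f"{time.strftime('%X')} start  {model}: {prompt!r}")
    time.sleep(2)  # simulate generation time
    print(f"{time.strftime('%X')} finish {model}")

def worker() -> None:
    while True:
        model, prompt = request_queue.get()
        handle_request(model, prompt)  # strictly serial: FIFO
        request_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

# Two users submit prompts to *different* models at the same moment;
# the second request still waits for the first one to complete.
request_queue.put(("llama3", "Summarize this document."))
request_queue.put(("mistral", "Translate this sentence."))
request_queue.join()
```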
-
Multiple users have applied for accounts, but they need to be approved by an administrator before they are allowed to use the web server. How can the administrator approve these pending applications?
-
Bug Report
Description
Bug Summary:
I hosted ollama and open-webui on a server and everything works fine. But when my colleague and I tried to query different models at the same time, one of us had to wait until the other's generation finished before getting a response.
Steps to Reproduce:
We used Terraform and Ansible to deploy both ollama and open-webui on our server. The two services communicate and work fine on their own. However, using open-webui simultaneously/in parallel does not work (see the reproduction sketch below), and we want to open this service up to everyone in our group.
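As a minimal reproduction sketch (assuming Ollama is reachable at http://localhost:11434 and the models llama3 and mistral have already been pulled; adjust both to your setup), the following script fires two requests at Ollama's /api/generate endpoint at the same moment and prints timing. On an affected server, the second model only finishes after the first model's full run has completed.

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

OLLAMA_URL = "http://localhost:11434/api/generate"  # adjust to your host
T0 = time.time()

def ask(model: str, prompt: str) -> None:
    print(f"+{time.time() - T0:5.1f}s  sending to {model}")
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()
    print(f"+{time.time() - T0:5.1f}s  {model} finished")

# Both requests are submitted simultaneously; if Ollama serializes them,
# the gap between the two "finished" lines shows the queueing delay.
with ThreadPoolExecutor(max_workers=2) as pool:
    pool.submit(ask, "llama3", "Say hello.")
    pool.submit(ask, "mistral", "Say hello.")
```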
Expected Behavior:
Two users can query/chat at the same time without either having to wait for the other.
Actual Behavior:
One query has to finish before the other one starts.
Environment
Reproduction Details
Confirmation:
Logs and Screenshots
Browser Console Logs:
[Include relevant browser console logs, if applicable]
Docker Container Logs:
[Include relevant Docker container logs, if applicable]
Screenshots (if applicable):
[Attach any relevant screenshots to help illustrate the issue]
Installation Method
I'm using docker-compose.yaml to deploy
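For reference, a minimal docker-compose.yaml sketch of this kind of stack (image tags, ports, and volume names here are assumptions, not our exact configuration):

```yaml
# Minimal sketch only: image tags, ports, and volume names are assumptions.
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
volumes:
  ollama:
```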
Additional Information
[Include any additional details that may help in understanding and reproducing the issue. This could include specific configurations, error messages, or anything else relevant to the bug.]
Note
If the bug report is incomplete or does not follow the provided instructions, it may not be addressed. Please ensure that you have followed the steps outlined in the README.md and troubleshooting.md documents, and provide all necessary information for us to reproduce and address the issue. Thank you!