Lately I've run into models that have trouble loading in Ollama; they usually fail with "out of memory". For example, 7B models like Gemma on an RTX 3070 Ti.

The way to get 7B models loaded is to restrict the number of layers that the llama.cpp backend offloads to the GPU. This is done with a parameter in the Modelfile:

```
PARAMETER num_gpu 25
```

When Ollama runs the model, the log then shows:

```
llm_load_tensors: offloading 25 repeating layers to GPU
```
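For reference, a complete Modelfile built around this parameter could look like the sketch below; the `gemma:7b` tag is an assumption on my part (the thread only says "7B models like Gemma"), and the layer count is the example value from above:

```
# Modelfile sketch: offload only 25 layers to the GPU
FROM gemma:7b
PARAMETER num_gpu 25
```

It would then be registered with the usual create call:

```
ollama create gemma-7b-gpu25 -f ./Modelfile
```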
As a feature request, I'd like to see a "Set layers for model X" option in Open-WebUI that writes a new Modelfile with this parameter, runs `ollama create` on it, and lists the resulting model. This would help benchmark older cards with limited VRAM, rather than just falling back to the CPU with `num_gpu 0`.
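Until something like that exists in the UI, a rough version of the benchmark can be scripted against the Ollama CLI. This is a minimal sketch under some assumptions: `gemma:7b` as the base model, a throwaway prompt, `ollama` on the PATH, and arbitrary example layer counts:

```bash
#!/usr/bin/env bash
# Sketch: time one prompt at several num_gpu values (model, prompt, and
# layer counts are placeholders, not values confirmed in this thread).
for layers in 0 10 20 25 30; do
  printf 'FROM gemma:7b\nPARAMETER num_gpu %s\n' "$layers" > Modelfile.bench
  ollama create "gemma-7b-gpu${layers}" -f Modelfile.bench
  echo "=== num_gpu=${layers} ==="
  time ollama run "gemma-7b-gpu${layers}" "Explain GPU layer offloading in one sentence."
done
```

With `num_gpu 0` as the CPU-only baseline, a loop like this gives a quick feel for how many layers a given card can hold before it runs out of memory.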
Replies: 2 comments 2 replies

- To clarify, you mean adding the field here in the Modelfile generator?
  - Yeah, if it's possible to add num_gpu.