
Support ollama server #3647

Open
skonto opened this issue Apr 29, 2024 · 2 comments

Comments

skonto (Contributor) commented Apr 29, 2024

/kind feature

Describe the solution you'd like

Support the ollama server as a runtime (not sure if it has been asked elsewhere).

Anything else you would like to add:

Ollama is quite efficient on CPU (it serves 4-bit quantized models out of the box) and offers a good compromise between CPU and GPU usage, while vLLM currently has some restrictions, e.g. an AVX-512 prerequisite for CPU inference. It would be good to have more options, especially for running models locally (e.g. during the dev phase) or, more importantly, on non-GPU clusters. A rough sketch of the server API is shown below.
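
For context, a minimal sketch of what talking to a locally running ollama server looks like over its HTTP generate endpoint; the model name and host here are assumptions (any model already pulled with `ollama pull` would do), not a proposed implementation:

```python
import requests

# Minimal sketch: query a locally running ollama server via /api/generate.
# Host, port (ollama's default is 11434) and model name are assumptions.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Why is the sky blue?", "stream": False},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```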

Links to the design documents:
N/A

skonto (Contributor, Author) commented Apr 29, 2024

cc @terrytangyuan @yuzisun wdyt?

nilayaishwarya commented

@skonto I am working on one right now: a rather simple implementation that utilizes the ollama client. Let me know if you have any suggestions.
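
To illustrate what "utilizing the ollama client" could look like, here is a minimal sketch using the ollama Python client; the host and model name are assumptions and not necessarily what the runtime will use:

```python
from ollama import Client

# Minimal sketch using the ollama Python client; host and model name are
# assumptions. A runtime wrapper would forward inference requests like this.
client = Client(host="http://localhost:11434")
response = client.chat(
    model="llama3",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response["message"]["content"])
```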
