
ollama http api support #2176

Closed
wsxiaoys opened this issue May 18, 2024 · 7 comments · Fixed by #2227
Labels
enhancement, good first issue

Comments

@wsxiaoys
Member

wsxiaoys commented May 18, 2024

Please describe the feature you want

Following the recent refactoring of the HttpBackend, we now have the capability to support the Ollama HTTP API.

For chat completion, Ollama is compatible with the OpenAI chat API, so there is no need to reimplement that part.
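For context, a minimal sketch (not Tabby code) of the surface a completion connector would talk to: a non-streaming call to Ollama's /api/generate endpoint using reqwest (with its json feature), serde, and serde_json. The base URL, model name, and prompt are placeholder assumptions, and a real backend would stream tokens instead of waiting for the full response:

use serde::Deserialize;

#[derive(Deserialize)]
struct GenerateResponse {
    response: String,
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // With "stream": false, Ollama returns one JSON object with the full completion.
    let body = serde_json::json!({
        "model": "starcoder2",
        "prompt": "fn fib(n: u64) -> u64 {",
        "stream": false,
    });
    let resp: GenerateResponse = reqwest::Client::new()
        .post("http://localhost:11434/api/generate")
        .json(&body)
        .send()
        .await?
        .error_for_status()?
        .json()
        .await?;
    println!("{}", resp.response);
    Ok(())
}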



Please reply with a 👍 if you want this feature.

wsxiaoys added the enhancement and good first issue labels on May 18, 2024
wsxiaoys changed the title from "ollama http api support" to "[TAB-648] ollama http api support" on May 18, 2024
wsxiaoys changed the title from "[TAB-648] ollama http api support" to "ollama http api support" on May 18, 2024
@SpeedCrash100
Contributor

SpeedCrash100 commented May 19, 2024

Hi there,

I wanted to suggest that you consider implementing a new Ollama connector instead of using a generic HTTP API connector. There are a few reasons why I believe this would be a better solution:

  1. Someone (me, of course :D) will ask to have Ollama pull (i.e. download) a model if it doesn't exist; a generic http-api connector has no reason to offer such an option (see the sketch after this comment).
  2. Rust crate availability: there is a readily available ollama-rs crate that can be used to interact with the Ollama API.
  3. Support for async streams: the ollama-rs crate supports async streams, which means it can be integrated with tabby-inference with minimal transformations.

Overall, I think the http-api connector tries to be far too generic. Narrowing its scope would make naming clearer and would allow Ollama-specific features to be implemented without adding unused fields. This is just a suggestion; I don't mind if you or anyone else implements it the way you want.
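To make point 1 concrete, here is a rough, hedged sketch (not part of any actual implementation) of what a pull-the-model-if-missing step could look like against the raw Ollama HTTP API. The /api/tags and /api/pull endpoints come from Ollama's public API documentation; the helper name, base-URL handling, and prefix match on the model name are illustrative assumptions:

use serde::Deserialize;

#[derive(Deserialize)]
struct ModelEntry {
    name: String,
}

#[derive(Deserialize)]
struct TagsResponse {
    models: Vec<ModelEntry>,
}

/// Pull `model` through Ollama if it is not already available locally.
async fn ensure_model(base_url: &str, model: &str) -> anyhow::Result<()> {
    let client = reqwest::Client::new();

    // GET /api/tags lists the models Ollama already has on disk.
    let tags: TagsResponse = client
        .get(format!("{base_url}/api/tags"))
        .send()
        .await?
        .error_for_status()?
        .json()
        .await?;

    if tags.models.iter().any(|m| m.name.starts_with(model)) {
        return Ok(());
    }

    // POST /api/pull downloads the model; with "stream": false the request
    // only returns once the pull has finished (or failed).
    client
        .post(format!("{base_url}/api/pull"))
        .json(&serde_json::json!({ "name": model, "stream": false }))
        .send()
        .await?
        .error_for_status()?;
    Ok(())
}

A generic http-api connector has nowhere natural to hang a step like this, which is the argument for an Ollama-specific binding.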

@SpeedCrash100
Contributor

I have created a kind of proof-of-concept implementation here. I don't know what the embedding model is used for or how to check it. Chat completions seem to work with the config below; code completion still needs fixes but generates code as well:

[model.chat.ollama]
url = "http://localhost"
port = 11434
model_name = "llama3"


[model.completion.ollama]
url = "http://localhost"
port = 11434
model_name = "starcoder2"
prompt_template = "<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

[model.embedding.ollama]
url = "http://localhost"
port = 11434
model_name = "llama3"    # I'm not sure what is embedding and what is needed for it.

BTW, why doesn't CompletionStream::generate return Result<BoxStream<String>> like the other interfaces in the crate? This will cause panics in the future.
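For reference, a minimal sketch of the signature change being asked about here. The real tabby-inference trait takes more parameters (options, etc.) and may use its own stream alias, so this only illustrates the shape of the fallible version:

use async_trait::async_trait;
use futures::stream::BoxStream;

// Roughly how the trait looks today (simplified): if the backend fails,
// the only options are to panic or to yield a silently empty stream.
#[async_trait]
trait CompletionStream {
    async fn generate(&self, prompt: &str) -> BoxStream<'static, String>;
}

// The suggested shape: building the stream is fallible, and each yielded
// item can carry a backend error instead of forcing a panic.
#[async_trait]
trait FallibleCompletionStream {
    async fn generate(
        &self,
        prompt: &str,
    ) -> anyhow::Result<BoxStream<'static, anyhow::Result<String>>>;
}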

@Wolfsauge

It would be nice if you could be a bit more inclusive in your efforts. It looks like OpenAI-compatible APIs provided by other pieces of software are not even being considered here. I find this kind of irritating and confusing. It would be nice if you could clarify why ollama support is something "desired" while a general adapter for an OpenAI-compatible API is not. Thank you.

@SpeedCrash100
Contributor

SpeedCrash100 commented May 19, 2024

I'm not saying that the OpenAI API is not desired; it would be good to have it too.

First, my criticism is aimed more at trying to fold everything into one connector (http-api) and switching the type of API via a kind field or similar. I'm not talking about the OpenAI API specifically. Perhaps there will be an API that doesn't require specifying the prompt format for FIM models, who knows? Then there would be some awkward config with lots of fields that must be filled in for one API while other fields must stay empty. I don't like that. I would prefer that the config for each API be separate; it would be easier to document and to use.

Second, the current version of Tabby can download a model, run it, and serve it through an API. The OpenAI API has no download step, while Ollama provides some model management (list, pull, etc.). So Ollama API support would allow a future workflow where you just specify a model and, after a few preparation steps, you are ready to go.

Edit:
Naming is also a problem. If http-api is going to be used only for OpenAI (as far as I can see, it is not), then it should be called openai-api, not http-api.
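An illustrative-only sketch of the config trade-off described above, using serde; the field and variant names are invented, and neither form is Tabby's actual schema. A single generic struct accumulates optional fields that only apply to some kinds, while a tagged per-backend enum keeps each variant self-contained:

use serde::Deserialize;

// One generic connector: every backend's knobs live in a single struct,
// so most fields are optional and only meaningful for some values of `kind`.
#[derive(Deserialize)]
struct GenericHttpModelConfig {
    kind: String,
    api_endpoint: String,
    model_name: Option<String>,
    prompt_template: Option<String>, // only FIM-style completion backends need this
    api_key: Option<String>,         // only hosted APIs need this
}

// Separate config per backend: serde selects the variant from the `kind`
// tag, and each variant declares only the fields that backend understands.
#[derive(Deserialize)]
#[serde(tag = "kind", rename_all = "lowercase")]
enum BackendConfig {
    Ollama {
        url: String,
        port: u16,
        model_name: String,
        prompt_template: Option<String>,
    },
    OpenAI {
        api_endpoint: String,
        api_key: String,
        model_name: String,
    },
}

The second form is easier to document per backend, at the cost of one more type per API.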

@wsxiaoys
Member Author

wsxiaoys commented May 19, 2024

Hello @SpeedCrash100,

Thank you for your interest. Your implementation appears to be excellent. However, I would still prefer to build it on HttpModelConfig. This approach primarily aims to simplify the configuration of remote backends, since the inputs they require are almost the same.

Adopting this method would not negate the benefits of modularizing crates or depending on ollama-rs (e.g., openai-chat). For example, you could still develop a separate ollama-api-binding crate; it would take HttpModelConfig as its input and be returned by the http_api_binding::create functions. If you are interested in having your implementation merged, please consider adopting this approach, and I would be delighted to review it.
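To sketch what that layering could look like (all names below are illustrative, not the actual Tabby code): the http_api_binding::create entry point keeps HttpModelConfig as its only input and delegates to a backend-specific module or crate, so ollama-rs or a hand-rolled Ollama client stays an implementation detail:

use std::sync::Arc;

// Stand-ins for the real types in tabby-common / tabby-inference.
pub struct HttpModelConfig {
    pub kind: String,
    pub api_endpoint: String,
    pub model_name: Option<String>,
}

pub trait CompletionStream: Send + Sync {}

// The single entry point the rest of the server would call.
pub fn create(config: &HttpModelConfig) -> Arc<dyn CompletionStream> {
    match config.kind.as_str() {
        // Implemented in its own module (or crate), but driven by the same config type.
        "ollama/completion" => ollama_api_bindings::create_completion(config),
        other => panic!("unsupported completion backend kind: {other}"),
    }
}

mod ollama_api_bindings {
    use super::{CompletionStream, HttpModelConfig};
    use std::sync::Arc;

    // Would wrap an Ollama HTTP client (or ollama-rs) internally.
    struct OllamaCompletion;

    impl CompletionStream for OllamaCompletion {}

    pub(super) fn create_completion(_config: &HttpModelConfig) -> Arc<dyn CompletionStream> {
        Arc::new(OllamaCompletion)
    }
}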

It seems that OpenAI-compatible APIs provided by other software are not even being considered here.

@Wolfsauge, could you please provide more details? OpenAI-compatible chat APIs are already supported for chat-model endpoints. Are there any other API endpoints you would like to see supported?

@Wolfsauge

@Wolfsauge, could you please provide more details? OpenAI-compatible chat APIs are already supported for chat-model endpoints. Are there any other API endpoints you would like to see supported?

@wsxiaoys Thank you. Awesome! I realize now that I'm still at 0.10.0 in my understanding of what's possible with TabbyML. Thank you for implementing this and thanks again for telling me. It looks exactly like what I was missing so dearly. I will do my homework, figure out how to use it, and then, if I still have any questions, I will open a separate issue.

@SpeedCrash100 Aha, interesting! Indeed, I was completely neglecting these model-management aspects in my considerations. And it's true, model management isn't well represented in the "OpenAI-compatible" API implementations I know of and have used, if at all. What I have also seen as an implementer's nightmare is that different "OpenAI-compatible" implementations in different products differ enough to require an abstraction layer in the client just to talk reliably to the different flavors. That is a kind of API nightmare an Ollama user will never have to endure and which an Ollama integration will never cause, since it comes from a single origin.

That's why I think you are making the right decisions. Please keep up the good work! Thanks again for helping me out on this and for filling the gaps in my understanding. <3

@wsxiaoys
Member Author

Another PR can be used as a reference when creating similar HTTP adapters: https://github.com/TabbyML/tabby/pull/2224/files
