
ollama http api support #2176

Closed
wsxiaoys opened this issue May 18, 2024 · 7 comments · Fixed by #2227
Labels
enhancement, good first issue

Comments

@wsxiaoys
Member

wsxiaoys commented May 18, 2024

Please describe the feature you want

Following the recent refactoring of the HttpBackend, we now have the capability to support the Ollama HTTP API.

For chat completion, Ollama is compatible with the OpenAI chat API, so there is no need to reimplement that part.
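For context, a minimal sketch (not Tabby code) of the surface a completion connector would talk to: a non-streaming call to Ollama's /api/generate endpoint using reqwest (with its json feature), serde, and serde_json. The base URL, model name, and prompt are placeholder assumptions, and a real backend would stream tokens instead of waiting for the full response:

use serde::Deserialize;

#[derive(Deserialize)]
struct GenerateResponse {
    response: String,
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // With "stream": false, Ollama returns one JSON object with the full completion.
    let body = serde_json::json!({
        "model": "starcoder2",
        "prompt": "fn fib(n: u64) -> u64 {",
        "stream": false,
    });
    let resp: GenerateResponse = reqwest::Client::new()
        .post("http://localhost:11434/api/generate")
        .json(&body)
        .send()
        .await?
        .error_for_status()?
        .json()
        .await?;
    println!("{}", resp.response);
    Ok(())
}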



Please reply with a 👍 if you want this feature.

wsxiaoys added the enhancement and good first issue labels on May 18, 2024
wsxiaoys changed the title from "ollama http api support" to "[TAB-648] ollama http api support" on May 18, 2024
wsxiaoys changed the title from "[TAB-648] ollama http api support" to "ollama http api support" on May 18, 2024
@SpeedCrash100
Contributor

SpeedCrash100 commented May 19, 2024

Hi there,

I wanted to suggest that you consider implementing a new Ollama connector instead of using a generic HTTP API connector. There are a few reasons why I believe this would be a better solution:

  1. Someone (me, of course :D) will ask to have Ollama pull (i.e. download) a model if it doesn't exist; a generic http-api connector has no reason to offer such an option (see the sketch after this comment).
  2. Rust crate availability: there is a readily available ollama-rs crate that can be used to interact with the Ollama API.
  3. Support for async streams: the ollama-rs crate supports async streams, which means it can be integrated with tabby-inference with minimal transformations.

Overall, I think the http-api connector tries to be far too generic. Narrowing its scope would make naming clearer and would allow Ollama-specific features to be implemented without adding unused fields. This is just a suggestion; I don't mind if you or anyone else implements it the way you want.
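To make point 1 concrete, here is a rough, hedged sketch (not part of any actual implementation) of what a pull-the-model-if-missing step could look like against the raw Ollama HTTP API. The /api/tags and /api/pull endpoints come from Ollama's public API documentation; the helper name, base-URL handling, and prefix match on the model name are illustrative assumptions:

use serde::Deserialize;

#[derive(Deserialize)]
struct ModelEntry {
    name: String,
}

#[derive(Deserialize)]
struct TagsResponse {
    models: Vec<ModelEntry>,
}

/// Pull `model` through Ollama if it is not already available locally.
async fn ensure_model(base_url: &str, model: &str) -> anyhow::Result<()> {
    let client = reqwest::Client::new();

    // GET /api/tags lists the models Ollama already has on disk.
    let tags: TagsResponse = client
        .get(format!("{base_url}/api/tags"))
        .send()
        .await?
        .error_for_status()?
        .json()
        .await?;

    if tags.models.iter().any(|m| m.name.starts_with(model)) {
        return Ok(());
    }

    // POST /api/pull downloads the model; with "stream": false the request
    // only returns once the pull has finished (or failed).
    client
        .post(format!("{base_url}/api/pull"))
        .json(&serde_json::json!({ "name": model, "stream": false }))
        .send()
        .await?
        .error_for_status()?;
    Ok(())
}

A generic http-api connector has nowhere natural to hang a step like this, which is the argument for an Ollama-specific binding.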

@SpeedCrash100
Contributor

I have created a kind of proof-of-concept implementation here. I don't know what the embedding model is used for or how to check it. Chat completions seem to work with the config below; code completion still needs fixes but generates code as well:

[model.chat.ollama]
url = "http://localhost"
port = 11434
model_name = "llama3"


[model.completion.ollama]
url = "http://localhost"
port = 11434
model_name = "starcoder2"
prompt_template = "<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

[model.embedding.ollama]
url = "http://localhost"
port = 11434
model_name = "llama3"    # I'm not sure what is embedding and what is needed for it.

BTW, why doesn't CompletionStream::generate return Result<BoxStream<String>> like the other interfaces in the crate? This will cause panics in the future.
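For reference, a minimal sketch of the signature change being asked about here. The real tabby-inference trait takes more parameters (options, etc.) and may use its own stream alias, so this only illustrates the shape of the fallible version:

use async_trait::async_trait;
use futures::stream::BoxStream;

// Roughly how the trait looks today (simplified): if the backend fails,
// the only options are to panic or to yield a silently empty stream.
#[async_trait]
trait CompletionStream {
    async fn generate(&self, prompt: &str) -> BoxStream<'static, String>;
}

// The suggested shape: building the stream is fallible, and each yielded
// item can carry a backend error instead of forcing a panic.
#[async_trait]
trait FallibleCompletionStream {
    async fn generate(
        &self,
        prompt: &str,
    ) -> anyhow::Result<BoxStream<'static, anyhow::Result<String>>>;
}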

@Wolfsauge

It would be nice if you could be a bit more inclusive in your efforts. It looks like OpenAI-compatible APIs provided by other pieces of software are not even being considered here. I find this kind of irritating and confusing. It would be nice if you could clarify why ollama support is something "desired" while a general adapter for an OpenAI-compatible API is not. Thank you.

@SpeedCrash100
Contributor

SpeedCrash100 commented May 19, 2024

I'm not saying that the OpenAI API is not desired; it would be good to have it too.

First, my criticism is aimed more at trying to fold everything into one connector (http-api) and switching the type of API via a kind field or similar. I'm not talking about the OpenAI API specifically. Perhaps there will be an API that doesn't require specifying the prompt format for FIM models, who knows? Then there would be some awkward config with lots of fields that must be filled in for one API while other fields must stay empty. I don't like that. I would prefer that the config for each API be separate; it would be easier to document and to use.

Second, the current version of Tabby can download a model, run it, and serve it through an API. The OpenAI API has no download step, while Ollama provides some model management (list, pull, etc.). So Ollama API support would allow a future workflow where you just specify a model and, after a few preparation steps, you are ready to go.

Edit:
Naming is also a problem. If http-api is going to be used only for OpenAI (as far as I can see, it is not), then it should be called openai-api, not http-api.
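An illustrative-only sketch of the config trade-off described above, using serde; the field and variant names are invented, and neither form is Tabby's actual schema. A single generic struct accumulates optional fields that only apply to some kinds, while a tagged per-backend enum keeps each variant self-contained:

use serde::Deserialize;

// One generic connector: every backend's knobs live in a single struct,
// so most fields are optional and only meaningful for some values of `kind`.
#[derive(Deserialize)]
struct GenericHttpModelConfig {
    kind: String,
    api_endpoint: String,
    model_name: Option<String>,
    prompt_template: Option<String>, // only FIM-style completion backends need this
    api_key: Option<String>,         // only hosted APIs need this
}

// Separate config per backend: serde selects the variant from the `kind`
// tag, and each variant declares only the fields that backend understands.
#[derive(Deserialize)]
#[serde(tag = "kind", rename_all = "lowercase")]
enum BackendConfig {
    Ollama {
        url: String,
        port: u16,
        model_name: String,
        prompt_template: Option<String>,
    },
    OpenAI {
        api_endpoint: String,
        api_key: String,
        model_name: String,
    },
}

The second form is easier to document per backend, at the cost of one more type per API.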

@wsxiaoys
Member Author

wsxiaoys commented May 19, 2024

Hello @SpeedCrash100,

Thank you for your interest. Your implementation appears to be excellent. However, I would still prefer to build it on HttpModelConfig. This approach primarily aims to simplify the configuration of remote backends, since the inputs they require are almost the same.

Adopting this method would not negate the benefits of modularizing crates or depending on ollama-rs (e.g., openai-chat). For example, you could still develop a separate ollama-api-binding crate; it would take HttpModelConfig as its input and be returned by the http_api_binding::create functions. If you are interested in having your implementation merged, please consider adopting this approach, and I would be delighted to review it.
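To sketch what that layering could look like (all names below are illustrative, not the actual Tabby code): the http_api_binding::create entry point keeps HttpModelConfig as its only input and delegates to a backend-specific module or crate, so ollama-rs or a hand-rolled Ollama client stays an implementation detail:

use std::sync::Arc;

// Stand-ins for the real types in tabby-common / tabby-inference.
pub struct HttpModelConfig {
    pub kind: String,
    pub api_endpoint: String,
    pub model_name: Option<String>,
}

pub trait CompletionStream: Send + Sync {}

// The single entry point the rest of the server would call.
pub fn create(config: &HttpModelConfig) -> Arc<dyn CompletionStream> {
    match config.kind.as_str() {
        // Implemented in its own module (or crate), but driven by the same config type.
        "ollama/completion" => ollama_api_bindings::create_completion(config),
        other => panic!("unsupported completion backend kind: {other}"),
    }
}

mod ollama_api_bindings {
    use super::{CompletionStream, HttpModelConfig};
    use std::sync::Arc;

    // Would wrap an Ollama HTTP client (or ollama-rs) internally.
    struct OllamaCompletion;

    impl CompletionStream for OllamaCompletion {}

    pub(super) fn create_completion(_config: &HttpModelConfig) -> Arc<dyn CompletionStream> {
        Arc::new(OllamaCompletion)
    }
}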

It seems that OpenAI-compatible APIs provided by other software are not even being considered here.

@Wolfsauge, could you please provide more details? OpenAI-compatible chat APIs are already supported for chat-model endpoints. Are there any other API endpoints you would like to see supported?

@Wolfsauge

@Wolfsauge, could you please provide more details? OpenAI-compatible chat APIs are already supported for chat-model endpoints. Are there any other API endpoints you would like to see supported?

@wsxiaoys Thank you. Awesome! I realize now that I'm still at 0.10.0 in my understanding of what's possible with TabbyML. Thank you for implementing this and thanks again for telling me. It looks exactly like what I was missing so dearly. I will do my homework, figure out how to use it, and then, if I still have any questions, I will open a separate issue.

@SpeedCrash100 Aha, interesting! Indeed, I was completely neglecting these model-management aspects in my considerations. And it's true, model management isn't well represented in the "OpenAI-compatible" API implementations I know of and have used, if at all. What I have also seen as an implementer's nightmare is that different "OpenAI-compatible" implementations in different products differ enough to require an abstraction layer in the client just to talk reliably to the different flavors. That is a kind of API nightmare an Ollama user will never have to endure and which an Ollama integration will never cause, since it comes from a single origin.

That's why I think you are making the right decisions. Please keep up the good work! Thanks again for helping me out on this and for filling the gaps in my understanding. <3

@wsxiaoys
Member Author

Another PR can be used as a reference when creating similar HTTP adapters: https://github.com/TabbyML/tabby/pull/2224/files
