
[FEAT] Configurable OpenAI timeouts and retry settings for compatible APIs #301

Open
esatapedico opened this issue Jan 6, 2024 · 1 comment


@esatapedico

Is your feature request related to a problem? Please describe.
I'm using LocalAI as an OpenAI-compatible API for self-hosted LLM models. I've pointed Zep at its endpoint so that it uses my server for summarization, intent, and entity extraction.

My local server, however, isn't that beefy, and requests to it can take several minutes to complete. When Zep starts calling my API for these tasks, requests time out and retries kick in. Not only do the responses never arrive, but the API also gets overloaded and eventually becomes unusable for a while.

I see that retries and timeouts are configured for OpenAI calls, but they seem to be hardcoded at the moment, so I couldn't adapt them to my needs.

Describe the solution you'd like
OpenAI timeouts and retries could be configurable through the config file and environment variables, so that the currently hardcoded values can be overridden. I don't know whether it would make sense to have different values for different kinds of requests (summarization, intents, embeddings); maybe a single setting is simpler.
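To sketch the idea, something like this (the variable names `ZEP_OPENAI_TIMEOUT_SECONDS` and `ZEP_OPENAI_MAX_RETRIES` are hypothetical here, not existing Zep settings):

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"strconv"
	"time"
)

// envInt reads an integer from the environment, falling back to a
// default when the variable is unset or malformed.
func envInt(key string, def int) int {
	if v, err := strconv.Atoi(os.Getenv(key)); err == nil {
		return v
	}
	return def
}

func main() {
	// Hypothetical variables -- these are NOT existing Zep settings,
	// just an illustration of overriding hardcoded defaults.
	timeout := envInt("ZEP_OPENAI_TIMEOUT_SECONDS", 90)
	retries := envInt("ZEP_OPENAI_MAX_RETRIES", 3)

	// The HTTP client used for OpenAI(-compatible) calls would honor
	// the override instead of a hardcoded value.
	httpClient := &http.Client{Timeout: time.Duration(timeout) * time.Second}

	fmt.Printf("timeout=%s retries=%d\n", httpClient.Timeout, retries)
}
```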

Describe alternatives you've considered
I've turned off intent and entity extraction as an attempt not to overload my API with too many requests in a short period, but unfortunately even a single summarization request can easily take a few minutes in my case. For my use case it's fine if summarization updates take a bit longer, as long as they eventually complete.

Additional context
I understand this matters less when consuming the predictable OpenAI API, but since compatible APIs can be used, they may come with very different performance characteristics. For now I'm falling back to the OpenAI API in Zep because I can't use my self-hosted API with it, even though I'm using that API successfully in my own application code (where, again, my use case is very lenient about slow responses).
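For reference, this is roughly what I do in my application code to point a client at LocalAI (a sketch using the go-openai library; I don't know whether Zep uses the same client, and the base URL is just an example):

```go
package main

import (
	"context"
	"fmt"

	openai "github.com/sashabaranov/go-openai"
)

func main() {
	// Point an OpenAI-compatible client at a local LocalAI server.
	// The base URL is an example; adjust host/port for your setup.
	cfg := openai.DefaultConfig("not-needed-for-localai")
	cfg.BaseURL = "http://localhost:8080/v1"
	client := openai.NewClientWithConfig(cfg)

	resp, err := client.CreateChatCompletion(context.Background(),
		openai.ChatCompletionRequest{
			Model: "gpt-3.5-turbo", // LocalAI maps this name to a local model
			Messages: []openai.ChatCompletionMessage{
				{Role: openai.ChatMessageRoleUser, Content: "Summarize: ..."},
			},
		})
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println(resp.Choices[0].Message.Content)
}
```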

@danielchalef
Member

We're refactoring our LLM support with a new release expected late Q1/early Q2. We'll consider making timeouts configurable.
