Memory leak #1246
Comments
It's possible to reuse the same client for many requests; a single one for the lifetime of the process can work fine, at least that's what I've been doing in production so far (using Python 3.11). So you could move your client creation to the init, maybe? I don't know Tornado, though; I'm using Starlette. To be clear, creating and closing clients should not leak either, and bugs there have been fixed before. But just a note.
@antont May I kindly ask: in your service, is every call made with the same set of client initialization parameters? In my service, different request sources have their own api_version, api_key, and azure_endpoint, so I initialize a new client object for each request. I've also noticed the client.with_options() method, which can change these parameters dynamically. What I'm uncertain about is: if only a single client is used and called concurrently, could client.with_options() erroneously override the client's parameters?
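A minimal sketch of the suggested pattern — create the client once and hand the same instance to every request. The stub class and getter name are illustrative; in real code the stub would be `openai.AsyncAzureOpenAI` constructed with your credentials:

```python
# Sketch: one client for the whole process lifetime.
# StubClient stands in for AsyncAzureOpenAI (illustrative only).

class StubClient:
    instances = 0

    def __init__(self):
        StubClient.instances += 1


_client = None


def get_client():
    """Lazily create the client on first use, then always return it."""
    global _client
    if _client is None:
        _client = StubClient()
    return _client


# Every request handler calls get_client(); only one client is ever built.
for _ in range(100):
    get_client()
assert StubClient.instances == 1
```

With a web framework you would typically do this in the app's startup/lifespan hook instead of a lazy getter, but the effect is the same: no per-request client construction.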
Spot on, that is the case for us.
Yep, that's why that method exists. AFAIK it works correctly, so you can reuse the same client but use, e.g., different API keys for different requests. I haven't used it myself, though, only seen it mentioned in previous similar issues here. Some people are wary of it; at least one person here explained how he creates new client objects just to be sure. I would probably read the implementation to check that it seems clear and trustworthy, and then use it. I guess there are tests for it too, though strange bug cases can be hard to cover if one were to happen with it.
Can you share a repository that demonstrates a minimal reproduction? (The code you shared is a helpful starting point, but something we can download, run, and see the error would be very helpful.) I'd also +1 @antont's suggestion to reuse the client.
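To illustrate why per-request overrides need not clobber a shared client: `with_options` returns a new client with the overrides applied rather than mutating the one it is called on. Here is a toy model of that copy-on-override semantics (not the library's actual implementation — just the concept):

```python
from dataclasses import dataclass, replace


# Toy stand-in for a client whose with_options() returns a modified
# copy, leaving the shared base instance untouched. Because nothing
# is mutated in place, concurrent requests holding their own copies
# cannot override each other's parameters.
@dataclass(frozen=True)
class Client:
    api_key: str
    azure_endpoint: str

    def with_options(self, **overrides):
        return replace(self, **overrides)


base = Client(api_key="key-A", azure_endpoint="https://a.example")
per_request = base.with_options(api_key="key-B")

assert base.api_key == "key-A"         # shared client unchanged
assert per_request.api_key == "key-B"  # override lives only on the copy
```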
@a383615194 have you fixed this issue?
I reused the client and the issue is gone. The version I used is 1.23.6.
Confirm this is an issue with the Python library and not an underlying OpenAI API
Describe the bug
I am using the AsyncAzureOpenAI class to instantiate a client and making a streaming call to client.chat.completions.create. Even after calling close() on both the client and the response within a try-finally block, I am still seeing a memory leak that eventually crashes the server.
I tried the solution outlined in #1181, where the pydantic package was upgraded to 2.6.3, but this hasn't resolved my issue.
Using the gc library, I noticed that memory usage increases after each call to this service. Our service is used for centralized management of AzureOpenAI accounts, hence a client is instantiated for every incoming request. Given the concurrent nature of this service, I'm wondering whether client.with_options supports concurrent usage. Do you have any good solutions to address this memory leak?
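Memory growth like this can be quantified with the standard-library tracemalloc module, which also helps turn the observation into a minimal reproduction. `do_request` below is a placeholder for the code under suspicion (create a client, make the streamed call, close everything):

```python
import tracemalloc


def do_request():
    """Placeholder for the code under test (client creation + streamed call)."""
    pass


tracemalloc.start()
baseline, _ = tracemalloc.get_traced_memory()

# Repeat the suspect operation; a leak shows up as growth proportional
# to the iteration count rather than a flat line.
for _ in range(100):
    do_request()

current, _ = tracemalloc.get_traced_memory()
growth = current - baseline
print(f"net growth after 100 calls: {growth} bytes")
tracemalloc.stop()
```

With the no-op placeholder the growth is near zero; replacing it with the real per-request client creation and comparing runs against a shared-client variant would isolate where the retained memory comes from.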
To Reproduce
Several calls in a row, for example to embeddings, wrapped with async.
Code snippets
OS
CentOS
Python version
Python 3.8
Library version
openai v1.12.0