Memory leak #197

Open
jmackie opened this issue Aug 23, 2023 · 4 comments

jmackie commented Aug 23, 2023

We're exploring using the unstructured API at work.

We're running quay.io/unstructured-io/unstructured-api:c9b74d4 on a "Pro" (private service) Render instance (i.e. 4 GB of RAM).

We're using the service to process PDFs with the following parameters: strategy=hi_res, pdf_infer_table_structure=true, and skip_infer_table_types=[]. We're also using parallel mode via UNSTRUCTURED_PARALLEL_MODE_ENABLED=true (with the defaults for the other environment variables).
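
For reference, here's a minimal sketch of how we call the service (Python + requests). The endpoint path, port, and filename are assumed defaults for a local container, not anything specific to our setup:

```python
# Sketch of a single processing request against the unstructured-api container.
# Endpoint path and port are assumed defaults; adjust for your deployment.
import requests

with open("example.pdf", "rb") as pdf:
    resp = requests.post(
        "http://localhost:8000/general/v0/general",
        files={"files": ("example.pdf", pdf, "application/pdf")},
        data={
            "strategy": "hi_res",
            "pdf_infer_table_structure": "true",
            "skip_infer_table_types": "[]",
        },
        timeout=600,
    )
resp.raise_for_status()
print(f"{len(resp.json())} elements returned")
```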

We've seen the service fall over several times due to OOM kills, and the metrics suggest that resources are not being freed after processing runs.

[Screenshot: memory usage metrics for the service]

Each spike represents a processing run, with about 10 minutes between each.

awalker4 (Collaborator) commented Sep 7, 2023

Hi there, apologies for the late response. We certainly still have work to do on improving memory usage. We haven't seen a leak yet, but it's possible that batching work like this surfaces it faster. Happy to collaborate on this. First off:

  • Can you try your workload with the latest unstructured-api? We've had a number of fixes in the last month that should be relevant, namely caching the layout models and reducing the number of images generated for Tesseract.
  • We've added UNSTRUCTURED_MEMORY_FREE_MINIMUM_MB to control some of the OOM kills for now. Try setting this so that we'll reject new documents when memory is low (see the sketch below).
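
For example, something along these lines would set both variables when starting the container. This is only an illustrative sketch: the port mapping, image tag, and the 2048 MB threshold are placeholders to adapt to your instance, not recommended values.

```python
# Illustrative sketch: launch the container with parallel mode enabled and
# the memory guard set. Threshold, port, and image tag are assumptions.
import subprocess

subprocess.run(
    [
        "docker", "run", "--rm", "-p", "8000:8000",
        "-e", "UNSTRUCTURED_PARALLEL_MODE_ENABLED=true",
        "-e", "UNSTRUCTURED_MEMORY_FREE_MINIMUM_MB=2048",  # e.g. half of a 4 GB instance
        "quay.io/unstructured-io/unstructured-api:latest",  # tag is illustrative
    ],
    check=True,
)
```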

Can you share more details about your workload that we can try to replicate?

awalker4 (Collaborator) commented

I'm going to close this as we've made a lot of memory improvements over the last few months. Please feel free to create a new issue if needed!

awalker4 (Collaborator) commented

We still have memory issues floating around, so I'm going to reopen this. cc @lambda-science

awalker4 reopened this Mar 25, 2024

ill-yes commented Apr 15, 2024

Same issue for me, using v0.0.65.
