Memory leak #197

Open
jmackie opened this issue Aug 23, 2023 · 4 comments

jmackie commented Aug 23, 2023

We're exploring using the unstructured API at work.

We're running quay.io/unstructured-io/unstructured-api:c9b74d4 on a "Pro" (private service) Render instance (i.e. 4 GB of RAM).

We're using the service to process PDFs with the following parameters: strategy=hi_res, pdf_infer_table_structure=true, and skip_infer_table_types=[]. We're also using parallel mode via UNSTRUCTURED_PARALLEL_MODE_ENABLED=true (with the defaults for the other environment variables).
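
For reference, here's a minimal sketch of how we call the service (Python + requests). The endpoint path, port, and filename are assumed defaults for a local container, not anything specific to our setup:

```python
# Sketch of a single processing request against the unstructured-api container.
# Endpoint path and port are assumed defaults; adjust for your deployment.
import requests

with open("example.pdf", "rb") as pdf:
    resp = requests.post(
        "http://localhost:8000/general/v0/general",
        files={"files": ("example.pdf", pdf, "application/pdf")},
        data={
            "strategy": "hi_res",
            "pdf_infer_table_structure": "true",
            "skip_infer_table_types": "[]",
        },
        timeout=600,
    )
resp.raise_for_status()
print(f"{len(resp.json())} elements returned")
```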

We've seen the service fall over several times due to OOM kills, and the metrics suggest that resources are not being freed after processing runs.

[Screenshot: memory usage metrics for the service]

Each spike represents a processing run, with about 10 minutes between each.

awalker4 (Collaborator) commented Sep 7, 2023

Hi there, apologies for the late response. We certainly still have work to do on improving memory usage. We haven't seen a leak yet, but it's possible that batching work like this surfaces it faster. Happy to collaborate on this. First off:

  • Can you try your workload with the latest unstructured-api? We've had a number of fixes in the last month that should be relevant, namely caching the layout models and reducing the number of images generated for Tesseract.
  • We've added UNSTRUCTURED_MEMORY_FREE_MINIMUM_MB to control some of the OOM kills for now. Try setting this so that we'll reject new documents when memory is low (see the sketch below).
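
For example, something along these lines would set both variables when starting the container. This is only an illustrative sketch: the port mapping, image tag, and the 2048 MB threshold are placeholders to adapt to your instance, not recommended values.

```python
# Illustrative sketch: launch the container with parallel mode enabled and
# the memory guard set. Threshold, port, and image tag are assumptions.
import subprocess

subprocess.run(
    [
        "docker", "run", "--rm", "-p", "8000:8000",
        "-e", "UNSTRUCTURED_PARALLEL_MODE_ENABLED=true",
        "-e", "UNSTRUCTURED_MEMORY_FREE_MINIMUM_MB=2048",  # e.g. half of a 4 GB instance
        "quay.io/unstructured-io/unstructured-api:latest",  # tag is illustrative
    ],
    check=True,
)
```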

Can you share more details about your workload that we can try to replicate?

awalker4 (Collaborator) commented

I'm going to close this as we've made a lot of memory improvements over the last few months. Please feel free to create a new issue if needed!

awalker4 (Collaborator) commented

We still have memory issues floating around, so I'm going to reopen this. cc @lambda-science

awalker4 reopened this Mar 25, 2024

ill-yes commented Apr 15, 2024

Same issue for me, using v0.0.65.
