bug/executing partition_doc using concurrent futures #2891
Comments
@salahaz please provide the entire stack trace.
Also, see if that particular file works when you are not using threads and just use
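For anyone wanting to capture the entire stack trace from each worker, as requested above, here is a minimal sketch. It assumes a `ThreadPoolExecutor` is being used; `partition_stub` and `run_and_collect` are hypothetical names introduced for illustration, and the stub stands in for the real partitioning call so the example runs without the library installed.

```python
import traceback
from concurrent.futures import ThreadPoolExecutor, as_completed


def partition_stub(path):
    """Hypothetical stand-in for the real partitioning call."""
    if path.endswith(".bad"):
        raise ValueError(f"cannot partition {path}")
    return [f"element from {path}"]


def run_and_collect(paths, worker, max_workers=4):
    """Run `worker` over `paths` in threads.

    Returns (results, failures), both keyed by path. Each failure value is
    the full formatted traceback of the exception raised in the worker.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, p): p for p in paths}
        for fut in as_completed(futures):
            path = futures[fut]
            exc = fut.exception()
            if exc is None:
                results[path] = fut.result()
            else:
                # The exception keeps the traceback from the worker thread.
                failures[path] = "".join(
                    traceback.format_exception(type(exc), exc, exc.__traceback__)
                )
    return results, failures


results, failures = run_and_collect(["a.docx", "b.bad"], partition_stub)
for path, trace in failures.items():
    print(f"--- {path} ---\n{trace}")
```

Swapping the stub for the real call and pasting the printed traceback into the issue should show which frame raises the error. Calling the same worker in a plain `for` loop gives the sequential comparison.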
@scanny the particular files work without threading, and partitioning the files sequentially using a for loop works too; however, when using concurrent futures this error is raised from
Hmm, okay, so a little background to understand what might be happening here:
My working hypothesis is that the temporary directory used to hold the "interim"
One thing that might be worth trying would be to reduce the number of workers to something like 8 and see what happens. You could use: max_workers = min(8, os.cpu_count()) to avoid coupling
Otherwise I think we'll have to put this on the list to be investigated and see if we can reproduce it on our side. If you're game for patching some of the library code in your
But doing that is unlikely to produce a working solution; it would just help us narrow down the problem.
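The worker cap suggested above could be sketched roughly as follows. This is an illustration under stated assumptions, not a confirmed fix: `capped_workers`, `partition_stub`, and `partition_all` are hypothetical names, and the stub replaces the real partitioning call. One detail worth noting is that `os.cpu_count()` can return `None`, in which case `min(8, os.cpu_count())` raises a `TypeError`, so the sketch falls back to 1.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def capped_workers(limit=8):
    """Return at most `limit` workers, never more than the CPU count."""
    # os.cpu_count() may return None when the count cannot be determined.
    return min(limit, os.cpu_count() or 1)


def partition_stub(path):
    """Hypothetical stand-in for the real partitioning call."""
    return f"partitioned:{path}"


def partition_all(paths, worker=partition_stub):
    """Apply `worker` to every path using a capped thread pool."""
    with ThreadPoolExecutor(max_workers=capped_workers()) as pool:
        # pool.map yields results in the same order as `paths`.
        return list(pool.map(worker, paths))


outputs = partition_all(["a.doc", "b.doc", "c.doc"])
```

If the error persists even with a small cap, that points away from sheer worker count as the cause and toward something shared between the workers.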
@scanny I tried your initial suggestion using
Hi @salahaz, we'll track this issue and see what we can discover. In the meantime, I don't believe a multi-threading approach is viable for multiple
A few things you can try:
Let us know how you go :)
Closing as inactive. @salahaz feel free to reopen if you're still having trouble. And if you discovered a solution, let us know so others can learn from your experience :)
When attempting to execute partition_doc to pre-process multiple documents at the same time, it fails by throwing the following error:
PackageNotFoundError: Package not found at '/var/folders/p5/dljg1qv95y97dyq1c38xgb6r0000gn/T/tmp3nwg1qob/test.docx'
Here is sample code that causes the same issue:
Environment Details
python = 3.11.1
unstructured = 0.13.2
MacOS 14.3
Any help with this? Or is there a way to process documents in parallel using the library?