GPU utilisation using the docker image provided + docker suggestion #240
Replies: 1 comment
Thank you for your kind words and your comments. Regarding GPU utilisation: all TensorFlow/PyTorch models should run on a GPU if one is available. Before releasing the docker image I checked it with some documents, but only based on time measurements; I did not verify that the GPU was actually being utilized. There is, however, a further issue: the sequential processing of the analyzer leaves the GPU almost idle. This can be mitigated, for example, by chopping your document into parts and sending them through the pipeline components in batches. But even then there are massive opportunities for optimization, especially if you have multiple GPUs. On the one hand, one could let certain components run asynchronously: layout/table structure recognition is independent of OCR, so the two do not have to run sequentially. In addition, the pipeline components and predictors could be decoupled by viewing the predictor itself as a server: a component would send requests, the predictor would process them in batches (almost all predictors allow multi-batch inference), and the results would be consolidated. But this will require some more changes.
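To make the batching idea concrete, here is a minimal sketch. The page-chunking helper is plain Python of my own; the commented-out driver loop assumes deepdoctection's documented `dd.get_dd_analyzer()` / `analyzer.analyze(path=...)` entry points and is only illustrative, since the exact way to restrict a run to a page subset depends on your setup:

```python
# Sketch: chop a large PDF into consecutive page batches so the
# pipeline can amortize GPU work instead of crawling page by page.
# The chunking helper is plain Python; the analyzer calls are shown
# as comments because they need deepdoctection, models, and a GPU.

def chunk_pages(page_numbers, batch_size):
    """Split a list of page numbers into consecutive batches."""
    return [page_numbers[i:i + batch_size]
            for i in range(0, len(page_numbers), batch_size)]

batches = chunk_pages(list(range(1, 101)), batch_size=8)  # a 100-page PDF

# Hypothetical driver loop (names per the deepdoctection docs):
# import deepdoctection as dd
# analyzer = dd.get_dd_analyzer()
# for batch in batches:
#     df = analyzer.analyze(path="large.pdf")  # restrict to pages in `batch`
#     df.reset_state()
#     for page in df:
#         ...  # consume tables / text per page
```

With multiple GPUs, each batch (or the PDF chunks themselves) could additionally be dispatched to separate worker processes, one per device.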
Hi! Firstly, brilliant repo! I really like how precise this library is at picking out all of the details from documents.
I'd like to make some performance improvements to speed up deepdoctection processing. Ideally, this means processing large PDFs (100+ pages) in minutes rather than tens of minutes. I have many GPUs at my disposal.
I have therefore set up the docker image that you provide in this repo (torch CUDA). I can see that the models get loaded into GPU VRAM, but I do not see any GPU utilisation from these models. Am I missing something to make this happen?
Additionally, any pointers on performance improvements would be much appreciated. I'm mainly interested in extracting tables and text for now.
Finally, just one suggestion regarding the docker images provided: it would be great to have a variant that turns the container into a server. It could use gRPC to receive requests and return responses. This would be a great tool for users who want to integrate deepdoctection into a modular application!
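As a rough illustration of what such a server could do internally, here is a sketch of the request-consolidation idea: callers submit single items, and a serving loop drains pending requests and runs the predictor once per batch. Everything here is hypothetical plain Python (a gRPC service would replace the in-process queue); none of these names are deepdoctection API:

```python
import queue
import threading

# Sketch of a predictor-as-server: callers enqueue single requests,
# a serving loop drains up to max_batch of them and runs one batched
# predictor call (most predictors support multi-batch inference),
# then hands each caller its own result back.

class BatchingPredictor:
    def __init__(self, predict_batch, max_batch=8):
        self._predict_batch = predict_batch   # fn: list[input] -> list[output]
        self._max_batch = max_batch
        self._requests = queue.Queue()

    def submit(self, item):
        """Enqueue one request; returns a holder the caller can wait on."""
        holder = {"done": threading.Event()}
        self._requests.put((item, holder))
        return holder

    def serve_once(self):
        """Drain up to max_batch pending requests, run one batched call."""
        pending = []
        while len(pending) < self._max_batch:
            try:
                pending.append(self._requests.get_nowait())
            except queue.Empty:
                break
        if not pending:
            return 0
        items = [item for item, _ in pending]
        for (_, holder), result in zip(pending, self._predict_batch(items)):
            holder["result"] = result
            holder["done"].set()
        return len(pending)

# Usage with a dummy "predictor" that doubles its inputs:
server = BatchingPredictor(lambda xs: [2 * x for x in xs], max_batch=4)
holders = [server.submit(i) for i in range(6)]
server.serve_once()  # drains 4 of the 6 queued requests
server.serve_once()  # drains the remaining 2
results = [h["result"] for h in holders]  # results in submission order
```

In a real deployment the serving loop would run on its own thread (or behind a gRPC endpoint), and each predictor could be pinned to a different GPU.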
Many thanks, and keep it up!
John.