GPU utilisation using the docker image provided + docker suggestion #240
Replies: 1 comment
Thank you for your kind words and your comments. Regarding GPU utilisation: all TensorFlow/PyTorch models should run on a GPU if one is available. Before releasing the docker image I checked it with some documents, but only based on time measurements; I did not verify that the GPU was actually being utilized. There is, however, a further issue: the sequential processing of the analyzer leaves the GPU almost idle. This can be mitigated, for example, by chopping your document into parts and sending them through the pipeline components in batches. But even then there are massive opportunities for optimization, especially if you have multiple GPUs. On the one hand, one could let certain components run asynchronously: layout/table structure recognition is independent of OCR, so the two do not have to run sequentially. In addition, the pipeline components and predictors could be decoupled by viewing the predictor itself as a server: a component would send requests, the predictor would process them in batches (almost all predictors allow multi-batch inference), and the results would be consolidated. But this will require some more changes.
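To make the batching idea concrete, here is a minimal sketch. The page-chunking helper is plain Python of my own; the commented-out driver loop assumes deepdoctection's documented `dd.get_dd_analyzer()` / `analyzer.analyze(path=...)` entry points and is only illustrative, since the exact way to restrict a run to a page subset depends on your setup:

```python
# Sketch: chop a large PDF into consecutive page batches so the
# pipeline can amortize GPU work instead of crawling page by page.
# The chunking helper is plain Python; the analyzer calls are shown
# as comments because they need deepdoctection, models, and a GPU.

def chunk_pages(page_numbers, batch_size):
    """Split a list of page numbers into consecutive batches."""
    return [page_numbers[i:i + batch_size]
            for i in range(0, len(page_numbers), batch_size)]

batches = chunk_pages(list(range(1, 101)), batch_size=8)  # a 100-page PDF

# Hypothetical driver loop (names per the deepdoctection docs):
# import deepdoctection as dd
# analyzer = dd.get_dd_analyzer()
# for batch in batches:
#     df = analyzer.analyze(path="large.pdf")  # restrict to pages in `batch`
#     df.reset_state()
#     for page in df:
#         ...  # consume tables / text per page
```

With multiple GPUs, each batch (or the PDF chunks themselves) could additionally be dispatched to separate worker processes, one per device.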
Hi! Firstly, brilliant repo! I really like how precise this library is at picking out all of the details from documents.
I'd like to make some performance improvements to speed up deepdoctection processing. Ideally, this means processing large PDFs (100+ pages) in minutes rather than tens of minutes. I have many GPUs at my disposal.
I have therefore set up the docker image that you provide in this repo (torch CUDA). I can see that the models get loaded into GPU VRAM, but I do not see any GPU utilisation from these models. Am I missing something to make this happen?
Additionally, any pointers on performance improvements would be much appreciated. I'm mainly interested in extracting tables and text for now.
Finally, just one suggestion regarding the docker images provided: it would be great to have a variant that turns the container into a server. It could use gRPC to receive requests and return responses. This would be a great tool for users who want to integrate deepdoctection into a modular application!
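As a rough illustration of what such a server could do internally, here is a sketch of the request-consolidation idea: callers submit single items, and a serving loop drains pending requests and runs the predictor once per batch. Everything here is hypothetical plain Python (a gRPC service would replace the in-process queue); none of these names are deepdoctection API:

```python
import queue
import threading

# Sketch of a predictor-as-server: callers enqueue single requests,
# a serving loop drains up to max_batch of them and runs one batched
# predictor call (most predictors support multi-batch inference),
# then hands each caller its own result back.

class BatchingPredictor:
    def __init__(self, predict_batch, max_batch=8):
        self._predict_batch = predict_batch   # fn: list[input] -> list[output]
        self._max_batch = max_batch
        self._requests = queue.Queue()

    def submit(self, item):
        """Enqueue one request; returns a holder the caller can wait on."""
        holder = {"done": threading.Event()}
        self._requests.put((item, holder))
        return holder

    def serve_once(self):
        """Drain up to max_batch pending requests, run one batched call."""
        pending = []
        while len(pending) < self._max_batch:
            try:
                pending.append(self._requests.get_nowait())
            except queue.Empty:
                break
        if not pending:
            return 0
        items = [item for item, _ in pending]
        for (_, holder), result in zip(pending, self._predict_batch(items)):
            holder["result"] = result
            holder["done"].set()
        return len(pending)

# Usage with a dummy "predictor" that doubles its inputs:
server = BatchingPredictor(lambda xs: [2 * x for x in xs], max_batch=4)
holders = [server.submit(i) for i in range(6)]
server.serve_once()  # drains 4 of the 6 queued requests
server.serve_once()  # drains the remaining 2
results = [h["result"] for h in holders]  # results in submission order
```

In a real deployment the serving loop would run on its own thread (or behind a gRPC endpoint), and each predictor could be pinned to a different GPU.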
Many thanks, and keep it up!
John.