UDOP different models #397

arvisioncode · 2024-03-07T08:26:53Z

As I understand it, the UDOP model can be used for different tasks such as DocVQA, classification or information extraction.
Looking at your notebooks for this model, I see that the inference notebook loads the Hugging Face checkpoint microsoft/udop-large and uses it for question answering.

My question is: are there pre-trained UDOP models for specific tasks? I haven't found any on Hugging Face.

I noticed that the notebook includes a prompt for classifying the image, but I assumed there would be a separate model specific to that task. Does such a model exist, or is there another one I should use?
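The notebook's classification prompt hints at how this works: UDOP casts its tasks as text generation, and the task is selected by a text prefix on the prompt. A minimal sketch of that idea follows. `build_udop_prompt` and `TASK_PREFIXES` are hypothetical names, `"Question answering. "` is the DocVQA prefix shown in the Transformers UDOP docs, and the classification prefix is an assumption to check against the notebook.

```python
# Hypothetical helper: prepends a UDOP-style task prefix to the user text.
# "Question answering. " follows the Transformers UDOP docs;
# the classification prefix is an ASSUMPTION, verify it against the notebook.
TASK_PREFIXES = {
    "docvqa": "Question answering. ",
    "classification": "Document classification.",  # assumed prefix
}


def build_udop_prompt(task: str, text: str = "") -> str:
    """Return the prompt string to feed to the UDOP processor for `task`."""
    if task not in TASK_PREFIXES:
        raise ValueError(f"unknown task: {task!r}")
    # Classification needs no extra text; question answering appends the question.
    return TASK_PREFIXES[task] + text


if __name__ == "__main__":
    print(build_udop_prompt("docvqa", "What is the date on the form?"))
    # -> Question answering. What is the date on the form?
    print(build_udop_prompt("classification"))
    # -> Document classification.
```

With `transformers` installed, the resulting string is passed as the text argument of the UDOP processor together with the page image, OCR words and boxes, roughly `processor(image, prompt, text_pair=words, boxes=boxes, return_tensors="pt")`, and the output is produced by `model.generate(**encoding)`, following the Transformers UDOP example.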

Thank you so much

NielsRogge · 2024-03-09T16:41:37Z

Hi,

Microsoft released 3 pre-trained UDOP models: https://huggingface.co/collections/microsoft/udop-65e625124aee97415b88b513. They were all pre-trained in a general way, to be fine-tuned for tasks like DocVQA, classification or information extraction. The best-performing model is microsoft/udop-large-512-300k, since it uses the highest image resolution (512x512) and has been pre-trained the longest.

arvisioncode · 2024-03-11T12:40:02Z

Perfect! Thank you very much for your response!
I've seen that you've added new notebooks for fine-tuning those base models on different tasks.
Would it be possible for you to create one for fine-tuning on DocVQA?
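Because UDOP is a sequence-to-sequence model, fine-tuning on DocVQA mostly comes down to turning each sample into an input prompt (task prefix plus question, together with the page's OCR words and boxes) and a target answer string. A minimal sketch of that data preparation, assuming a hypothetical `DocVQASample` record and assuming the first reference answer is used as the training target:

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


# Hypothetical record layout for one DocVQA sample after OCR; the field
# names are assumptions, not the schema of any particular dataset loader.
@dataclass
class DocVQASample:
    question: str
    answers: List[str]       # DocVQA lists several acceptable answers
    words: List[str]         # OCR tokens of the page
    boxes: List[List[int]]   # one [x0, y0, x1, y1] box per word


QA_PREFIX = "Question answering. "  # DocVQA task prefix from the Transformers UDOP docs


def to_seq2seq_example(sample: DocVQASample) -> Dict[str, object]:
    """Turn a DocVQA sample into the (input, target) pair a seq2seq fine-tune needs."""
    if len(sample.words) != len(sample.boxes):
        raise ValueError("every OCR word needs exactly one bounding box")
    if not sample.answers:
        raise ValueError("sample has no reference answer")
    return {
        "prompt": QA_PREFIX + sample.question,  # text that precedes the page words
        "words": sample.words,
        "boxes": sample.boxes,
        # Assumption: train on the first reference answer only.
        "target": sample.answers[0],
    }


def build_training_set(samples: Sequence[DocVQASample]) -> List[Dict[str, object]]:
    """Convert every sample; raises on the first malformed one."""
    return [to_seq2seq_example(s) for s in samples]
```

With `transformers`, the prompt, words and boxes would go through the UDOP processor as at inference time, and the tokenized `target` would be passed as `labels` to `UdopForConditionalGeneration` so that it returns a training loss. Treat this as a sketch of the data flow rather than a tested recipe.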
