UDOP different models #397

Open
arvisioncode opened this issue Mar 7, 2024 · 2 comments

@arvisioncode

Good morning @NielsRogge!

As I understand it, the UDOP model can be used for different tasks such as DocVQA, classification, or information extraction.
Looking at your notebooks for this model, I see that the inference one loads the Hugging Face model microsoft/udop-large and uses it for question answering.

My question is: are there pretrained UDOP models for different tasks? I haven't found any on Hugging Face.

I have seen that the notebook uses a prompt to classify the image, but I would expect a separate, task-specific model for that. Does such a model exist, or is there another one I should use?

Thank you so much

@NielsRogge
Owner

Hi,

Microsoft released 3 pre-trained UDOP models: https://huggingface.co/collections/microsoft/udop-65e625124aee97415b88b513. They were all pre-trained in a general way, to be fine-tuned for tasks like DocVQA, classification, or information extraction. The best-performing model is microsoft/udop-large-512-300k, since it uses the highest image resolution (512x512) and was pre-trained the longest.
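
Since the checkpoints are general rather than task-specific, the task is selected by the prompt, not by the model name. Here is a minimal sketch of that idea, not a definitive recipe. The helper names (`CHECKPOINTS`, `TASK_PREFIXES`, `build_prompt`, `load_udop`) are hypothetical, and the exact prefix strings are assumptions, so check the notebooks or model cards for the wording each checkpoint expects. Only the two checkpoint names mentioned in this thread are listed.

```python
# Checkpoints named in this thread. Both are general pre-trained models,
# not task-specific ones.
CHECKPOINTS = {
    "default": "microsoft/udop-large",
    "best": "microsoft/udop-large-512-300k",
}

# Assumed task prefixes; verify the exact wording against the notebooks.
TASK_PREFIXES = {
    "docvqa": "Question answering.",
    "classification": "Document classification.",
}


def build_prompt(task, text=""):
    """Prepend the (assumed) task prefix to an optional question."""
    prefix = TASK_PREFIXES[task]
    return f"{prefix} {text}".strip()


def load_udop(name="best"):
    """Load the processor and model for one checkpoint (downloads weights)."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoProcessor, UdopForConditionalGeneration

    checkpoint = CHECKPOINTS[name]
    # apply_ocr=False assumes you supply your own words and bounding boxes.
    processor = AutoProcessor.from_pretrained(checkpoint, apply_ocr=False)
    model = UdopForConditionalGeneration.from_pretrained(checkpoint)
    return processor, model
```

For example, `build_prompt("docvqa", "What is the date?")` produces a question-answering prompt, while `build_prompt("classification")` produces a classification prompt for the same checkpoint. For good results on a specific task you would still fine-tune first.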

@arvisioncode
Author

Perfect! Thank you very much for your response!
I have seen that you have added new notebooks for fine-tuning on different tasks, starting from those base models.
Would it be possible for you to create one for fine-tuning on DocVQA?
