
Additional guidance needed #329

Open
TheMrguiller opened this issue May 3, 2024 · 4 comments
@TheMrguiller

Hi @JaMe76,

Sorry to interrupt you. I really appreciate your work. I have been investigating different tools for data extraction and I find your work to be the best so far. I haven't had the opportunity to try out the Gradio example that many people are excited about. That said, I have a few questions.

I'm new to this information extraction field, so there are many things that are a bit out of my scope. I've seen that several people have been asking about your model "xrf_layout/model_final_inf_only.pt". I understand that this model is private, but I've seen that you've made available the possibility of training a very similar model in https://deepdoctection.readthedocs.io/en/latest/tutorials/training_and_logging/.

I got a bit confused, though, with one of your comments on Hugging Face, where you mentioned that the cell/row models used are available, but I don't see where they are actually available.

Finally, I just want to confirm that the example training procedure for the layout model is the correct one, and that it's not just a toy example.

Thank you in advance for your help.

@JaMe76
Contributor

JaMe76 commented May 3, 2024

Thanks for your comments about this repo.

Regarding your questions: the training of the private layout model follows exactly the training scripts you are referring to, with the only difference that when merging datasets I had one additional dataset containing around 6k hand-labelled images. Looking at the training script, the number of samples taken from DocLayNet is around 75k and from PubLayNet 25k. That means you can use this script to train a model on a dataset that differs by only about 6%. Training takes 5-6 days on an RTX 3090.
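
(Purely for illustration, and not the deepdoctection training script itself: a minimal sketch of what sampling and merging COCO-style annotation files in those proportions could look like. The file paths are placeholders, and it assumes category ids are already harmonised across the source datasets.)

```python
# Illustrative only -- this is NOT the deepdoctection training script.
# Sample N images per COCO-style dataset and merge them into one file.
# Paths are placeholders; category ids are assumed to already be harmonised,
# and in a real merge image/annotation ids would also need re-indexing.
import json
import random

def sample_coco(path, n_images, seed=0):
    with open(path) as f:
        coco = json.load(f)
    random.seed(seed)
    images = random.sample(coco["images"], min(n_images, len(coco["images"])))
    keep = {img["id"] for img in images}
    anns = [a for a in coco["annotations"] if a["image_id"] in keep]
    return images, anns, coco["categories"]

sources = [
    ("doclaynet_train.json", 75_000),  # ~75k samples from DocLayNet
    ("publaynet_train.json", 25_000),  # ~25k samples from PubLayNet
    ("custom_train.json", 6_000),      # the extra hand-labelled set (private)
]

merged = {"images": [], "annotations": [], "categories": None}
for path, n in sources:
    imgs, anns, cats = sample_coco(path, n)
    merged["images"].extend(imgs)
    merged["annotations"].extend(anns)
    merged["categories"] = merged["categories"] or cats

with open("merged_train.json", "w") as f:
    json.dump(merged, f)
```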

Do these 6k images really matter? To be honest, I don't know; I did not train a model on the reduced dataset, and adding some good data can change the game a lot. So I would like to qualify my earlier statement about "getting a similar model": it refers to the fact that you can train a model on a slightly smaller, publicly available dataset.

The cell/row models haven't been released either. I trained those models on PubTabNet + ~1k hand-labelled samples. With the release of Table Transformer v2, I don't think they are any better.

@TheMrguiller
Author

Thank you for your invaluable insights. I've been experimenting with Table Transformer v2 based on your guidelines in the documentation, but I haven't noticed a significant improvement in my specific cases compared to other baseline methods. It's possible that my limited understanding of Table Transformer is hindering my progress, as its parameterization appears to rely heavily on trial and error. Admittedly, my tables are fairly complex, with a lot of visual embellishments. Any guidance you could provide, @JaMe76, would be greatly appreciated, though I hope not to cause any inconvenience.

@JaMe76
Contributor

JaMe76 commented May 8, 2024

TATR v2 does not require padding, as far as I remember. So reducing the default padding values to 0 in the configs should improve the segmentation results.
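
(A hedged sketch of how such an override could look when building the analyzer: `config_overwrite` follows the pattern shown in the deepdoctection docs, but the padding key names below are placeholders, so the actual keys need to be looked up in the analyzer's default config.)

```python
# Sketch only: override analyzer config values. `config_overwrite` follows the
# pattern from the deepdoctection docs, but the padding key names below are
# PLACEHOLDERS -- check the analyzer's default .yaml for the actual keys.
import deepdoctection as dd

analyzer = dd.get_dd_analyzer(
    config_overwrite=[
        "SEGMENTATION.PADDING_TOP=0",     # placeholder key name
        "SEGMENTATION.PADDING_BOTTOM=0",  # placeholder key name
        "SEGMENTATION.PADDING_LEFT=0",    # placeholder key name
        "SEGMENTATION.PADDING_RIGHT=0",   # placeholder key name
    ]
)
df = analyzer.analyze(path="document.pdf")  # placeholder path
df.reset_state()
```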

@TheMrguiller
Author

Hi,
Thank you for the feedback @JaMe76. After a bit of research and a lot of trials with TATR v2, I found this example https://github.com/NielsRogge/Transformers-Tutorials/blob/master/Table%20Transformer/Inference_with_Table_Transformer_(TATR)_for_parsing_tables.ipynb which works really well. However, I found that the model tends to crop the table too short, which later causes problems when detecting table columns and rows. Also, by TATR v2 do you mean the base Table Transformer from Hugging Face, https://huggingface.co/microsoft/table-transformer-detection?
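
(For reference, a rough sketch of one way to work around the short crop: enlarge the detected table box by a margin before running structure recognition. The checkpoint names come from the Hugging Face hub; treating the v1.1 structure-recognition checkpoint as "TATR v2", as well as the margin value and the thresholds, are assumptions.)

```python
# Rough sketch (not the notebook code): detect the table, enlarge the crop by a
# margin so border rows/columns are not cut off, then run structure recognition.
# Margin, thresholds and the choice of the v1.1 structure checkpoint are assumptions.
import torch
from PIL import Image
from transformers import AutoImageProcessor, TableTransformerForObjectDetection

page = Image.open("page.png").convert("RGB")  # placeholder input image

det_processor = AutoImageProcessor.from_pretrained("microsoft/table-transformer-detection")
det_model = TableTransformerForObjectDetection.from_pretrained("microsoft/table-transformer-detection")
str_processor = AutoImageProcessor.from_pretrained(
    "microsoft/table-transformer-structure-recognition-v1.1-all")
str_model = TableTransformerForObjectDetection.from_pretrained(
    "microsoft/table-transformer-structure-recognition-v1.1-all")

with torch.no_grad():
    det_out = det_model(**det_processor(images=page, return_tensors="pt"))
tables = det_processor.post_process_object_detection(
    det_out, threshold=0.7, target_sizes=torch.tensor([page.size[::-1]]))[0]

margin = 25  # extra pixels of context around the detected table box (tune this)
for box in tables["boxes"]:
    x0, y0, x1, y1 = box.tolist()
    crop = page.crop((int(max(0, x0 - margin)), int(max(0, y0 - margin)),
                      int(min(page.width, x1 + margin)), int(min(page.height, y1 + margin))))
    with torch.no_grad():
        str_out = str_model(**str_processor(images=crop, return_tensors="pt"))
    cells = str_processor.post_process_object_detection(
        str_out, threshold=0.6, target_sizes=torch.tensor([crop.size[::-1]]))[0]
    # cells["labels"] map to row / column / spanning-cell classes via str_model.config.id2label
```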
