Additional guidance needed #329
Comments
Thanks for your comments about this repo. Regarding your questions: the training of the private layout model follows exactly the training scripts you were referring to, with the only difference that when merging datasets I had one additional dataset containing around 6k images labelled by hand. Looking at the training script, the number of samples taken from DocLayNet is around 75k and from PubLayNet around 25k. That means you can train with this script a model on a dataset that differs by roughly 6%. It takes 5-6 days on an RTX 3090. Do these 6k images really matter? To be honest, I don't know; I did not train a model on the reduced dataset, and adding some good data can change the game a lot. So I would like to qualify my last statement about "getting a similar model": it refers to the fact that you can train a model on a slightly smaller, publicly available dataset. The cell/row models haven't been released either. I trained these models on PubTabNet + ~1k hand-labelled images. With the release of Table Transformer v2, I don't think they are any better.
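For reference, the dataset composition described above works out roughly as follows. This is only a quick sanity check on the numbers given in the comment, not an excerpt from the training script:

```python
# Approximate dataset composition used for the private layout model,
# based on the numbers mentioned above.
doclaynet = 75_000   # samples taken from DocLayNet
publaynet = 25_000   # samples taken from PubLayNet
private = 6_000      # hand-labelled images, not publicly available

total = doclaynet + publaynet + private
share_private = private / total
print(f"total: {total}, private share: {share_private:.1%}")
# → total: 106000, private share: 5.7%
```

So the publicly reproducible dataset differs from the private one by under 6% of the samples.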
Thank you for your invaluable insights. I've been experimenting with Table Transformer v2 based on your guidelines in the documentation, but I haven't noticed a significant improvement in my specific cases compared to other baseline methods. It's possible that my limited understanding of Table Transformer is hindering my progress, as its parameterization appears to rely heavily on trial and error. Admittedly, my tables are somewhat complex, with many decorative elements. Any guidance you could provide, @JaMe76, would be greatly appreciated, though I hope not to cause any inconvenience.
TATR v2 does not require padding, as far as I remember. So reducing the default padding values to 0 in the configs should improve the segmentation results.
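As a rough illustration of the padding change suggested above, here is a minimal sketch that zeroes out every `PAD` entry in a nested config. The config keys shown (`PT.ITEM.PAD` with `TOP`/`RIGHT`/`BOTTOM`/`LEFT`) are assumptions for illustration only; check your actual analyzer config for the real key names:

```python
# Hypothetical sketch: disable cropping padding before running table
# segmentation with TATR v2. The config layout below is illustrative,
# not the library's actual schema.
config = {
    "PT": {
        "ITEM": {"PAD": {"TOP": 60, "RIGHT": 60, "BOTTOM": 60, "LEFT": 60}},
    }
}

def disable_padding(cfg: dict) -> dict:
    """Recursively set every PAD entry in the config to 0."""
    for key, value in cfg.items():
        if key == "PAD" and isinstance(value, dict):
            cfg[key] = {side: 0 for side in value}
        elif isinstance(value, dict):
            disable_padding(value)
    return cfg

disable_padding(config)
print(config["PT"]["ITEM"]["PAD"])
# → {'TOP': 0, 'RIGHT': 0, 'BOTTOM': 0, 'LEFT': 0}
```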
Hi @JaMe76,
Sorry to interrupt you. I really appreciate your work. I have been investigating different tools for data extraction, and I find your work to be the best so far. I haven't had the opportunity yet to try out the Gradio example that many people are excited about. That said, I have a few questions.
I'm new to the information extraction field, so many things are a bit out of my scope. I've seen that several people have been asking about your model "xrf_layout/model_final_inf_only.pt". I understand that this model is private, but I've also seen that you've made it possible to train a very similar model, as described in https://deepdoctection.readthedocs.io/en/latest/tutorials/training_and_logging/.
I got a bit confused, though, with one of your comments on Hugging Face, where you mentioned that the cell/row models used are available, but I don't see where they are actually available.
Finally, I just want to confirm that the example training procedure for the layout model is the correct one, and that it's not just a toy example.
Thank you in advance for your help.