Questions on OCR evaluation #5

Open
Soongja opened this issue Jan 3, 2023 · 2 comments

Comments


Soongja commented Jan 3, 2023

Hi, I have a few questions on OCR evaluation.

  1. When evaluating OCR performance on the DIR300 dataset (or the DocUNet benchmark), the predicted image and the GT image have different sizes. I assume you resized one of the two in advance. To which size did you resize the images (the predicted size or the GT size)?

  2. Which tessdata (traineddata) did you use for Tesseract (tessdata_fast, tessdata_best, or tessdata)?
    Reference: https://tesseract-ocr.github.io/tessdoc/Data-Files.html

@Soongja Soongja changed the title Questions on evaluation Questions on OCR evaluation Jan 5, 2023
Owner

fh2019ustc commented Jan 5, 2023

Hi, sorry for the late reply, which was due to health reasons.
(1) I have uploaded evalUnwarp.m to this repo.
(2) For the OCR evaluation, I do not resize either of the two images. You could explore the impact of resizing yourself.
(3) I didn't pay particular attention to this. I downloaded Tesseract from the link, and the version is 5.0.1.20220118.
Hope this helps!

Author

Soongja commented Jan 10, 2023

Thank you for your reply! I was able to evaluate correctly.
