Questions on OCR evaluation #5

Soongja · 2023-01-03T04:38:56Z

Hi, I have a few questions on OCR evaluation.

When evaluating OCR performance on the DIR300 dataset (or the DocUNet benchmark), the predicted image and the GT image have different sizes. I suppose you resized one of the two in advance. To which size did you resize the images (the predicted size or the GT size)?
Which tessdata (traineddata) did you use for Tesseract (tessdata_fast, tessdata_best, or tessdata)?
Reference: https://tesseract-ocr.github.io/tessdoc/Data-Files.html

fh2019ustc · 2023-01-05T08:40:43Z

Hi, sorry for the late reply due to my health.
(1) I have uploaded evalUnwarp.m to this repo.
(2) For the OCR evaluation, I do not resize the two images. You could explore the impact of the resize operation yourself.
(3) I didn't pay particular attention to this. I downloaded Tesseract from the link, and the version is 5.0.1.20220118.
Hope this helps~!
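
For anyone who wants to try the comparison suggested in (2), here is a minimal sketch of a text-level metric. It is not the evaluation code from this repo. It assumes OCR results are compared as plain strings using Levenshtein edit distance and character error rate (CER), which is a common choice for this kind of benchmark; the function names `edit_distance` and `cer` are made up for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca from a
                curr[j - 1] + 1,           # insert cb into a
                prev[j - 1] + (ca != cb),  # substitute, or free match
            ))
        prev = curr
    return prev[-1]


def cer(ocr_text: str, reference_text: str) -> float:
    """Character error rate: edit distance normalised by the reference length.
    (Illustrative definition; check it against the metric you need to reproduce.)"""
    if not reference_text:
        raise ValueError("reference_text must not be empty")
    return edit_distance(ocr_text, reference_text) / len(reference_text)
```

To see how much resizing matters, one could run Tesseract on the rectified image with and without resizing it to the GT size, then compare the two CER values against the text recognised from the GT image. The input strings could come from `pytesseract.image_to_string`, for example, though that wrapper is an assumption here and is not mentioned in this thread.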

Soongja · 2023-01-10T09:27:56Z

Thank you for your reply! I was able to evaluate correctly.

Soongja changed the title ~~Questions on evaluation~~ Questions on OCR evaluation Jan 5, 2023
