Multiple language detection within an image #4238

metouitude · 2024-05-06T00:04:22Z

Your Feature Request

Hello,

I'm currently working on a personal project that involves detecting multiple languages, and the furthest I have got is:

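import re
import pytesseract
# Run orientation and script detection (OSD), then extract the script name and its confidence.
osd = pytesseract.image_to_osd(self.img)
script = re.search(r"Script: ([a-zA-Z]+)\n", osd).group(1)
conf = re.search(r"Script confidence: (\d+\.?(\d+)?)", osd).group(1)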

To be honest, this is taken directly from https://stackoverflow.com/questions/70198974/how-to-detect-language-or-script-from-an-input-image-using-python-or-tesseract-o
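One weakness of the regular expressions above is that `.group(1)` raises `AttributeError` whenever the OSD text has no matching line. Below is a minimal sketch of a more defensive parser. `parse_osd` is a hypothetical helper name of mine, not part of pytesseract, and the sample string is only illustrative: the "Latin" and 2.22 values come from my test, the other lines are made up to resemble OSD output.

```python
import re


def parse_osd(osd_text):
    """Extract the script name and its confidence from OSD text output.

    Returns None for a field that is not present instead of raising.
    """
    script = re.search(r"Script: ([a-zA-Z]+)", osd_text)
    conf = re.search(r"Script confidence: (\d+(?:\.\d+)?)", osd_text)
    return {
        "script": script.group(1) if script else None,
        "confidence": float(conf.group(1)) if conf else None,
    }


# Illustrative sample only; real output comes from pytesseract.image_to_osd(img).
sample = (
    "Page number: 0\n"
    "Orientation in degrees: 0\n"
    "Rotate: 0\n"
    "Orientation confidence: 2.22\n"
    "Script: Latin\n"
    "Script confidence: 2.22\n"
)
print(parse_osd(sample))  # {'script': 'Latin', 'confidence': 2.22}
```

With this, the original three lines become `result = parse_osd(pytesseract.image_to_osd(self.img))`, and a missing script line shows up as `None` rather than an exception.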

For example, say we have an image with two or more languages, like this one:

In this case, OSD detects only Latin, with a confidence of 2.22.

At the same time, pytesseract.image_to_boxes(self.img, lang="ara") returns Arabic text.

My question is:

Would it be possible to run the OSD detection three times, once each for Arabic, Latin, and Hebrew, so that it returns multiple languages? In other words, could pytesseract.image_to_osd(self.img) be made to detect multiple languages?
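Until something like that exists, one workaround I can imagine is to run recognition once per candidate language and compare the average word confidence. This is only a sketch under assumptions: `mean_confidence` and `detect_languages` are hypothetical helper names, the `eng` model stands in for Latin-script text, the threshold of 60 is arbitrary, and I have not verified that word confidence separates languages reliably. It uses `pytesseract.image_to_data` with `output_type=pytesseract.Output.DICT`, and needs the `ara`, `eng`, and `heb` traineddata files installed.

```python
def mean_confidence(data):
    """Average word confidence from an image_to_data DICT result.

    Entries with empty text or a confidence of -1 are layout blocks,
    not recognised words, so they are skipped.
    """
    confs = [
        float(c)
        for c, t in zip(data["conf"], data["text"])
        if str(t).strip() and float(c) >= 0
    ]
    return sum(confs) / len(confs) if confs else 0.0


def detect_languages(image, candidates=("ara", "eng", "heb"), threshold=60.0):
    """Return {lang: mean confidence} for candidates scoring above threshold.

    The candidate list and threshold are assumptions to tune per project.
    """
    import pytesseract  # imported here so mean_confidence works without it

    found = {}
    for lang in candidates:
        data = pytesseract.image_to_data(
            image, lang=lang, output_type=pytesseract.Output.DICT
        )
        score = mean_confidence(data)
        if score >= threshold:
            found[lang] = score
    return found
```

Calling `detect_languages(self.img)` would then return a dictionary of the candidate languages whose recognised words scored above the threshold. This is a different signal from the script confidence that OSD reports, so the two numbers are not directly comparable.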
