Invisible Unicode character at end of all text #48

Open
matt-laird opened this issue Aug 28, 2023 · 3 comments

Comments

@matt-laird

There seems to be an invisible U+000C (form feed) character at the end of all generated text. This causes problems in some applications when the resulting text is pasted. See the example below; the problem is on line 2:
[image]
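One way to confirm which invisible character is present is to print the code points at the end of the string. This is a minimal Python sketch that assumes the OCR output is available as a string; the `trailing_codepoints` helper and the sample text are hypothetical.

```python
def trailing_codepoints(text: str, count: int = 3) -> list:
    """Return the last `count` characters of `text` as U+XXXX labels."""
    return [f"U+{ord(ch):04X}" for ch in text[-count:]]


# Hypothetical OCR output: recognized text followed by a newline and a form feed.
sample = "Hello world\n\x0c"
print(trailing_codepoints(sample))  # ['U+0064', 'U+000A', 'U+000C']
```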

@RajSolai
Owner

I have also faced this issue for a long time. Is a simple string trim fine, or is there something in Tesseract that I should configure? Any ideas?
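On the trim question: in Python, `str.strip()` with no arguments treats U+000C as whitespace, so a plain trim does remove it. It also removes every other leading and trailing whitespace character, which may or may not be wanted. A minimal sketch with a hypothetical OCR string:

```python
# Hypothetical OCR output with leading spaces and a trailing newline + form feed (U+000C).
ocr_text = "  First line\nSecond line\n\x0c"

# With no arguments, str.strip() removes all leading and trailing whitespace;
# Python counts U+000C (form feed) as whitespace, so it is removed as well.
trimmed = ocr_text.strip()

print(repr(trimmed))  # 'First line\nSecond line'
```

Note that the leading spaces are dropped too, which matters if indentation in the recognized text is meaningful.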

@matt-laird
Author

I had a brief look, and it does seem to be an artifact of Tesseract's processing. Maybe give the Tesseract FAQ a read and see if any of the different options help; unfortunately, I can't test them myself right now.

@RajSolai
Owner

I think we can trim the string for now. Thanks, now I also know the exact Unicode code point to find and remove.
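Removing only the exact code point, rather than all whitespace, keeps any intentional leading spaces or trailing newline. A minimal Python sketch; the function names are hypothetical, and the multi-page sample assumes Tesseract writes a form feed after each page of plain-text output, which is an assumption here rather than something confirmed in this thread.

```python
FORM_FEED = "\u000c"  # the invisible character reported in this issue


def remove_trailing_form_feed(text: str) -> str:
    """Remove only trailing form feeds, leaving all other whitespace untouched."""
    return text.rstrip(FORM_FEED)


def remove_all_form_feeds(text: str) -> str:
    """Remove every form feed, e.g. separators between pages of a multi-page scan."""
    return text.replace(FORM_FEED, "")


single_page = "Line one\nLine two\n\x0c"
multi_page = "Page one\n\x0cPage two\n\x0c"

print(repr(remove_trailing_form_feed(single_page)))  # 'Line one\nLine two\n'
print(repr(remove_all_form_feeds(multi_page)))        # 'Page one\nPage two\n'
```

If changing the OCR call is an option, Tesseract 4.1 and later reportedly expose a `page_separator` config variable (default: form feed) that can be set to an empty string; verify this against the installed version before relying on it.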

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants