PyPDFtoText

This is a Python script that converts any PDF to text using Tesseract-OCR. I made it to process PDFs in which the text is not selectable.
Please do not use it on normal PDFs from which you can simply copy the text, as OCR is a heavy and slow task.
It works best on simple PDFs laid out in a plain book format (results also depend on your Tesseract installation). More updates may come soon.
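To decide whether a PDF needs OCR at all, you can check whether it already has a selectable text layer. The sketch below is not taken from PyPDFtoText.py. The names `needs_ocr` and `pdf_needs_ocr` and the `min_chars` threshold are made up for illustration, and it assumes a reasonably recent PyMuPDF is installed.

```python
from typing import Iterable


def needs_ocr(page_texts: Iterable[str], min_chars: int = 20) -> bool:
    """Return True when the extracted text is too short to be a real text layer.

    min_chars is an arbitrary illustrative threshold, not a value from the script.
    """
    total = sum(len(text.strip()) for text in page_texts)
    return total < min_chars


def pdf_needs_ocr(pdf_path: str, min_chars: int = 20) -> bool:
    """Open a PDF with PyMuPDF and check whether its pages hold selectable text."""
    import fitz  # PyMuPDF; imported here so needs_ocr() works without it

    with fitz.open(pdf_path) as doc:
        # page.get_text() returns the embedded text of a page ("" for pure scans)
        return needs_ocr((page.get_text() for page in doc), min_chars)
```

If `pdf_needs_ocr("scan.pdf")` returns False (the file name is a placeholder), copying the text directly will likely be much faster than running OCR.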
This uses the Tesseract-OCR binaries and the pytesseract, PyMuPDF and PIL (Pillow) packages.
If you cannot install fitz, try "pip install PyMuPDF".
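For orientation, here is a minimal sketch of how these packages can fit together: PyMuPDF renders each page to an image, PIL wraps the pixels, and pytesseract runs OCR on the result. This is an illustration under assumptions, not the actual contents of PyPDFtoText.py. The function names, the 300 DPI default and the "eng" language are all hypothetical choices.

```python
from typing import List

POINTS_PER_INCH = 72  # PDF page coordinates use 72 points per inch


def zoom_for_dpi(dpi: int) -> float:
    """Scale factor that renders a PDF page at the requested DPI."""
    return dpi / POINTS_PER_INCH


def join_pages(page_texts: List[str]) -> str:
    """Join per-page OCR results, separated by a blank line."""
    return "\n\n".join(text.strip() for text in page_texts)


def ocr_pdf(pdf_path: str, dpi: int = 300, lang: str = "eng") -> str:
    """Render every page to an image and OCR it (assumed pipeline, see lead-in)."""
    # Imported lazily so the helpers above work without the OCR stack installed.
    import fitz  # PyMuPDF
    import pytesseract
    from PIL import Image

    zoom = zoom_for_dpi(dpi)
    texts = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            # Render the page as an RGB bitmap without an alpha channel.
            pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom), alpha=False)
            image = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
            texts.append(pytesseract.image_to_string(image, lang=lang))
    return join_pages(texts)
```

A call such as `ocr_pdf("scan.pdf")` (placeholder file name) would return the recognized text as one string. If the Tesseract binary is not on your PATH, pytesseract may need `pytesseract.pytesseract.tesseract_cmd` set to its location.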
