Learning OCRmyPDF

Notes taken while learning OCRmyPDF, a Tesseract-based optical character recognition (OCR) program.

License

Unless otherwise specified, content in this work is released under the Creative Commons BY-SA 4.0+ license. Most source materials are distributed under the Fair Use principle.

References

OCRmyPDF documentation

Multi-page recognition, with language presplit

$ ocrmypdf \
    --title 'UNFAIR TRADE PRACTICES AND SAFEGUARD ACTIONS' \
    --author '蔡英文(Ing-wen, Tsai)' \
    --language eng \
    --deskew \
    --clean \
    --skip-text \
    --pages 1,3-416 \
    --verbose 1 \
    'UNFAIR TRADE PRACTICES AND SAFEGUARD ACTIONS.decrypted.pdf' \
    'UNFAIR TRADE PRACTICES AND SAFEGUARD ACTIONS.decrypted.ocr.HanT_eng.tessdata_fast.pdf'

$ ocrmypdf \
    --title 'UNFAIR TRADE PRACTICES AND SAFEGUARD ACTIONS' \
    --author '蔡英文(Ing-wen, Tsai)' \
    --language eng \
    --deskew \
    --clean \
    --skip-text \
    --pages 2 \
    --verbose 1 \
    'UNFAIR TRADE PRACTICES AND SAFEGUARD ACTIONS.decrypted.pdf' \
    'UNFAIR TRADE PRACTICES AND SAFEGUARD ACTIONS.decrypted.ocr.HanT_eng.tessdata_fast.pdf'
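
Both commands above read the same input and write to the same output path, so the second run replaces the file produced by the first. The following is a minimal sketch of one way to chain the passes instead, assuming that the goal is to OCR page 2 with different settings than the rest, and that pages outside `--pages` are passed through without OCR. The helper name `ocr_pass`, the `DRY_RUN` switch, and any file names used with it are hypothetical and not part of OCRmyPDF; the `--title` and `--author` options are left out for brevity.

```shell
# Sketch only: ocr_pass and DRY_RUN are hypothetical helpers, not OCRmyPDF features.
#
# ocr_pass LANGUAGE PAGES INPUT OUTPUT
# Runs one OCRmyPDF pass with the options used in the commands above.
# When DRY_RUN=1 the command line is printed instead of being executed.
ocr_pass() {
    # Rebuild the positional parameters as the full command line.
    # "set --" inside a function only affects that function's arguments.
    set -- ocrmypdf --language "$1" --deskew --clean --skip-text \
        --pages "$2" --verbose 1 "$3" "$4"

    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Print the arguments joined by spaces (quoting is not preserved).
        printf '%s\n' "$*"
    else
        # Execute the command with each argument kept intact.
        "$@"
    fi
}
```

For example, `ocr_pass eng 1,3-416 input.pdf pass1.pdf` followed by `ocr_pass chi_tra 2 pass1.pdf output.pdf` would feed the result of the first pass into the second. Here `chi_tra` (Tesseract's Traditional Chinese language code) is only an assumed choice for page 2; substitute whatever language that page actually needs. Setting `DRY_RUN=1` beforehand prints each command for review without running it.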
