
A simple Python + Tkinter + Tesseract-based GUI image-to-text copypaste pad application


FlyingFathead/OCR-CopyPastePad


OCR-CopyPastePad

A simple GUI copy-paste pad for OCR image-to-text conversion, built with Python, tkinter, pytesseract, OpenCV and EasyOCR (the OCR engine is user-selectable).

OCR-CopyPastePad screenshot

About

With OCR-CopyPastePad, you can easily convert image files that contain text into plaintext. The program offers several user-selectable OCR methods, such as Tesseract OCR (via pytesseract) or EasyOCR, to extract text data from an imported image or from an image pasted from the clipboard. The program also supports e.g. inverting the colors of the input image for a higher degree of OCR accuracy.

The aim of the program is to simplify workflows, e.g. by making it easy to copy-paste the extracted text into a text editor, ChatGPT or some other LLM that you need to run the text through. The idea is for the program to be as simple as possible whenever a workflow needs OCR conversion from image to text.

Features

  • Uses pytesseract for OCR and OpenCV (cv2) to detect ROIs (= regions of interest) for higher accuracy.
  • Easy Image Import: Load images directly from your computer or simply paste them using CTRL+V or Shift+Insert. Designed to be used e.g. in conjunction with the snipping tool in Windows (10, 11): WinKey + Shift + S
  • Image Preprocessing: Before text extraction, images undergo preprocessing to enhance the accuracy of the OCR. This includes grayscale conversion, binary thresholding, and resizing.
  • Intuitive Interface: The split-pane design allows users to view the original image side-by-side with the extracted text.
  • Error Handling: Informative error messages guide users when issues arise, such as when non-image data is pasted.
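The preprocessing steps named above (grayscale conversion, binary thresholding, resizing, plus optional color inversion) can be illustrated with a small dependency-free sketch. This is not the program's actual code, which relies on OpenCV; the function names, the brightness cutoff of 128 and the "invert if the image is mostly dark" heuristic are all assumptions made for illustration. The default scale factor of 3 follows the changelog entry for v0.09.

```python
def to_grayscale(rgb_rows):
    """Convert rows of (r, g, b) tuples to rows of 0-255 luminance values."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_rows
    ]


def needs_inversion(gray_rows, cutoff=128):
    """Assumed heuristic: a mostly dark image likely has light text on a dark background."""
    pixels = [v for row in gray_rows for v in row]
    return sum(pixels) / len(pixels) < cutoff


def invert(gray_rows):
    """Flip every pixel value so dark becomes light and vice versa."""
    return [[255 - v for v in row] for row in gray_rows]


def threshold(gray_rows, cutoff=128):
    """Binary threshold: every pixel becomes pure black (0) or pure white (255)."""
    return [[255 if v >= cutoff else 0 for v in row] for row in gray_rows]


def upscale(rows, factor=3):
    """Nearest-neighbour resize: repeat each pixel `factor` times in both directions."""
    out = []
    for row in rows:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def preprocess(rgb_rows, factor=3):
    """Grayscale -> optional inversion -> binary threshold -> upscale."""
    gray = to_grayscale(rgb_rows)
    if needs_inversion(gray):
        gray = invert(gray)
    return upscale(threshold(gray), factor)
```

In a real pipeline the same steps would typically be done on image arrays with OpenCV or Pillow rather than on nested lists; the sketch only shows the order and effect of each step.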

Install

This tool requires Python v3.4 or newer to run. You will also need git to clone the repository.

1. Clone the repository

git clone https://github.com/FlyingFathead/OCR-CopyPastePad/
cd OCR-CopyPastePad/

2. Install the prerequisites

pip install -r requirements.txt

(or, manually: pip install -U pytesseract Pillow opencv-python easyocr)

You will also need to download and install the Tesseract OCR engine itself.

  • On Windows, download e.g. the pre-compiled installer (https://github.com/UB-Mannheim/tesseract/wiki). Note that on Windows you need to add the installation directory to your PATH environment variable. If you installed the UB-Mannheim Tesseract version for all users, you can do this in an administrator PowerShell with e.g.:

    [System.Environment]::SetEnvironmentVariable('Path', [System.Environment]::GetEnvironmentVariable('Path', [System.EnvironmentVariableTarget]::Machine) + ";C:\Program Files\Tesseract-OCR", [System.EnvironmentVariableTarget]::Machine)

    The command above assumes that your install directory is C:\Program Files\Tesseract-OCR\ -- if you installed Tesseract somewhere else, change the path in the command accordingly!

  • On Linux, e.g. Ubuntu: sudo apt install tesseract-ocr

  • On macOS, using Homebrew: brew install tesseract
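To confirm that the Tesseract executable is actually reachable through PATH after installation, a quick check along these lines can help. It is only a sketch using the standard library's shutil.which; the helper name find_tesseract is hypothetical and this is not necessarily how the program performs its own startup check.

```python
import shutil


def find_tesseract(candidates=("tesseract",)):
    """Return the full path of the first candidate executable found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    location = find_tesseract()
    if location:
        print(f"Tesseract found at: {location}")
    else:
        print("Tesseract was not found on PATH; check the install steps above.")
```

If the check reports that Tesseract is missing on Windows right after editing PATH, opening a new terminal window is usually needed before the change becomes visible.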

3. Run the program

python OCR-CopyPastePad.py

Usage

  1. Launch the OCR-CopyPastePad application (python OCR-CopyPastePad.py). You can also check whether the non-OpenCV version gives you better OCR results by running python OCR-CopyPastePad_no_OpenCV_ROI.py.
  2. Load an image using the "Load Image" button or paste an image directly into the application (in Windows you can use e.g. the snipping tool: WinKey + Shift + S).
  3. If desired, use the "Detect Text Areas" button to see highlighted regions of text in the image.
  4. The extracted text will automatically appear in the text pane on the right.
  • Note that results may vary between source texts etc. -- in some cases, running the non-OpenCV version might actually yield more accurate results. OCR is a... thing.
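When text regions are detected separately (as in step 3), the order in which they are passed to OCR affects how readable the combined output is. One common approach is to group the region boxes into lines by their vertical position and then read each line left to right. The sketch below illustrates that idea only; the program's actual ROI sorting logic is not described here, and the function name, the (x, y, w, h) box format and the line_tolerance value are assumptions.

```python
def sort_reading_order(boxes, line_tolerance=10):
    """Sort (x, y, w, h) boxes top-to-bottom, then left-to-right within a line.

    Boxes whose top edge lies within `line_tolerance` pixels of the first box
    in the current line are treated as belonging to the same line of text.
    """
    lines = []
    for box in sorted(boxes, key=lambda b: b[1]):
        if lines and abs(box[1] - lines[-1][0][1]) <= line_tolerance:
            lines[-1].append(box)
        else:
            lines.append([box])
    return [b for line in lines for b in sorted(line, key=lambda b: b[0])]
```

A tolerance that is too small splits one slightly skewed line into several; one that is too large merges neighbouring lines, so the value usually needs tuning to the text size in the image.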

Changelog

  • v0.146: check for tesseract libraries on startup
  • v0.145: ROI sorting logic redone for EasyOCR processing
  • v0.144: Fixes to status update threading
  • v0.143: small changes to the overall OCR pipeline; preprocessing now checks whether e.g. color inversion is needed
  • v0.14: crop function, better EasyOCR line detection
  • v0.12: Better clipboard handling, OCR processing status text display
  • v0.11: Added support for EasyOCR for more precise OCR interpretation. The program runs pytesseract by default; more "in-depth" OCR can be done with easyocr (EasyOCR's model is downloaded automatically upon first run if not installed).
  • v0.09: Added image dilation+internal resize (times 3 by default) for better OCR accuracy, Tesseract language selection, other stuff WIP.
  • v0.08: Added the GUI option to invert image colors for better OCR accuracy.

Todo

  • Better implementation of the clipboard copy-paste functionality
  • User-drawable rectangle regions of interest on image
