An implementation of Tiling and Corruption (TACo) Augmentations for OCR/HTR

kartikgill/taco-box

Tiling and Corruption (TACo)

TACo is a simple and effective data augmentation technique for Optical Character Recognition (OCR) and Handwritten Text Recognition (HTR) (see Reference).

taco-box is an implementation of the TACo algorithm. It is released under the Apache 2.0 license, so feel free to use it in your project. Enjoy!

Installing

First, you need to have Python 3 installed on your system.

Next, install taco-box with pip or your favorite PyPI package manager.

pip install taco-box

Usage

Check out this Jupyter notebook for usage: Notebook

Here is an example:

from tacobox import Taco

# Create a Taco object. (Note: the parameters shown are the default values.)
mytaco = Taco(cp_vertical=0.25,
              cp_horizontal=0.25,
              max_tw_vertical=100,
              min_tw_vertical=20,
              max_tw_horizontal=50,
              min_tw_horizontal=10)

# Apply random vertical corruption.
# input_img is the image to augment; load it before this step (not shown here).
augmented_img = mytaco.apply_vertical_taco(input_img, corruption_type='random')
mytaco.visualize(augmented_img)

    -------Understanding the Arguments--------
    :cp_vertical:        corruption probability for vertical tiles
    :cp_horizontal:      corruption probability for horizontal tiles
    :max_tw_vertical:    maximum tile width for vertical tiles, in pixels
    :min_tw_vertical:    minimum tile width for vertical tiles, in pixels
    :max_tw_horizontal:  maximum tile width for horizontal tiles, in pixels
    :min_tw_horizontal:  minimum tile width for horizontal tiles, in pixels
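
To make the arguments above more concrete, here is a minimal pure-Python sketch of how a tile-width range and a corruption probability could interact. It is not taken from the taco-box source. The helper names `plan_tiles` and `corrupt_columns` are hypothetical, and the sketch assumes that vertical tiles are column strips, that a width is drawn per tile, and that a corrupted tile is filled with a constant value. The library may differ on any of these points.

```python
import random


def plan_tiles(length, min_tw, max_tw, cp, rng):
    """Split `length` pixels into consecutive tiles (hypothetical helper).

    Returns a list of (start, end, corrupted) tuples covering [0, length).
    """
    tiles = []
    start = 0
    while start < length:
        tw = rng.randint(min_tw, max_tw)   # tile width in pixels (assumed per tile)
        end = min(start + tw, length)      # the last tile may be narrower
        corrupted = rng.random() < cp      # corrupt this tile with probability cp
        tiles.append((start, end, corrupted))
        start = end
    return tiles


def corrupt_columns(image, tiles, fill=255):
    """Return a copy of `image` (a list of rows) with corrupted tiles filled.

    Filling with a constant is an assumption made for illustration.
    """
    out = [list(row) for row in image]
    for start, end, corrupted in tiles:
        if corrupted:
            for row in out:
                row[start:end] = [fill] * (end - start)
    return out


rng = random.Random(0)                        # seeded for reproducibility
image = [[0] * 300 for _ in range(64)]        # dummy 64x300 all-black image
tiles = plan_tiles(300, min_tw=20, max_tw=100, cp=0.25, rng=rng)
augmented = corrupt_columns(image, tiles)
```

In this sketch, a higher `cp` corrupts more tiles, while the `min_tw` and `max_tw` bounds control how coarse or fine the corrupted strips are. A real image would typically be an array rather than a list of lists.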

Expected results

The picture below shows the variations of the TACo augmentation algorithm produced by the current implementation:

Example Output

Contributing

This project is in the very early stages of development. If you find a bug or have a feature request, feel free to open an issue. PRs are always welcome.

Reference

The TACo algorithm is part of a research project on Handwritten Text Recognition. A link to the original paper will be posted soon.