Add TableTransformerImageProcessor #30718

Open
NielsRogge opened this issue May 8, 2024 · 3 comments · May be fixed by #30747
Labels: Feature request, Vision

Comments

@NielsRogge
Contributor

NielsRogge commented May 8, 2024

Feature request

The Table Transformer is a model with essentially the same architecture as DETR.

Now, when people do this:

```python
from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained("microsoft/table-transformer-detection")
print(type(processor))
```

this will print DetrImageProcessor.

However, Table Transformer uses specific image processing settings that differ from DETR's:

```python
from torchvision import transforms


class MaxResize:
    """Resize a PIL image so that its longer side equals max_size, keeping the aspect ratio."""

    def __init__(self, max_size=800):
        self.max_size = max_size

    def __call__(self, image):
        width, height = image.size  # PIL reports (width, height)
        current_max_size = max(width, height)
        scale = self.max_size / current_max_size  # > 1 means the image is upscaled
        resized_image = image.resize((int(round(scale * width)), int(round(scale * height))))

        return resized_image

# this is required for the table detection models
detection_transform = transforms.Compose([
    MaxResize(800),
    transforms.ToTensor(),  # HWC uint8 -> CHW float in [0, 1]
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])  # ImageNet mean/std
])

# this is required for the table structure recognition models
structure_transform = transforms.Compose([
    MaxResize(1000),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
```
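To make the difference concrete, here is a dependency-free sketch comparing the `MaxResize` rule with a DETR-style shortest-edge resize (shorter side to 800, longer side capped at 1333, which are `DetrImageProcessor`'s default `size` values). It assumes the checkpoint's processor config keeps those defaults, which is not verified here. The function names are illustrative, and `detr_style_resize` is a simplified restatement of the rule, not the library code. Sizes are `(width, height)`.

```python
def max_resize(width: int, height: int, max_size: int) -> tuple[int, int]:
    """MaxResize rule: scale so the LONGER side equals max_size (this may upscale)."""
    scale = max_size / max(width, height)
    # Same rounding as MaxResize above: round() then int().
    return int(round(scale * width)), int(round(scale * height))


def detr_style_resize(width: int, height: int, shortest: int = 800, longest: int = 1333) -> tuple[int, int]:
    """Simplified DETR-style rule: SHORTER side to `shortest`, unless the longer side would exceed `longest`."""
    short_side, long_side = min(width, height), max(width, height)
    if long_side / short_side * shortest > longest:
        # Shrink the short-side target so the long side lands on (about) `longest`.
        shortest = int(round(longest * short_side / long_side))
    if width < height:
        return shortest, int(shortest * height / width)
    return int(shortest * width / height), shortest


# A portrait page scan of 1700 x 2200 pixels (width x height).
print(max_resize(1700, 2200, 800))   # detection setting -> (618, 800)
print(max_resize(1700, 2200, 1000))  # structure recognition setting -> (773, 1000)
print(detr_style_resize(1700, 2200))  # DETR-style default -> (800, 1035)
```

The normalization step is the same in both pipelines (ImageNet mean/std, which are also `DetrImageProcessor`'s default `image_mean`/`image_std`), so the mismatch comes from the resize: for this page the detection pipeline sees a 618 x 800 image, while the DETR-style rule produces 800 x 1035.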

Hence, we could create a separate TableTransformerImageProcessor that replicates this.
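A minimal sketch of the settings such a processor could encapsulate, assuming a single class parameterized by task, with `max_size` 800 for detection and 1000 for structure recognition as in the transforms above. `TableTransformerPreprocessor`, `for_task`, and the task names are hypothetical, not an existing transformers API. Images are plain nested lists of RGB tuples so the example needs no dependencies, and the actual pixel resampling (done by PIL's `Image.resize` in the original) is left out.

```python
from dataclasses import dataclass

# Values taken from detection_transform / structure_transform above.
TASK_MAX_SIZE = {"detection": 800, "structure": 1000}
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


@dataclass(frozen=True)
class TableTransformerPreprocessor:
    """Hypothetical settings holder; not part of the transformers library."""

    max_size: int = 800
    image_mean: tuple = IMAGENET_MEAN
    image_std: tuple = IMAGENET_STD

    @classmethod
    def for_task(cls, task: str) -> "TableTransformerPreprocessor":
        if task not in TASK_MAX_SIZE:
            raise ValueError(f"unknown task {task!r}, expected one of {sorted(TASK_MAX_SIZE)}")
        return cls(max_size=TASK_MAX_SIZE[task])

    def target_size(self, width: int, height: int) -> tuple:
        """(width, height) after MaxResize: the longer side becomes max_size."""
        scale = self.max_size / max(width, height)
        return int(round(scale * width)), int(round(scale * height))

    def normalize(self, pixels: list) -> list:
        """Mimic ToTensor() + Normalize(): x / 255, then (x - mean) / std per channel.

        `pixels` is a list of rows of (R, G, B) ints in 0..255. The output keeps this
        HWC layout, whereas torchvision would return a CHW tensor.
        """
        return [
            [tuple((v / 255.0 - m) / s for v, m, s in zip(px, self.image_mean, self.image_std)) for px in row]
            for row in pixels
        ]


structure = TableTransformerPreprocessor.for_task("structure")
print(structure.target_size(1700, 2200))  # (773, 1000)
print(structure.normalize([[(255, 255, 255)]]))
```

Keeping the task-specific `max_size` as the only field that differs mirrors the original inference code, where the detection and structure transforms share everything except the `MaxResize` argument.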

Motivation

It would be great to replicate the original preprocessing settings exactly.

Your contribution

I could work on this, but it would be great if someone else could take it up.

NielsRogge added this to To do in Computer vision May 8, 2024
@nileshkokane01
Contributor

@NielsRogge, will do that.

amyeroberts added the Vision and Feature request labels May 9, 2024
@NielsRogge
Contributor Author

Great. See https://github.com/microsoft/table-transformer/blob/16d124f616109746b7785f03085100f1f6247575/src/inference.py#L39-L49, as the detection model and the structure recognition models use different settings there.

@nileshkokane01
Contributor

nileshkokane01 commented May 9, 2024

@NielsRogge, just to confirm: we need an image_processing_table_transformer module defining a TableTransformerImageProcessor that applies the TableTransformer-specific transforms for structure recognition and detection.

Are there any other specifics or differences apart from that? I will try to find them anyway.

nileshkokane01 linked a pull request May 10, 2024 that will close this issue

3 participants