Add TableTransformerImageProcessor #30718

NielsRogge · 2024-05-08T17:49:50Z

Feature request

The Table Transformer is a model with basically the same architecture as DETR.

Now, when people do this:

from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained("microsoft/table-transformer-detection")
print(type(processor))

this will print DetrImageProcessor.

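However, Table Transformer has some specific image processing settings which aren't exactly the same as in DETR. Whereas DetrImageProcessor by default resizes the shortest edge to 800 pixels (capping the longest edge at 1333), the original Table Transformer resizes each image so that its longest edge equals a fixed size, and that size differs between the detection and structure recognition models: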

from torchvision import transforms

class MaxResize:
    """Resize a PIL image so that its longest side equals max_size, keeping the aspect ratio."""

    def __init__(self, max_size=800):
        self.max_size = max_size

    def __call__(self, image):
        width, height = image.size
        current_max_size = max(width, height)
        # Note: this scales small images up as well as large images down
        scale = self.max_size / current_max_size
        resized_image = image.resize((int(round(scale * width)), int(round(scale * height))))

        return resized_image

# this is required for the table detection models
detection_transform = transforms.Compose([
    MaxResize(800),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

# this is required for the table structure recognition models
structure_transform = transforms.Compose([
    MaxResize(1000),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
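To make the difference concrete, here is a minimal, stdlib-only sketch of the output-size arithmetic. `max_resize_size` mirrors `MaxResize` above. `detr_style_size` is a hypothetical, simplified approximation of DETR's default "shortest edge 800, longest edge 1333" rule. It is not the exact transformers implementation, which handles rounding in its own way. Both function names are made up for illustration.

```python
from __future__ import annotations


def max_resize_size(width: int, height: int, max_size: int) -> tuple[int, int]:
    """Mirror MaxResize: scale so the longer side equals max_size.

    Images smaller than max_size are scaled up, not left as-is.
    """
    scale = max_size / max(width, height)
    return int(round(scale * width)), int(round(scale * height))


def detr_style_size(width: int, height: int,
                    shortest_edge: int = 800,
                    longest_edge: int = 1333) -> tuple[int, int]:
    """Hypothetical, simplified DETR-style rule: scale the shorter side to
    shortest_edge, unless that pushes the longer side past longest_edge,
    in which case cap the longer side instead."""
    scale = shortest_edge / min(width, height)
    if max(width, height) * scale > longest_edge:
        scale = longest_edge / max(width, height)
    return int(round(scale * width)), int(round(scale * height))


if __name__ == "__main__":
    w, h = 640, 480
    print("DETR-style:", detr_style_size(w, h))                # (1067, 800)
    print("detection (800):", max_resize_size(w, h, 800))      # (800, 600)
    print("structure (1000):", max_resize_size(w, h, 1000))    # (1000, 750)
```

For the same 640x480 input, a DETR-style processor would feed the model a noticeably larger image than the original detection pipeline does, which is the kind of mismatch a dedicated processor would avoid.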

Hence we could create a separate TableTransformerImageProcessor which replicates this.
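One possible shape for such a processor's settings is sketched below. It assumes a hypothetical `TableTransformerPreprocessConfig` class with one preset per model family. This is not an existing transformers class or the API the eventual PR uses. It only shows which values would need to be captured: the longest-edge target, plus the same mean/std used by `transforms.Normalize` above.

```python
from __future__ import annotations

from dataclasses import dataclass

# Mean/std passed to transforms.Normalize in both Table Transformer transforms
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


@dataclass(frozen=True)
class TableTransformerPreprocessConfig:
    """Hypothetical settings holder; not an existing transformers class."""
    max_size: int
    image_mean: tuple[float, float, float] = IMAGENET_MEAN
    image_std: tuple[float, float, float] = IMAGENET_STD

    @classmethod
    def for_detection(cls) -> TableTransformerPreprocessConfig:
        return cls(max_size=800)

    @classmethod
    def for_structure_recognition(cls) -> TableTransformerPreprocessConfig:
        return cls(max_size=1000)

    def output_size(self, width: int, height: int) -> tuple[int, int]:
        # Same rule as MaxResize: the longer side becomes max_size
        scale = self.max_size / max(width, height)
        return int(round(scale * width)), int(round(scale * height))

    def normalize_pixel(self, rgb: tuple[int, int, int]) -> tuple[float, ...]:
        # ToTensor rescales 0-255 to 0-1, then Normalize computes (x - mean) / std per channel
        return tuple(
            (value / 255.0 - mean) / std
            for value, mean, std in zip(rgb, self.image_mean, self.image_std)
        )


if __name__ == "__main__":
    det = TableTransformerPreprocessConfig.for_detection()
    print(det.output_size(640, 480))  # (800, 600)
```

Keeping the two presets as data rather than as two separate classes would let one processor class serve both checkpoint families.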

Motivation

It would be great to replicate the original preprocessing settings exactly.

Your contribution

I could work on this, but it would be great if someone else could take it up.


nileshkokane01 · 2024-05-09T03:09:16Z

@NielsRogge, will do.

NielsRogge · 2024-05-09T08:40:04Z

Great! Note that there's a difference between the detection model and the structure recognition models; see https://github.com/microsoft/table-transformer/blob/16d124f616109746b7785f03085100f1f6247575/src/inference.py#L39-L49
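Because the two model families use different maximum sizes, the right value has to be chosen per checkpoint. A minimal sketch, assuming a hypothetical helper that guesses the task from the checkpoint name. In practice the value would more likely live in each checkpoint's preprocessing config than be inferred from its name. The structure recognition checkpoint name below is used for illustration.

```python
# Hypothetical mapping from model family to the MaxResize target in the original
# repository's transforms: 800 for table detection, 1000 for structure recognition.
MAX_SIZE_BY_TASK = {"detection": 800, "structure": 1000}


def max_size_for_checkpoint(checkpoint: str) -> int:
    """Guess the longest-edge target from a checkpoint name (illustrative only)."""
    name = checkpoint.lower()
    for task, max_size in MAX_SIZE_BY_TASK.items():
        if task in name:
            return max_size
    raise ValueError(f"cannot infer the Table Transformer task from {checkpoint!r}")


if __name__ == "__main__":
    print(max_size_for_checkpoint("microsoft/table-transformer-detection"))               # 800
    print(max_size_for_checkpoint("microsoft/table-transformer-structure-recognition"))   # 1000
```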

nileshkokane01 · 2024-05-09T14:03:15Z

@NielsRogge, just to confirm: we need an image_processing_table_transformer module that defines a TableTransformerImageProcessor with the Table Transformer-specific transforms for both structure recognition and detection.

Are there any other specifics or differences apart from that? I'll try to find them anyway.

NielsRogge added this to To do in Computer vision May 8, 2024

amyeroberts added the Vision and Feature request labels May 9, 2024

nileshkokane01 linked a pull request May 10, 2024 that will close this issue

Added TableTrasnformerImageProcessor #30747

Open

