New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add fixed resize and pad strategy for object detection #30742
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this! Very nicely handled: backwards compatible; easy to control and well tested. It's going to be great to control the padding behaviour this way - should be faster too ❤️
Only thought is about consistency with other image processor. Some of them e.g. SAM have an explicit pad_size
argument they use to control this behaviour. I think it's fine, as their resize
method will already explicitly raise an exception if the user tries to pass through e.g. "max_size"
.
8f4b44b
to
6682c34
Compare
2eeacfe
to
bd01808
Compare
@amyeroberts I decided to add the
image_processor = DetrImageProcessor(
do_resize=True,
size={"max_height": 300, "max_width": 100},
do_pad=True,
pad_size={"height": 350, "width": 150},
) It requires a bit more code across different methods (init, preprocess, pad) but should be more explicit for users with more predictable behavior. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great - thanks for iterating!
Just a question about the addition of max_height
and max_width
. I don't feel super strongly, so happy to go with what you think it best.
{"shortest_edge"}, | ||
{"shortest_edge", "longest_edge"}, | ||
{"longest_edge"}, | ||
{"max_height", "max_width"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the models are now using pad_size
- do we need the max_size
arguments here?
- `{"max_height": int, "max_width": int}`: The image will be resized to the maximum size respecting the | ||
aspect ratio and keeping the height less or equal to `max_height` and the width less or equal to | ||
`max_width`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As max_size no longer controls the padding, I think we don't need to include this argument at the moment, but could add in a separate follow-up PR, as the default shortest_edge
and longest_edge
already provide bounds. Is it something you found to be useful when finetuning with the object detection script?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{"max_height": int, "max_width": int}
provides a more controllable way of resizing, it is not necessary, but it allows us to use non-square pad_size
.
For example, we have two images in a batch with shapes
[
[400, 600],
[600, 400],
]
If we specify size={"shortest_edge": 200, "longest_edge": 300}
we have to set pad_size={"height": 300, "width": 300}
(square), because we don't know height or width will be resized to "largest_edge".
[
# original -> resized -> padded
[400, 600] -> [200, 300] -> [300, 300]
[600, 400] -> [300, 200] -> [300, 300]
]
If we specify size={"max_height": 200, "max_width": 300}
we can set pad_size={"height": 200, "width": 300}
(non-square).
[
# original -> resized -> padded
[400, 600] -> [200, 300] -> [200, 300]
[600, 400] -> [200, 133] -> [200, 300]
]
I don't think it is very common case, but might be useful in cases when you expect most of the images to have a particular orientation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, let's keep it!
What does this PR do?
Add a new strategy for image resizing and padding.
As it was discussed in #30422 (object detection examples), fixed resizing and padding strategies boost models' quality and allow models to converge faster for fine-tuning.
Currently, we make padding based on the maximum height and width in a batch, and image sizes vary from batch to batch leading to model metrics becoming batch-dependent.
To keep it simple and backward compatible, proposed to support a new size dict
{"max_height": ..., "max_width": ...}
. The image will be resized to the maximum size respecting the aspect ratio and keeping the height less or equal tomax_height
and the width less or equal tomax_width
.In terms of padding, implemented the following:
If
max_height
andmax_width
are provided in thesize
parameter, the image will be padded to themax_height
andmax_width
dimensions. Otherwise, the image will be padded to the maximum height and width of the batch.The padding strategy is not very explicit, probably it worth implementing separate keywords for padding. e.g.
{"max_height": ..., "max_width": ..., "padded_height": ..., "padded_width" ...}
. In that case, if "padded" keywords are provided in any size dict, then padding follows it, otherwise, we pad based on the maximum height and width of a batch.(A small downside of it, is that in most cases "max_height" = "max_width" = "padded_height" = "padded_width", and we will need to specify 4 equal parameters.)
Who can review?
@amyeroberts @NielsRogge please let me know your opinion, I implemented it for DETR, if design is OK I will update all object detection models to support it.