ORTOptimizer for the model type Segformer #1820
base: main
Conversation
optimum/utils/normalized_config.py (outdated)
```python
if attr_value is None:
    raise AttributeError(f"Attribute {self.NUM_ATTENTION_HEADS} not found in config")
if isinstance(attr_value, list):
    return max(attr_value)
```
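For context, a minimal sketch of the getter this diff sits in; the class shape and names other than those in the diff are illustrative, not optimum's actual code:

```python
# Illustrative sketch only: `NUM_ATTENTION_HEADS` mirrors the diff above,
# but the surrounding class is hypothetical.
class NormalizedConfigSketch:
    NUM_ATTENTION_HEADS = "num_attention_heads"

    def __init__(self, config):
        self._config = config

    @property
    def num_attention_heads(self):
        attr_value = getattr(self._config, self.NUM_ATTENTION_HEADS, None)
        if attr_value is None:
            raise AttributeError(f"Attribute {self.NUM_ATTENTION_HEADS} not found in config")
        if isinstance(attr_value, list):
            # Segformer stores one value per encoder stage, e.g. [1, 2, 5, 8];
            # collapsing the list with max() is the approach being discussed here.
            return max(attr_value)
        return attr_value
```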
I went with max here, but if needed I can try to figure out how to implement list support
I think it makes more sense to return the list as-is, and let the exporter handle it.
@mht-sharma do you know what `hidden_size` or `num_attention_heads` can be used for with image classification models?
So if I remove the `max` and run `pytest tests/onnxruntime/test_optimization.py -k test_compare_original_image_model_with_optimized_model -vv`, I get the following error:
```
../../venvs/optimum/lib/python3.11/site-packages/onnxruntime/transformers/optimizer.py:178: in optimize_by_fusion
    optimizer = optimizer_class(model, num_heads, hidden_size)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <onnx_model_bert.BertOnnxModel object at 0x16fbebf50>, model = ir_version: 7
opset_import {
  version: 12
}
opset_import {
  domain: "com.microsoft.experimental"
  version: 1
}
opse...m_param: "batch_size"
      }
      dim {
        dim_value: 150
      }
    }
  }
}
, num_heads = [1, 2, 5, 8], hidden_size = 256

    def __init__(self, model: ModelProto, num_heads: int = 0, hidden_size: int = 0):
        """Initialize BERT ONNX Model.

        Args:
            model (ModelProto): the ONNX model
            num_heads (int, optional): number of attention heads. Defaults to 0 (detect the parameter automatically).
            hidden_size (int, optional): hidden dimension. Defaults to 0 (detect the parameter automatically).
        """
>       assert (num_heads == 0 and hidden_size == 0) or (num_heads > 0 and hidden_size % num_heads == 0)
E       TypeError: '>' not supported between instances of 'list' and 'int'
```
It looks like the onnxruntime optimizer expects `num_attention_heads` to be an int.
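A minimal standalone reproduction of the failing comparison, outside onnxruntime entirely:

```python
# Segformer reports one head count per encoder stage, so the config value is a list.
num_heads = [1, 2, 5, 8]

# The optimizer's assert compares it to an int, which Python refuses for lists:
num_heads > 0  # TypeError: '>' not supported between instances of 'list' and 'int'
```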
maybe microsoft/onnxruntime#17254 is related
So Microsoft hasn't tested the optimizer with Segformer yet: https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/README.md#supported-models

They expect both num_heads and hidden_size to be ints:

```
num_heads (int, optional): number of attention heads. Defaults to 0.
    0 allows detect the parameter from graph automatically.
hidden_size (int, optional): hidden size. Defaults to 0.
    0 allows detect the parameter from graph automatically.
```

If I convert `model_quantized.config.hidden_sizes` and `model_quantized.config.num_attention_heads` to integers by taking the max, the optimizer works and does seem to change the graph of the model.
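Roughly what that conversion amounts to; `SegformerConfig` and its default per-stage lists come from transformers, while the variable names are illustrative:

```python
from transformers import SegformerConfig

# Default Segformer config: num_attention_heads=[1, 2, 5, 8], hidden_sizes=[32, 64, 160, 256]
config = SegformerConfig()

# Collapse the per-stage lists to single ints before handing them to onnxruntime.
num_heads = max(config.num_attention_heads)  # 8
hidden_size = max(config.hidden_sizes)       # 256

# This pair satisfies the optimizer's assert: 256 % 8 == 0.
assert num_heads > 0 and hidden_size % num_heads == 0
```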
@zachmayer, curious to know which part of the model / encoder block is optimised. Since you gave max values I suppose the last one in the hierarchy? Or are all encoder blocks optimised?

If it's the first, it could be worth looping the optimiser over each pair of hidden size and attention heads and seeing whether all the blocks get optimised, as in the sketch below.

> So this model may not be supported by onnxruntime itself.

By this I was referring to onnxruntime optimiser support, since the optimiser does not take list values.
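For what it's worth, a rough sketch of that per-stage loop, using onnxruntime's `optimize_model` entry point; the paths and stage lists are illustrative, and whether re-running fusion per stage actually fuses more blocks is exactly the open question:

```python
from onnxruntime.transformers.optimizer import optimize_model

# Hypothetical loop: run the fusion pass once per (num_heads, hidden_size) pair,
# feeding each pass's output into the next.
path = "segformer.onnx"
for i, (heads, hidden) in enumerate(zip([1, 2, 5, 8], [32, 64, 160, 256])):
    optimized = optimize_model(path, model_type="bert", num_heads=heads, hidden_size=hidden)
    path = f"segformer_stage{i}.onnx"
    optimized.save_model_to_file(path)
```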
How do I tell which blocks get optimized?
I just pushed a commit that uses sum instead of max and uses `ORTModelForSemanticSegmentation` in the test, and the test passes.
I'll try some different parameters and see what changes.
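For reference, the optimum-side flow the test exercises looks roughly like this; the checkpoint id and save path are placeholders, not the ones used in the test:

```python
from optimum.onnxruntime import ORTModelForSemanticSegmentation, ORTOptimizer
from optimum.onnxruntime.configuration import OptimizationConfig

# Export a Segformer checkpoint to ONNX, then run the ORTOptimizer on it.
model = ORTModelForSemanticSegmentation.from_pretrained(
    "nvidia/segformer-b0-finetuned-ade-512-512", export=True
)
optimizer = ORTOptimizer.from_pretrained(model)
optimizer.optimize(
    save_dir="segformer_optimized",
    optimization_config=OptimizationConfig(optimization_level=2),
)
```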
num_heads and hidden_size are optional for the onnxruntime transformers optimizer. If you are not sure, set them to 0.
You can use Netron to view the ONNX model before/after optimization.
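Netron also ships as a Python package, so a quick before/after inspection can be scripted; the file names here are placeholders:

```python
import netron

# Opens an interactive graph view in the browser; compare node counts and
# look for fused operators in the optimized model.
netron.start("model.onnx")
netron.start("model_optimized.onnx")
```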
@tianleiwu — ok! I just pushed a new commit where I set them to 0.
Thanks for the PR! I left a few comments.
@mht-sharma @IlyasMoutawwakil the optimizer is definitely doing something to the model. I tested it on the vikp/surya_layout Segformer: I exported the original model to ONNX, optimized it, then quantized the optimized model, and counted the number of nodes in each graph:

- Original model graph: 2900 nodes
- Optimized model graph: 1263 nodes
- Quantized model graph: 1709 nodes

The optimizer definitely prunes nodes from the graph, and the resulting model is faster for inference when I test it.
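Those node counts come from loading each export with plain onnx; the file names are placeholders:

```python
import onnx

# Count the graph nodes in each export to compare sizes.
for path in ["model.onnx", "model_optimized.onnx", "model_quantized.onnx"]:
    model = onnx.load(path)
    print(path, len(model.graph.node))
```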
I tried 3 ways of handling the lists: sum, max, and 0. Sum and max yield pretty similar results, so I went with sum; 0 did not seem to work well. The tests pass on the PR now, and when I test this optimizer on a real Segformer, it definitely makes changes to the model graph.
@mht-sharma @IlyasMoutawwakil what do you think? The tests pass when I run them locally, and the optimizer seems to reduce the size of the model a lot (almost 60%).
@zachmayer to visualize the graphs you can use https://netron.app/

[…] which are the max values (or last in the lists). I don't see any optimizations; @mht-sharma, any idea which operators we should be looking for?
huh. I also tried using 0, which the docs said would infer the number of heads based on the model graph. I changed from […]. In my testing the optimized model definitely has a smaller graph and faster inference, so the optimizer is doing something to the model.
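A quick way to sanity-check the latency claim; the input name and 512×512 shape assume a typical Segformer export, so adjust for the actual model:

```python
import time

import numpy as np
import onnxruntime as ort

# Rough per-run latency for the original vs. optimized exports.
for path in ["model.onnx", "model_optimized.onnx"]:
    session = ort.InferenceSession(path)
    pixel_values = np.random.rand(1, 3, 512, 512).astype(np.float32)
    start = time.perf_counter()
    for _ in range(20):
        session.run(None, {"pixel_values": pixel_values})
    print(path, (time.perf_counter() - start) / 20, "seconds/run")
```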
What does this PR do?
Adds the Segformer model type to `ORTOptimizer`. Based on the advice I got in #1761, but I decided to start with Segformer.
Fixes # (issue)
Before submitting
Who can review?
@mht-sharma maybe?