ORTOptimizer for the model type Segformer #1820
base: main
Conversation
optimum/utils/normalized_config.py (outdated)
```python
if attr_value is None:
    raise AttributeError(f"Attribute {self.NUM_ATTENTION_HEADS} not found in config")
if isinstance(attr_value, list):
    return max(attr_value)
```
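For context, a minimal sketch of the getter this diff sits in; the class shape and names other than those in the diff are illustrative, not optimum's actual code:

```python
# Illustrative sketch only: `NUM_ATTENTION_HEADS` mirrors the diff above,
# but the surrounding class is hypothetical.
class NormalizedConfigSketch:
    NUM_ATTENTION_HEADS = "num_attention_heads"

    def __init__(self, config):
        self._config = config

    @property
    def num_attention_heads(self):
        attr_value = getattr(self._config, self.NUM_ATTENTION_HEADS, None)
        if attr_value is None:
            raise AttributeError(f"Attribute {self.NUM_ATTENTION_HEADS} not found in config")
        if isinstance(attr_value, list):
            # Segformer stores one value per encoder stage, e.g. [1, 2, 5, 8];
            # collapsing the list with max() is the approach being discussed here.
            return max(attr_value)
        return attr_value
```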
I went with max here, but if needed I can try to figure out how to implement list support
I think it makes more sense to return the list as-is, and let the exporter handle it.
@mht-sharma do you know what `hidden_size` or `num_attention_heads` can be used for with image classification models?
So if I remove the `max` and run `pytest tests/onnxruntime/test_optimization.py -k test_compare_original_image_model_with_optimized_model -vv`, I get the following error:
```
../../venvs/optimum/lib/python3.11/site-packages/onnxruntime/transformers/optimizer.py:178: in optimize_by_fusion
    optimizer = optimizer_class(model, num_heads, hidden_size)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <onnx_model_bert.BertOnnxModel object at 0x16fbebf50>, model = ir_version: 7
opset_import {
  version: 12
}
opset_import {
  domain: "com.microsoft.experimental"
  version: 1
}
opse...m_param: "batch_size"
      }
      dim {
        dim_value: 150
      }
    }
  }
}
, num_heads = [1, 2, 5, 8], hidden_size = 256

    def __init__(self, model: ModelProto, num_heads: int = 0, hidden_size: int = 0):
        """Initialize BERT ONNX Model.

        Args:
            model (ModelProto): the ONNX model
            num_heads (int, optional): number of attention heads. Defaults to 0 (detect the parameter automatically).
            hidden_size (int, optional): hidden dimension. Defaults to 0 (detect the parameter automatically).
        """
>       assert (num_heads == 0 and hidden_size == 0) or (num_heads > 0 and hidden_size % num_heads == 0)
E       TypeError: '>' not supported between instances of 'list' and 'int'
```
It looks like the onnxruntime optimizer expects `num_attention_heads` to be an int.
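A minimal standalone reproduction of the failing comparison, outside onnxruntime entirely:

```python
# Segformer reports one head count per encoder stage, so the config value is a list.
num_heads = [1, 2, 5, 8]

# The optimizer's assert compares it to an int, which Python refuses for lists:
num_heads > 0  # TypeError: '>' not supported between instances of 'list' and 'int'
```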
maybe microsoft/onnxruntime#17254 is related
So Microsoft hasn't tested the optimizer with Segformer yet: https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/README.md#supported-models

They expect both num_heads and hidden_size to be ints:

```
num_heads (int, optional): number of attention heads. Defaults to 0.
    0 allows detect the parameter from graph automatically.
hidden_size (int, optional): hidden size. Defaults to 0.
    0 allows detect the parameter from graph automatically.
```

If I convert `model_quantized.config.hidden_sizes` and `model_quantized.config.num_attention_heads` to integers by taking the max, the optimizer works and does seem to change the graph of the model.
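Roughly what that conversion amounts to; `SegformerConfig` and its default per-stage lists come from transformers, while the variable names are illustrative:

```python
from transformers import SegformerConfig

# Default Segformer config: num_attention_heads=[1, 2, 5, 8], hidden_sizes=[32, 64, 160, 256]
config = SegformerConfig()

# Collapse the per-stage lists to single ints before handing them to onnxruntime.
num_heads = max(config.num_attention_heads)  # 8
hidden_size = max(config.hidden_sizes)       # 256

# This pair satisfies the optimizer's assert: 256 % 8 == 0.
assert num_heads > 0 and hidden_size % num_heads == 0
```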
@zachmayer, curious to know which part of the model / encoder block is optimised. Since you gave max values I suppose the last one in the hierarchy? Or are all encoder blocks optimised?

If it's the first, it could be worth looping the optimiser over each pair of hidden size and attention heads and seeing whether all the blocks get optimised, as in the sketch below.

> So this model may not be supported by onnxruntime itself.

By this I was referring to onnxruntime optimiser support, since the optimiser does not take list values.
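For what it's worth, a rough sketch of that per-stage loop, using onnxruntime's `optimize_model` entry point; the paths and stage lists are illustrative, and whether re-running fusion per stage actually fuses more blocks is exactly the open question:

```python
from onnxruntime.transformers.optimizer import optimize_model

# Hypothetical loop: run the fusion pass once per (num_heads, hidden_size) pair,
# feeding each pass's output into the next.
path = "segformer.onnx"
for i, (heads, hidden) in enumerate(zip([1, 2, 5, 8], [32, 64, 160, 256])):
    optimized = optimize_model(path, model_type="bert", num_heads=heads, hidden_size=hidden)
    path = f"segformer_stage{i}.onnx"
    optimized.save_model_to_file(path)
```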
How do I tell which blocks get optimized?
I just pushed a commit that uses sum instead of max and uses `ORTModelForSemanticSegmentation` in the test, and the test passes.
I'll try some different parameters and see what changes.
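For reference, the optimum-side flow the test exercises looks roughly like this; the checkpoint id and save path are placeholders, not the ones used in the test:

```python
from optimum.onnxruntime import ORTModelForSemanticSegmentation, ORTOptimizer
from optimum.onnxruntime.configuration import OptimizationConfig

# Export a Segformer checkpoint to ONNX, then run the ORTOptimizer on it.
model = ORTModelForSemanticSegmentation.from_pretrained(
    "nvidia/segformer-b0-finetuned-ade-512-512", export=True
)
optimizer = ORTOptimizer.from_pretrained(model)
optimizer.optimize(
    save_dir="segformer_optimized",
    optimization_config=OptimizationConfig(optimization_level=2),
)
```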
num_heads and hidden_size are optional for the onnxruntime transformers optimizer. If you are not sure, set them to 0.
You can use Netron to view the ONNX model before/after optimization.
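Netron also ships as a Python package, so a quick before/after inspection can be scripted; the file names here are placeholders:

```python
import netron

# Opens an interactive graph view in the browser; compare node counts and
# look for fused operators in the optimized model.
netron.start("model.onnx")
netron.start("model_optimized.onnx")
```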
@tianleiwu — ok! I just pushed a new commit where I set them to 0.
Thanks for the PR! I left a few comments.
@mht-sharma @IlyasMoutawwakil the optimizer is definitely doing something to the model. I tested it on the vikp/surya_layout Segformer: I exported the original model to ONNX, optimized it, then quantized the optimized model, and counted the number of nodes in each graph:

- Original model graph: 2900 nodes
- Optimized model graph: 1263 nodes
- Quantized model graph: 1709 nodes

The optimizer definitely prunes nodes from the graph, and the resulting model is faster for inference when I test it.
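Those node counts come from loading each export with plain onnx; the file names are placeholders:

```python
import onnx

# Count the graph nodes in each export to compare sizes.
for path in ["model.onnx", "model_optimized.onnx", "model_quantized.onnx"]:
    model = onnx.load(path)
    print(path, len(model.graph.node))
```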
I tried 3 ways of handling the lists: sum, max, and 0. Sum and max yield pretty similar results, so I went with sum; 0 did not seem to work well. The tests pass on the PR now, and when I test this optimizer on a real Segformer, it definitely makes changes to the model graph.
@mht-sharma @IlyasMoutawwakil what do you think? The tests pass when I run them locally, and the optimizer seems to reduce the size of the model a lot (almost 60%).
@zachmayer to visualize the graphs you can use https://netron.app/

[…] which are the max values (or last in the lists). I don't see any optimizations; @mht-sharma, any idea which operators we should be looking for?
huh. I also tried using 0, which the docs said would infer the number of heads based on the model graph. I changed from […]. In my testing the optimized model definitely has a smaller graph and faster inference, so the optimizer is doing something to the model.
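A quick way to sanity-check the latency claim; the input name and 512×512 shape assume a typical Segformer export, so adjust for the actual model:

```python
import time

import numpy as np
import onnxruntime as ort

# Rough per-run latency for the original vs. optimized exports.
for path in ["model.onnx", "model_optimized.onnx"]:
    session = ort.InferenceSession(path)
    pixel_values = np.random.rand(1, 3, 512, 512).astype(np.float32)
    start = time.perf_counter()
    for _ in range(20):
        session.run(None, {"pixel_values": pixel_values})
    print(path, (time.perf_counter() - start) / 20, "seconds/run")
```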
What does this PR do?
Adds the Segformer model type to `ORTOptimizer`. Based on the advice I got in #1761, but I decided to start with Segformer.
Fixes # (issue)
Before submitting
Who can review?
@mht-sharma maybe?