Framework / Tool | Source code |
---|---|
pytorch | pytorch |
tensorflow | tf |
- Speeds up inference by 2x.
- For CPU, the optimized graph is slightly different: FastGelu is replaced by BiasGelu.
- Note that ONNX Runtime is compatible with Python versions 3.5 to 3.7.
- Tackles the time-consuming task of optimizing a model for different environments (cloud GPU, desktop CPU, ...).
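
A speedup figure like the 2x noted above can be checked with a small timing harness. The sketch below is not taken from this repository's scripts; `mean_latency` and `speedup` are hypothetical helper names, and the warm-up and run counts are arbitrary defaults.

```python
import statistics
import time


def mean_latency(fn, warmup=3, runs=20):
    """Return the mean wall-clock seconds per call of fn()."""
    for _ in range(warmup):
        fn()  # warm-up calls are excluded from the measurement
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples)


def speedup(baseline_fn, optimized_fn, **kwargs):
    """Ratio of baseline latency to optimized latency (> 1 means faster)."""
    baseline = mean_latency(baseline_fn, **kwargs)
    optimized = mean_latency(optimized_fn, **kwargs)
    return baseline / optimized
```

In practice `baseline_fn` would wrap the framework's forward pass and `optimized_fn` would wrap a call such as `session.run(None, inputs)` on an `onnxruntime.InferenceSession`, both fed the same inputs.
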
python convert_onnx.py
python bert_onnxruntime.py
- Sometimes ONNX Runtime cannot fully optimize a model:
  - A newer export tool generates a subgraph that an older version of ONNX Runtime does not cover.
  - The exported model uses dynamic axes, which makes shape inference harder.
  - Some optimizations are better done offline, such as changing the input tensor type from float32 to float16 to avoid Cast nodes and achieve better performance on V100 and T4 GPUs.
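
The offline step described above could look roughly like the following. This is a minimal sketch, not the contents of the scripts in this repository. It assumes a recent `onnxruntime` release that ships the `onnxruntime.transformers` tooling (older releases may expose the same tooling under different names). The `fp16_output_path` and `optimize_offline` helpers are hypothetical, and `num_heads=12`, `hidden_size=768` are the BERT-base values, assumed here for illustration.

```python
from pathlib import Path


def fp16_output_path(model_path):
    """Derive an output name, e.g. 'bert.onnx' -> 'bert_fp16.onnx'."""
    p = Path(model_path)
    return str(p.with_name(p.stem + "_fp16" + p.suffix))


def optimize_offline(model_path, num_heads=12, hidden_size=768):
    """Fuse the BERT graph and convert it to float16 ahead of time."""
    # Imported lazily so the path helper works without onnxruntime installed.
    from onnxruntime.transformers import optimizer

    # Apply the transformer-specific graph fusions (assumed BERT-base sizes).
    model = optimizer.optimize_model(
        model_path,
        model_type="bert",
        num_heads=num_heads,
        hidden_size=hidden_size,
    )
    # Convert float32 tensors to float16 once, offline.
    model.convert_float_to_float16()
    out_path = fp16_output_path(model_path)
    model.save_model_to_file(out_path)
    return out_path
```

Calling `optimize_offline("bert.onnx")` would write `bert_fp16.onnx` next to the input, which can then be loaded with a GPU execution provider.
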
python experiment.py