
Improving int8 quantization results. #3865

Open
severecoder opened this issue May 15, 2024 · 3 comments

@severecoder

I have used PTQ for int8 export from a PyTorch model, and despite several attempts at calibration there is a significant drop in detection accuracy.

I am moving to quantization-aware training (QAT) to improve the accuracy of the quantized int8 model. Is pytorch_quantization the best tool for that?

The end goal is a .trt/engine file running inference at int8 precision with the best possible detection metrics.

TIA
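For context, a minimal sketch of the kind of PTQ engine build being described here, using the TensorRT Python API with an entropy calibrator. The ONNX path, input shape, and random calibration batches are placeholders for illustration, not details from this issue; real calibration data should go through the same preprocessing as inference.

```python
import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)


class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds preprocessed batches to TensorRT during int8 calibration."""

    def __init__(self, batches):
        super().__init__()
        self.dev_buf = cuda.mem_alloc(batches[0].nbytes)
        self.batches = iter(batches)

    def get_batch_size(self):
        return 1

    def get_batch(self, names):
        try:
            batch = np.ascontiguousarray(next(self.batches))
        except StopIteration:
            return None  # no more data: calibration is finished
        cuda.memcpy_htod(self.dev_buf, batch)
        return [int(self.dev_buf)]

    def read_calibration_cache(self):
        return None  # always recalibrate in this sketch

    def write_calibration_cache(self, cache):
        pass


builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:  # placeholder ONNX export of the model
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

# Stand-in calibration batches; use a few hundred real preprocessed images.
calib = [np.random.rand(1, 3, 640, 640).astype(np.float32) for _ in range(8)]

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)
config.int8_calibrator = EntropyCalibrator(calib)

with open("model_int8.engine", "wb") as f:
    f.write(builder.build_serialized_network(network, config))
```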

@zerollzeng
Collaborator

> I am moving to quantization-aware training (QAT) to improve the accuracy of the quantized int8 model. Is pytorch_quantization the best tool for that?

pytorch_quantization will be deprecated; please use AMMO now.

@zerollzeng zerollzeng self-assigned this May 17, 2024
@zerollzeng zerollzeng added the triaged Issue has been triaged by maintainers label May 17, 2024
@severecoder
Author

Thanks for the response. Isn't AMMO limited to LLMs only?

@brb-nv
Collaborator

brb-nv commented May 28, 2024

There's also support for diffusion models. [link]

Btw, AMMO has been renamed to TensorRT Model Optimizer. [reference]
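Following up on the suggestion above, a rough sketch of what int8 QAT with TensorRT Model Optimizer (`pip install nvidia-modelopt`) can look like; the toy model, random data, and placeholder loss/optimizer below are assumptions for illustration, not details from this thread.

```python
import torch
import torch.nn as nn
import modelopt.torch.quantization as mtq

# Toy stand-in for the detector; substitute the real PyTorch model.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 8, 3, padding=1)).cuda()
data = [torch.randn(2, 3, 640, 640, device="cuda") for _ in range(8)]


def forward_loop(m):
    # Run representative batches so ModelOpt can collect activation ranges.
    with torch.no_grad():
        for x in data:
            m(x)


# Replace supported layers with fake-quantized versions and calibrate scales.
model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)

# QAT: keep training with the task loss so the weights adapt to the
# quantization noise (placeholder loss and optimizer shown here).
opt = torch.optim.SGD(model.parameters(), lr=1e-4)
for x in data:
    opt.zero_grad()
    loss = model(x).abs().mean()
    loss.backward()
    opt.step()

# The fine-tuned model can then be exported to ONNX with Q/DQ nodes and built
# into an int8 TensorRT engine; see the Model Optimizer docs for export details.
```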
