
Quantization Investigation #126

Open
NirantK opened this issue Feb 20, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

NirantK (Collaborator) commented Feb 20, 2024

Consider this model from Xenova: there is a quantized model that is 120M, instead of the 440–450M I get from O3 quantization in Optimum.

Compare whether the quantized model is as good as the 450M O3 model, using an atol of 1e-3 for O3 and 1e-4 for O2, or whether something else is happening there.
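The comparison above can be sketched as follows. The arrays here are synthetic stand-ins for the two models' outputs; in practice they would come from running both ONNX models (e.g. via `onnxruntime.InferenceSession.run`) on the same batch of inputs:

```python
import numpy as np

# Synthetic stand-ins for the outputs of the fp32/O3 model and the quantized
# model. Real outputs would come from two onnxruntime sessions fed the same
# inputs; the injected noise here just simulates quantization error.
rng = np.random.default_rng(0)
ref_out = rng.standard_normal((4, 768)).astype(np.float32)
quant_out = ref_out + rng.uniform(-5e-4, 5e-4, ref_out.shape).astype(np.float32)

max_abs_diff = float(np.max(np.abs(ref_out - quant_out)))
close_at_1e3 = bool(np.allclose(ref_out, quant_out, atol=1e-3))  # O3 tolerance
close_at_1e4 = bool(np.allclose(ref_out, quant_out, atol=1e-4))  # O2 tolerance

print(f"max |diff| = {max_abs_diff:.2e}, "
      f"atol=1e-3: {close_at_1e3}, atol=1e-4: {close_at_1e4}")
```

If the outputs pass at 1e-3 but fail at 1e-4, the quantized model meets the O3 bar but not the tighter O2 one.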

@NirantK NirantK self-assigned this Feb 20, 2024
NirantK (Collaborator, Author) commented Feb 20, 2024

See Static & Dynamic quantization here: https://huggingface.co/docs/optimum/onnxruntime/usage_guides/quantization
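The size gap in the original comment is consistent with weight-only int8 quantization: storing each fp32 weight as one int8 byte plus a per-tensor scale cuts the weight payload roughly 4x (so ~440M of fp32 weights would land near ~110M). A minimal numpy sketch of the symmetric per-tensor scheme, illustrative only and not Optimum's actual implementation:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: one fp32 scale per tensor."""
    scale = float(np.max(np.abs(w))) / 127.0  # assumes w is not all zeros
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((768, 768)).astype(np.float32)  # a toy weight matrix

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

ratio = w.nbytes / q.nbytes                   # 4.0: 1 byte per int8 vs 4 per fp32
worst_err = float(np.max(np.abs(w - w_hat)))  # bounded by half a quantization step
```

Dynamic quantization additionally quantizes activations on the fly at inference time; the Optimum guide linked above covers both the static and dynamic modes.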

NirantK (Collaborator, Author) commented Feb 20, 2024

From @xenova: this script traverses the graph and collects the operators to quantize:
https://github.com/xenova/transformers.js/blob/main/scripts/convert.py
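As a hypothetical illustration of that idea (not the actual convert.py code): walk the ONNX graph's node list, tally the operator types that appear, and hand the de-duplicated set to the quantizer as the ops to quantize.

```python
from collections import Counter

# Toy stand-in for onnx.ModelProto.graph.node: each node carries an op_type.
nodes = [
    {"name": "attn_q", "op_type": "MatMul"},
    {"name": "attn_bias", "op_type": "Add"},
    {"name": "ffn_up", "op_type": "MatMul"},
    {"name": "softmax", "op_type": "Softmax"},
]

# Tally every operator type seen in the graph.
op_counts = Counter(n["op_type"] for n in nodes)

# The sorted, de-duplicated op types would then be passed to the quantizer
# (e.g. as op_types_to_quantize in onnxruntime's quantize_dynamic).
ops_to_quantize = sorted(op_counts)
print(ops_to_quantize)  # → ['Add', 'MatMul', 'Softmax']
```

Collecting the op types up front lets the quantizer target exactly the operators present in a given model rather than a fixed default list.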

@NirantK NirantK added the enhancement New feature or request label May 14, 2024