
Quantization Investigation #126

Open
NirantK opened this issue Feb 20, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

NirantK (Collaborator) commented Feb 20, 2024

Consider this model from Xenova: there is a quantized model that is 120M, instead of the 440–450M I get from O3 quantization in Optimum.

Compare whether the quantized model is as good as the 450M O3 model, using an atol of 1e-3 for O3 and 1e-4 for O2, or whether something else is happening there.
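The comparison above can be sketched as follows. The arrays here are synthetic stand-ins for the two models' outputs; in practice they would come from running both ONNX models (e.g. via `onnxruntime.InferenceSession.run`) on the same batch of inputs:

```python
import numpy as np

# Synthetic stand-ins for the outputs of the fp32/O3 model and the quantized
# model. Real outputs would come from two onnxruntime sessions fed the same
# inputs; the injected noise here just simulates quantization error.
rng = np.random.default_rng(0)
ref_out = rng.standard_normal((4, 768)).astype(np.float32)
quant_out = ref_out + rng.uniform(-5e-4, 5e-4, ref_out.shape).astype(np.float32)

max_abs_diff = float(np.max(np.abs(ref_out - quant_out)))
close_at_1e3 = bool(np.allclose(ref_out, quant_out, atol=1e-3))  # O3 tolerance
close_at_1e4 = bool(np.allclose(ref_out, quant_out, atol=1e-4))  # O2 tolerance

print(f"max |diff| = {max_abs_diff:.2e}, "
      f"atol=1e-3: {close_at_1e3}, atol=1e-4: {close_at_1e4}")
```

If the outputs pass at 1e-3 but fail at 1e-4, the quantized model meets the O3 bar but not the tighter O2 one.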

@NirantK NirantK self-assigned this Feb 20, 2024
NirantK (Collaborator, Author) commented Feb 20, 2024

See Static & Dynamic quantization here: https://huggingface.co/docs/optimum/onnxruntime/usage_guides/quantization
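The size gap in the original comment is consistent with weight-only int8 quantization: storing each fp32 weight as one int8 byte plus a per-tensor scale cuts the weight payload roughly 4x (so ~440M of fp32 weights would land near ~110M). A minimal numpy sketch of the symmetric per-tensor scheme, illustrative only and not Optimum's actual implementation:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: one fp32 scale per tensor."""
    scale = float(np.max(np.abs(w))) / 127.0  # assumes w is not all zeros
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((768, 768)).astype(np.float32)  # a toy weight matrix

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

ratio = w.nbytes / q.nbytes                   # 4.0: 1 byte per int8 vs 4 per fp32
worst_err = float(np.max(np.abs(w - w_hat)))  # bounded by half a quantization step
```

Dynamic quantization additionally quantizes activations on the fly at inference time; the Optimum guide linked above covers both the static and dynamic modes.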

NirantK (Collaborator, Author) commented Feb 20, 2024

From @xenova: this script traverses the graph and collects the operators to quantize:
https://github.com/xenova/transformers.js/blob/main/scripts/convert.py
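As a hypothetical illustration of that idea (not the actual convert.py code): walk the ONNX graph's node list, tally the operator types that appear, and hand the de-duplicated set to the quantizer as the ops to quantize.

```python
from collections import Counter

# Toy stand-in for onnx.ModelProto.graph.node: each node carries an op_type.
nodes = [
    {"name": "attn_q", "op_type": "MatMul"},
    {"name": "attn_bias", "op_type": "Add"},
    {"name": "ffn_up", "op_type": "MatMul"},
    {"name": "softmax", "op_type": "Softmax"},
]

# Tally every operator type seen in the graph.
op_counts = Counter(n["op_type"] for n in nodes)

# The sorted, de-duplicated op types would then be passed to the quantizer
# (e.g. as op_types_to_quantize in onnxruntime's quantize_dynamic).
ops_to_quantize = sorted(op_counts)
print(ops_to_quantize)  # → ['Add', 'MatMul', 'Softmax']
```

Collecting the op types up front lets the quantizer target exactly the operators present in a given model rather than a fixed default list.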

@NirantK NirantK added the enhancement New feature or request label May 14, 2024