Quantize distil-whisper? #113

Open
sujitvasanth opened this issue Apr 6, 2024 · 0 comments

Comments


sujitvasanth commented Apr 6, 2024

Hi, I was wondering if there would be any speed gain and size reduction from quantizing distil-whisper, e.g. with bitsandbytes, ONNX, or GPTQ?
There is a gain from quantizing the Whisper model itself without much quality loss; see here:
https://medium.com/@daniel-klitzke/quantizing-openais-whisper-with-the-huggingface-optimum-library-30-faster-inference-64-36d9815190e0
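For reference, I would guess the optimum workflow from that article translates to distil-whisper roughly like this (an untested sketch; the checkpoint name and directory paths are just placeholders):

```python
# Sketch: dynamic int8 ONNX quantization of distil-whisper with optimum,
# following the same workflow as the linked Whisper article. Untested;
# the model id and directory names below are placeholders.
from pathlib import Path

from optimum.onnxruntime import ORTModelForSpeechSeq2Seq, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

model_id = "distil-whisper/distil-large-v2"  # assumed checkpoint
onnx_dir = "distil-whisper-onnx"
quant_dir = "distil-whisper-onnx-int8"

# Export the PyTorch checkpoint to ONNX (encoder, decoder, decoder-with-past)
model = ORTModelForSpeechSeq2Seq.from_pretrained(model_id, export=True)
model.save_pretrained(onnx_dir)

# Apply dynamic int8 quantization to each exported ONNX file
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
for onnx_file in Path(onnx_dir).glob("*.onnx"):
    quantizer = ORTQuantizer.from_pretrained(onnx_dir, file_name=onnx_file.name)
    quantizer.quantize(save_dir=quant_dir, quantization_config=qconfig)
```

Note that the avx512_vnni config targets CPU inference; for keeping GPU VRAM down, the bitsandbytes route is probably more relevant.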

You may wonder why quantize at all: I am running several models simultaneously in an AI assistant that uses an LLM (quantized OpenChat), a multimodal vision model (LLaVA or moondream), and a wake-word model (openwakeword). It runs on my device with 24 GB of VRAM, but I want to share it with as many users as possible, so I need to keep VRAM usage low.

I was looking to quantize the large-v3 model, as it had the lowest word error rate and was the second fastest, or perhaps the medium.en model.

Can anyone point me in the direction of a quantized version of distil-whisper, or show me how I can generate one and use it for inference?
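For the bitsandbytes route, this is what I imagine it would look like via transformers (again an untested sketch; it assumes bitsandbytes is installed and a CUDA GPU is available, and the checkpoint id and sample.wav are placeholders):

```python
# Sketch: load distil-whisper with 8-bit bitsandbytes quantization via
# transformers. Assumes bitsandbytes is installed and a CUDA GPU is present;
# the model id and audio path are placeholders.
from transformers import (
    AutoModelForSpeechSeq2Seq,
    AutoProcessor,
    BitsAndBytesConfig,
    pipeline,
)

model_id = "distil-whisper/distil-large-v2"  # assumed checkpoint

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",  # bitsandbytes needs the model placed on GPU
)
processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
)
print(pipe("sample.wav")["text"])
```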
