Quantize distil-whisper? #113

Open
sujitvasanth opened this issue Apr 6, 2024 · 0 comments

Comments


sujitvasanth commented Apr 6, 2024

Hi, I was wondering if there would be any speed gain and size reduction from quantizing distil-whisper, e.g. with bitsandbytes, ONNX, or GPTQ?
There is a gain from quantizing the Whisper model itself without much quality loss; see here:
https://medium.com/@daniel-klitzke/quantizing-openais-whisper-with-the-huggingface-optimum-library-30-faster-inference-64-36d9815190e0
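For reference, I would guess the optimum workflow from that article translates to distil-whisper roughly like this (an untested sketch; the checkpoint name and directory paths are just placeholders):

```python
# Sketch: dynamic int8 ONNX quantization of distil-whisper with optimum,
# following the same workflow as the linked Whisper article. Untested;
# the model id and directory names below are placeholders.
from pathlib import Path

from optimum.onnxruntime import ORTModelForSpeechSeq2Seq, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

model_id = "distil-whisper/distil-large-v2"  # assumed checkpoint
onnx_dir = "distil-whisper-onnx"
quant_dir = "distil-whisper-onnx-int8"

# Export the PyTorch checkpoint to ONNX (encoder, decoder, decoder-with-past)
model = ORTModelForSpeechSeq2Seq.from_pretrained(model_id, export=True)
model.save_pretrained(onnx_dir)

# Apply dynamic int8 quantization to each exported ONNX file
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
for onnx_file in Path(onnx_dir).glob("*.onnx"):
    quantizer = ORTQuantizer.from_pretrained(onnx_dir, file_name=onnx_file.name)
    quantizer.quantize(save_dir=quant_dir, quantization_config=qconfig)
```

Note that the avx512_vnni config targets CPU inference; for keeping GPU VRAM down, the bitsandbytes route is probably more relevant.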

You may wonder why quantize at all: I am running several models simultaneously in an AI assistant that uses an LLM (quantized OpenChat), a multimodal vision model (LLaVA or moondream), and a wake-word model (openwakeword). It runs on my device with 24 GB of VRAM, but I want to share it with as many users as possible, so I need to keep VRAM usage low.

I was looking to quantize the large-v3 model, as it had the lowest word error rate and was the second fastest, or perhaps the medium.en model.

Can anyone point me in the direction of a quantized version of distil-whisper, or show me how I can generate one and use it for inference?
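For the bitsandbytes route, this is what I imagine it would look like via transformers (again an untested sketch; it assumes bitsandbytes is installed and a CUDA GPU is available, and the checkpoint id and sample.wav are placeholders):

```python
# Sketch: load distil-whisper with 8-bit bitsandbytes quantization via
# transformers. Assumes bitsandbytes is installed and a CUDA GPU is present;
# the model id and audio path are placeholders.
from transformers import (
    AutoModelForSpeechSeq2Seq,
    AutoProcessor,
    BitsAndBytesConfig,
    pipeline,
)

model_id = "distil-whisper/distil-large-v2"  # assumed checkpoint

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",  # bitsandbytes needs the model placed on GPU
)
processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
)
print(pipe("sample.wav")["text"])
```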
