Whisper batch generation is not faster than loops #1648
Absolutely! So glad you asked, lol. CTranslate2 actually does support true batching, but at the C++ level. If you use my repo for sample scripts and keep the versioning the same, you should be fine. I have a lot of experience with my repository that uses it: https://github.com/BBC-Esq/WhisperS2T-transcriber ...built on the amazing https://github.com/shashikg/WhisperS2T. At ~150 stars WhisperS2T flies under the radar, yet it beats Hugging Face's "insanely" (hate that name) implementation of Whisper, which has thousands of stars. Just goes to show that the number of stars the stereotypical Hugging Face repo gets is not at all related to the quality of the product; it's boosted more by marketing and networking buddy referrals. Give credit where credit is due, is what I say.
BTW, I just haven't had time to update my whispers2t batch repo with this bad boy, so stay tuned. ;-) It lets you specify the task, choose any CTranslate2 quantization you want, process all sub-directories recursively, exclude certain file extensions from processing, and change the beam size, batch size (courtesy of WhisperS2T), and so on.
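For reference, calling WhisperS2T with the options mentioned above (task, quantization, batch size) looks roughly like this. This is a hedged sketch based on the WhisperS2T README; the exact parameter names (`model_identifier`, `backend`, `compute_type`, `transcribe_with_vad`) may differ across versions, so check the repo before relying on it:

```python
# Hypothetical sketch of WhisperS2T usage (https://github.com/shashikg/WhisperS2T).
# Parameter names follow the project's README at the time of writing and
# may have changed; verify against the version you install.
def transcribe_files(files, lang="en"):
    import whisper_s2t  # assumed installed: pip install whisper-s2t

    model = whisper_s2t.load_model(
        model_identifier="small",   # any Whisper size
        backend="CTranslate2",      # batched decoding happens in C++ here
        compute_type="int8",        # CTranslate2 quantization choice
    )
    # One call handles the whole list of files, batched internally.
    return model.transcribe_with_vad(
        files,
        lang_codes=[lang] * len(files),
        tasks=["transcribe"] * len(files),
        initial_prompts=[None] * len(files),
        batch_size=16,
    )
```

The key point for this issue: the batching is done inside the CTranslate2 backend, not by looping over files in Python.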
Last post, I promise... This is supposed to be fixed per the linked discussion. Anyway, here's my analysis of the library (not the most current version, however).
Thank you for telling me about WhisperS2T. I'll take a look later. Currently I'm not using it.
WhisperS2T basically uses CTranslate2 directly.
With the CTranslate2 Whisper model, batch generation is not faster than looping over inputs one by one. I tried the same thing on the Translator model, and there batching is far superior (a lot faster). I used Whisper small converted to int8 with the ct2 conversion tool. Also, GPU memory usage is higher when batching, so I assumed CTranslate2 is doing "proper" batching (not just a looping wrapper). Here is my simple Whisper code.
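The poster's snippet was not captured in this thread, so below is a hedged reconstruction of the kind of batch-vs-loop comparison being described. The model-facing calls (`ctranslate2.models.Whisper`, `StorageView.from_array`, `generate`) follow CTranslate2's Python API; the model path, prompt tokens, and timing harness are placeholders:

```python
# Sketch of a batch-vs-loop benchmark for CTranslate2's Whisper model.
# The original code from the issue was not captured; names here are
# illustrative, not the poster's actual script.
import time
import numpy as np


def stack_features(mels):
    """Stack per-segment log-mel features (each 80 x 3000) into one
    (batch, 80, 3000) float32 array, since a single batched input is
    what distinguishes true batching from a Python loop."""
    return np.stack(mels).astype(np.float32)


def benchmark(run_fn, inputs, repeat=3):
    """Return the best wall-clock time of run_fn(inputs) over `repeat` runs."""
    best = float("inf")
    for _ in range(repeat):
        t0 = time.perf_counter()
        run_fn(inputs)
        best = min(best, time.perf_counter() - t0)
    return best


def transcribe_batched(model, mels, prompt):
    """One generate() call over the whole batch."""
    import ctranslate2  # model assumed converted with ct2 tools, e.g. int8
    features = ctranslate2.StorageView.from_array(stack_features(mels))
    return model.generate(features, [prompt] * len(mels))


def transcribe_looped(model, mels, prompt):
    """The same work done one segment at a time, for comparison."""
    import ctranslate2
    results = []
    for mel in mels:
        features = ctranslate2.StorageView.from_array(stack_features([mel]))
        results.extend(model.generate(features, [prompt]))
    return results
```

Comparing `benchmark(lambda m: transcribe_batched(model, m, prompt), mels)` against the looped variant on the same segments is the measurement the issue describes.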
When I ran the code on Colab (T4 GPU), it output:
Is there anything I could do to increase the speed of Whisper batch generation?