
How to set the target language for examples in README? #130

Open
clstaudt opened this issue May 1, 2024 · 7 comments
@clstaudt

clstaudt commented May 1, 2024

The code examples in the README do not make it obvious how to set the language of the audio to transcribe.

The default settings produce garbled English text when the audio is in a different language.
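For context, with the multilingual Whisper checkpoints the target language can be forced through the Transformers ASR pipeline's `generate_kwargs`. A minimal sketch, assuming the multilingual `openai/whisper-large-v3` checkpoint (note that `distil-large-v3` itself is English-only, so it would ignore this setting); the file path is a placeholder:

```python
def build_generate_kwargs(language: str, task: str = "transcribe") -> dict:
    """kwargs forwarded to model.generate(); task='translate' outputs English."""
    return {"language": language, "task": task}


def transcribe(path: str, language: str) -> str:
    """Transcribe an audio file, forcing the output language.

    Assumes a multilingual checkpoint; requires `pip install transformers`.
    """
    from transformers import pipeline  # lazy import, heavy dependency

    asr = pipeline(
        "automatic-speech-recognition",
        model="openai/whisper-large-v3",  # multilingual, not distil-large-v3
    )
    return asr(path, generate_kwargs=build_generate_kwargs(language))["text"]


# usage (downloads the checkpoint on first run):
# text = transcribe("interview.mp3", language="german")
```

This is a sketch under those assumptions, not the README's own example; the key point is only that `language`/`task` are passed at call time via `generate_kwargs`.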

@CheshireCC

It seems that this model only outputs English subtitles.

@clstaudt
Author

clstaudt commented May 5, 2024

@CheshireCC If that is the case, would it be a distilled version of Whisper?

"Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. "

https://openai.com/index/whisper

@CheshireCC

@clstaudt

Maybe a distilled version requires re-training the model, just like fine-tuning one.
https://github.com/huggingface/distil-whisper#:~:text=Note%3A%20Distil,checkpoints%20when%20ready!

"Note: Distil-Whisper is currently only available for English speech recognition. We are working with the community to distill 
Whisper on other languages. If you are interested in distilling Whisper in your language, check out the provided training code. 
We will soon update the repository with multilingual checkpoints when ready!"

@sanchit-gandhi
Collaborator

sanchit-gandhi commented May 20, 2024

Indeed - as @CheshireCC has mentioned, you can train your own multilingual distil-whisper checkpoint according to the training readme. This has been done successfully in a number of languages, such as for French and German.

Also cc @eustlb having done some extensive experimentation into French distillation.

@eustlb
Collaborator

eustlb commented May 22, 2024

Hey @clstaudt @CheshireCC, indeed distil-large-v3 has been trained to do English-only transcriptions. More details about motivations here.

@clstaudt
Author

clstaudt commented May 22, 2024

Thanks for clarifying @eustlb. I'm about to give a presentation praising the potential of distillation with distil-whisper as the prime example. While the speedup is impressive, I think it's important to add that it's just one language while the teacher model was multilingual. What do you think will be the speedup and size reduction for a multilingual distil-whisper?

@eustlb
Collaborator

eustlb commented May 22, 2024

Thanks for promoting distil-whisper, @clstaudt!

Actually, you can find the info about this here on the README and here on the model card, but thanks for mentioning it! It may not be clear enough.

Concerning a multilingual distilled Whisper, it is a very difficult question to answer without proper experimentation, and I prefer not to give false insights. There are a lot of factors to take into account (e.g., number of languages, dataset sizes, etc.). That said, if you had large enough datasets for a few languages and managed to get good results with a 4-layer decoder, the size reduction would be 48% (an exact value, compared to 51% for a 2-layer decoder), and the speed-up should be around 5.5x (a rough estimate, to be taken with a big pinch of salt, compared to 6.3x for a 2-layer decoder).
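These size-reduction figures can be roughly sanity-checked with back-of-envelope arithmetic, since distillation keeps the full encoder and drops only decoder layers. A sketch under stated assumptions (Whisper large-v3's published shapes: d_model 1280, FFN width 5120, 32 decoder layers, ~1550M total parameters; biases and layer norms ignored, so it only approximates the exact quoted values):

```python
# Back-of-envelope check of the quoted size reductions.
# Assumptions: Whisper large-v3 with d_model=1280, d_ffn=5120,
# 32 decoder layers, ~1550M total parameters; biases/layernorms ignored.
d_model, d_ffn, n_dec_layers, total_full = 1280, 5120, 32, 1550e6

attn = 4 * d_model * d_model                  # q, k, v, out projections
per_layer = 2 * attn + 2 * d_model * d_ffn    # self-attn + cross-attn + MLP, ~26.2M

# Everything that distillation keeps regardless of decoder depth:
shared = total_full - n_dec_layers * per_layer


def size_reduction(kept_decoder_layers: int) -> float:
    """Fraction of parameters removed when keeping only some decoder layers."""
    distilled = shared + kept_decoder_layers * per_layer
    return 1 - distilled / total_full


print(f"4-layer decoder: {size_reduction(4):.1%}")  # ~47%, close to the quoted 48%
print(f"2-layer decoder: {size_reduction(2):.1%}")  # ~51%
```

The takeaway matches the thread: because the encoder dominates the parameter count, going from a 2-layer to a 4-layer decoder costs only a few percentage points of size.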
