Issue with Language Specific Transcription Using txtai and Whisper #593
Comments
It's possible Whisper runs the translation task by default. Here's an idea to test out using code from the model page.

from transformers import WhisperProcessor
from txtai.transcription import Transcription
transcribe = Transcription("openai/whisper-large-v2")
# Test transcribe only
transcribe.pipeline.model.config.forced_decoder_ids = WhisperProcessor.get_decoder_prompt_ids(language="polish", task="transcribe")
for text in transcribe(files):
    print(text)

If that works, I can add in a change that makes this more streamlined.
@davidmezzetti thank you for the help. After a small modification, this code works fine:

from transformers import WhisperProcessor
from txtai.pipeline import Transcription
# from txtai.transcription import Transcription
# model = "openai/whisper-large-v2"
model = "bardsai/whisper-large-v2-pl-v2"
transcribe = Transcription(model)
processor = WhisperProcessor.from_pretrained(model)
# Test transcribe only
transcribe.pipeline.model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(language="polish", task="transcribe")
for text in transcribe(files):
    print(text)
Thanks for confirming. I'll keep this issue open and add an argument to disable automatic translation.
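Until such an argument exists, the workaround confirmed in this thread could be wrapped in a small helper. The following is only a sketch under stated assumptions: `force_transcription` and `transcribe_files` are hypothetical names for illustration and are not part of txtai or transformers. The calls themselves (`WhisperProcessor.from_pretrained`, `get_decoder_prompt_ids`, and the `forced_decoder_ids` config attribute) are the ones used in the comments above.

```python
def force_transcription(transcribe, processor, language):
    """Pin a Whisper-backed pipeline to transcribe (not translate) in `language`.

    Hypothetical helper. `transcribe` is assumed to be a txtai Transcription
    instance and `processor` a WhisperProcessor *instance* for the same model.
    """
    # The prompt ids are built on a loaded processor instance, which is the
    # change the working snippet in this thread made.
    prompt_ids = processor.get_decoder_prompt_ids(language=language, task="transcribe")

    # Same attribute path as in the thread's workaround
    transcribe.pipeline.model.config.forced_decoder_ids = prompt_ids
    return prompt_ids


def transcribe_files(files, model="bardsai/whisper-large-v2-pl-v2", language="polish"):
    """Hypothetical wrapper: load the model, apply the workaround, transcribe a list of files."""
    # Imported lazily so force_transcription stays importable without these packages
    from transformers import WhisperProcessor
    from txtai.pipeline import Transcription

    transcribe = Transcription(model)
    processor = WhisperProcessor.from_pretrained(model)
    force_transcription(transcribe, processor, language)

    # `files` is assumed to be a list of audio file paths
    return list(transcribe(files))
```

Calling `transcribe_files(files)` with a list of Polish audio files would then be expected to behave like the snippet above. Whether a later txtai release makes this unnecessary depends on the change mentioned in the previous comment.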
Environment
Description: Linux Mint 21.2
Release: 21.2
Codename: victoria
Description
I'm attempting to transcribe Polish audio using the Whisper model within txtai, and while I am able to get transcriptions, they appear to be in English rather than the native language of the audio.
Here's a snippet of the code I'm using:
Questions
Any guidance or suggestions on this matter would be greatly appreciated.
Thank you!