Multi-channel input for annotate_with_whisper #865
Comments
I think it should be straightforward to modify the code to iterate over every channel in a recording, create supervisions specifically for that channel, and bind it all together into a
CC maybe @desh2608 would also be interested
Yeah, that should be the most straightforward approach. If all channels are different instances of the same speech (i.e. the same audio recorded with different mics, as opposed to, for example, the two channels of a telephone conversation), you can also just pick the first channel to transcribe and share the supervisions across all channels. This would save a lot of compute.
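A rough sketch of the two options discussed above, per-channel transcription versus transcribing the first channel once and sharing the result. This is not lhotse code: `ChannelSupervision`, `annotate_channels`, and the `transcribe_mono` callable are hypothetical names made up for illustration, and the transcriber is assumed to return `(start, duration, text)` tuples for a single mono channel.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# (start_seconds, duration_seconds, text) for one transcribed segment.
Segment = Tuple[float, float, str]


@dataclass
class ChannelSupervision:
    """Hypothetical stand-in for a supervision tied to one channel."""
    recording_id: str
    channel: int
    start: float
    duration: float
    text: str


def annotate_channels(
    recording_id: str,
    channels: Sequence[Sequence[float]],
    transcribe_mono: Callable[[Sequence[float]], List[Segment]],
    share_first_channel: bool = False,
) -> List[ChannelSupervision]:
    """Build one list of supervisions covering every channel.

    `transcribe_mono` is an assumed callable that transcribes a single
    mono channel. With `share_first_channel=True`, only channel 0 is
    transcribed and its segments are copied to all channels.
    """
    shared = None
    if share_first_channel and len(channels) > 0:
        shared = transcribe_mono(channels[0])

    supervisions: List[ChannelSupervision] = []
    for idx, samples in enumerate(channels):
        segments = shared if shared is not None else transcribe_mono(samples)
        for start, duration, text in segments:
            supervisions.append(
                ChannelSupervision(recording_id, idx, start, duration, text)
            )
    return supervisions
```

Sharing the first channel calls the transcriber once instead of once per channel, which is where the compute saving comes from. It only makes sense when every channel carries the same speech.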
Thanks for your help.
I never optimized it for speed. There is a lot of discussion about it in the Whisper repo; maybe you can find something useful there. I'd be happy to accept contributions with perf improvements :)
At some point we may consider using one of the Python wrappers around whisper.cpp instead of the original Whisper to speed things up. At the moment, it seems people have come up with several ways to wrap it in Python, such as with Cython, ctypes, and pybind11. They seem promising, but most of them seem to have small issues, such as incompatibility with Windows.
I've actually started working on a second workflow that uses faster-whisper, powered by CTranslate2's implementation (see #1017). It's a lot faster and uses far less memory.
Cool! Would you be interested in making a PR? I think it would be good to fall back to regular Whisper if people don't have the compute capability required for CTranslate2.
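One way the fallback idea could look, as a minimal sketch only. `pick_backend` is a hypothetical helper, not part of lhotse, and it only checks whether the `faster_whisper` or `whisper` package can be imported; it does not check the hardware capability CTranslate2 needs, which would have to be handled separately.

```python
from importlib.util import find_spec
from typing import Callable, Optional, Sequence


def pick_backend(
    preferred: Sequence[str] = ("faster_whisper", "whisper"),
    is_available: Optional[Callable[[str], bool]] = None,
) -> str:
    """Return the first importable backend module name, in order of preference.

    `is_available` can be overridden (e.g. in tests); by default it checks
    whether the top-level module can be found without importing it.
    """
    if is_available is None:
        is_available = lambda name: find_spec(name) is not None

    for name in preferred:
        if is_available(name):
            return name
    raise RuntimeError(f"None of the backends are installed: {list(preferred)}")
```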
Hi,
I found that only mono input is currently supported by the annotate_with_whisper function. Will multi-channel input work in the future? Or, if I only need to keep the first channel, what should I do? Thanks so much.