Multi-channel input for annotate_with_whisper #865

Open
WangHelin1997 opened this issue Oct 26, 2022 · 8 comments

@WangHelin1997

WangHelin1997 commented Oct 26, 2022

Hi,
I found that the annotate_with_whisper function currently supports only mono input. Will multi-channel input be supported in the future? Alternatively, if I only need to keep the first channel, what should I do? Thanks so much.

            logging.warning(
                f"Skipping recording '{recording.id}'. It has {recording.num_channels} channels, "
                f"but we currently only support mono input."
            )
            continue
@pzelasko
Collaborator

I think it should be straightforward to modify the code to iterate over every channel in a recording, create supervisions specifically for that channel, and bind it all together into a MultiCut. Would you be willing to make a PR with these improvements?
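
A minimal sketch of the per-channel loop described above, written with plain Python stand-ins rather than Lhotse's actual classes. `ChannelSupervision`, `annotate_per_channel`, and the `transcribe` callable are hypothetical names used only for illustration, and the final step of binding the results into a MultiCut is left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# (start_seconds, duration_seconds, text) for one transcribed segment.
Segment = Tuple[float, float, str]


@dataclass
class ChannelSupervision:
    """Hypothetical stand-in for a supervision tied to a single channel."""
    recording_id: str
    channel: int
    start: float
    duration: float
    text: str


def annotate_per_channel(
    recording_id: str,
    channels: Sequence[Sequence[float]],
    transcribe: Callable[[Sequence[float]], List[Segment]],
) -> List[ChannelSupervision]:
    """Transcribe each channel separately and tag the results with its index."""
    supervisions = []
    for channel_idx, samples in enumerate(channels):
        for start, duration, text in transcribe(samples):
            supervisions.append(
                ChannelSupervision(recording_id, channel_idx, start, duration, text)
            )
    return supervisions


def fake_transcribe(samples: Sequence[float]) -> List[Segment]:
    # Placeholder for a real ASR call: one segment spanning the whole channel,
    # assuming a 16 kHz sampling rate.
    return [(0.0, len(samples) / 16000, f"first sample {samples[0]}")]


stereo = [[0.1] * 16000, [0.2] * 16000]  # two channels of dummy audio
result = annotate_per_channel("rec-1", stereo, fake_transcribe)
```

Here `result` holds one supervision per channel, each carrying the index of the channel it was produced from.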

@pzelasko
Collaborator

CC maybe @desh2608 would also be interested

@desh2608
Collaborator

Yeah, that should be the most straightforward approach. If all channels are different instances of the same speech (i.e. the same audio recorded with different mics, rather than, for example, the two channels of a telephone conversation), you can also just pick the first channel to transcribe and share the supervision across all channels. This would save a lot of compute.
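
The shortcut of transcribing only the first channel could look roughly like the sketch below. It uses plain Python types instead of Lhotse objects; `annotate_shared` and `counting_transcribe` are hypothetical names, and the approach assumes every channel really contains the same speech.

```python
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[float, float, str]  # (start_seconds, duration_seconds, text)


def annotate_shared(
    channels: Sequence[Sequence[float]],
    transcribe: Callable[[Sequence[float]], List[Segment]],
) -> List[Tuple[List[int], Segment]]:
    """Transcribe channel 0 only and attach every channel index to each segment."""
    if not channels:
        return []
    all_channels = list(range(len(channels)))
    return [(all_channels, segment) for segment in transcribe(channels[0])]


calls = []


def counting_transcribe(samples: Sequence[float]) -> List[Segment]:
    # Placeholder for a real ASR call; records how often it is invoked.
    calls.append(len(samples))
    return [(0.0, 1.0, "hello")]


shared = annotate_shared([[0.0] * 16000] * 4, counting_transcribe)
```

The transcriber runs once regardless of the number of channels, which is where the compute saving comes from.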

@WangHelin1997
Author

Thanks for your help.
In addition, I found that the annotate_with_whisper function takes a very long time on multiple audio files. Have you ever tested its speed?

@pzelasko
Collaborator

I never optimized it for speed. There is a lot of discussion about it in the Whisper repo; maybe you can find something useful there. I'd be happy to accept contributions with perf improvements :)

@desh2608
Collaborator

desh2608 commented Apr 6, 2023

At some point we may consider using one of the Python wrappers around whisper.cpp instead of the original Whisper to speed things up. At the moment, it seems people have come up with several ways to wrap it in Python, such as with cython, ctypes, and pybind11. They seem promising, but most of them seem to have small issues such as incompatibility with Windows etc.

@entn-at
Contributor

entn-at commented Apr 6, 2023

> At some point we may consider using one of the Python wrappers around whisper.cpp instead of the original Whisper to speed things up. At the moment, it seems people have come up with several ways to wrap it in Python, such as with cython, ctypes, and pybind11. They seem promising, but most of them seem to have small issues such as incompatibility with Windows etc.

I've actually started working on a second workflow that uses faster-whisper powered by CTranslate2's implementation (see #1017). It's a lot faster and uses far less memory.

@desh2608
Collaborator

desh2608 commented Apr 6, 2023

Cool! Would you be interested in making a PR? I think it would be good to fall back to regular Whisper if people don't have the compute capability required for CTranslate2.
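
One possible shape for such a fallback is sketched below. It only checks whether a backend module can be imported, which is a simplification: it does not test whether the hardware meets CTranslate2's requirements. `module_available` and `pick_backend` are hypothetical helpers, not part of Lhotse.

```python
import importlib.util
from typing import Callable, Optional, Sequence


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be imported here."""
    return importlib.util.find_spec(name) is not None


def pick_backend(
    preferred: Sequence[str] = ("faster_whisper", "whisper"),
    is_available: Callable[[str], bool] = module_available,
) -> Optional[str]:
    """Return the first importable backend module name, or None if none is found."""
    for name in preferred:
        if is_available(name):
            return name
    return None


# Prefers faster-whisper, falls back to the original Whisper package.
backend = pick_backend()
```

A caller could then branch on `backend` to choose the workflow, and report a clear error when it is `None`.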
