Multi-channel input for annotate_with_whisper #865

WangHelin1997 · 2022-10-26T13:43:04Z

Hi,
I found that the annotate_with_whisper function currently supports only mono input. Will multi-channel input be supported in the future? Or, if I only need to keep the first channel, what should I do? Thanks so much.

            logging.warning(
                f"Skipping recording '{recording.id}'. It has {recording.num_channels} channels, "
                f"but we currently only support mono input."
            )
            continue


pzelasko · 2022-10-26T17:12:11Z

I think it should be straightforward to modify the code to iterate over every channel in a recording, create supervisions specifically for that channel, and bind it all together into a MultiCut. Would you be willing to make a PR with these improvements?
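A minimal sketch of the per-channel idea described above, written without Lhotse or Whisper so it runs on its own. The `Segment` class is a hypothetical stand-in for Lhotse's `SupervisionSegment`, and `transcribe` is an assumed callable that takes one mono channel and returns `(start, end, text)` tuples; neither reflects the actual `annotate_with_whisper` internals.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for a supervision segment tied to one channel.
@dataclass
class Segment:
    start: float
    duration: float
    text: str
    channel: int

# Assumed signature: mono samples in, (start, end, text) tuples out.
Transcriber = Callable[[Sequence[float]], List[Tuple[float, float, str]]]

def annotate_per_channel(
    samples: Sequence[Sequence[float]], transcribe: Transcriber
) -> List[Segment]:
    """Run the transcriber once per channel and tag each segment with its channel."""
    segments: List[Segment] = []
    for channel, mono in enumerate(samples):
        for start, end, text in transcribe(mono):
            segments.append(
                Segment(
                    start=start,
                    duration=end - start,
                    text=text.strip(),
                    channel=channel,
                )
            )
    return segments
```

In a real implementation the resulting segments would be attached to a multi-channel cut rather than returned as a flat list.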

pzelasko · 2022-10-26T17:12:45Z

CC maybe @desh2608 would also be interested

desh2608 · 2022-10-26T18:16:08Z

Yeah, that should be the most straightforward approach. If all channels are different instances of the same speech (i.e., the same audio recorded with different mics, as opposed to, say, the two channels of a telephone conversation), you can also just pick the first channel to transcribe and share the supervision across all channels. This would save a lot of compute.
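The shortcut above can be sketched as follows, again without depending on Lhotse or Whisper. The `transcribe` callable and the dictionary layout are assumptions made for illustration: only one reference channel is transcribed, and every resulting segment lists all channels.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed signature: mono samples in, (start, end, text) tuples out.
Transcriber = Callable[[Sequence[float]], List[Tuple[float, float, str]]]

def annotate_shared(
    samples: Sequence[Sequence[float]],
    transcribe: Transcriber,
    reference_channel: int = 0,
) -> List[Dict[str, object]]:
    """Transcribe one channel and share its segments across all channels."""
    all_channels = list(range(len(samples)))
    return [
        {
            "start": start,
            "duration": end - start,
            "text": text.strip(),
            # Each segment gets its own copy of the channel list.
            "channel": list(all_channels),
        }
        for start, end, text in transcribe(samples[reference_channel])
    ]
```

This only makes sense when the channels carry the same speech; for conversational recordings with one speaker per channel, each channel needs its own pass.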

WangHelin1997 · 2022-10-26T18:22:35Z

Thanks for your help.
In addition, I found that the annotate_with_whisper function takes a very long time on multiple audio files. Have you ever tested its speed?

pzelasko · 2022-10-26T18:51:32Z

I never optimized it for speed. There is a lot of discussion about it in the Whisper repo, so maybe you can find something useful there. I'd be happy to accept contributions with perf improvements :)

desh2608 · 2023-04-06T02:13:34Z

At some point we may consider using one of the Python wrappers around whisper.cpp instead of the original Whisper to speed things up. At the moment, it seems people have come up with several ways to wrap it in Python, such as with Cython, ctypes, and pybind11. They look promising, but most of them have small issues, such as incompatibility with Windows.

entn-at · 2023-04-06T05:40:10Z

> At some point we may consider using one of the Python wrappers around whisper.cpp instead of the original Whisper to speed things up. At the moment, it seems people have come up with several ways to wrap it in Python, such as with cython, ctypes, and pybind11. They seem promising, but most of them seem to have small issues such as incompatibility with Windows etc.

I've actually started working on a second workflow that uses faster-whisper powered by CTranslate2's implementation (see #1017). It's a lot faster and uses far less memory.

desh2608 · 2023-04-06T11:32:30Z

Cool! Would you be interested in making a PR? I think it would be good to fall back to regular Whisper if people don't have the compute capability required for CTranslate2.
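One possible shape for such a fallback, as a rough sketch: check which backend package is importable and prefer the faster one. The helper name `pick_whisper_backend` is hypothetical, and this checks only whether a package is installed, not whether the hardware meets CTranslate2's requirements, which would need a separate check.

```python
import importlib.util
from typing import Optional, Sequence

def pick_whisper_backend(
    preferred: Sequence[str] = ("faster_whisper", "whisper"),
) -> Optional[str]:
    """Return the first importable top-level module name from `preferred`, or None."""
    for name in preferred:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is not None:
            return name
    return None

backend = pick_whisper_backend()
if backend is None:
    print("No Whisper backend is installed.")
else:
    print(f"Using backend: {backend}")
```

Here `faster_whisper` and `whisper` are the import names of the faster-whisper and openai-whisper packages, respectively.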
