
--diarize flag is unreliable #216

Closed
savchenko opened this issue Dec 2, 2022 · 2 comments
Labels
duplicate This issue or pull request already exists

Comments

@savchenko

savchenko commented Dec 2, 2022

Windows binary is from https://github.com/ggerganov/whisper.cpp/actions/runs/3596200207 ( 061fc81 )

I have an audio recording of two speakers in conversation, with one speaker on the left channel and the other on the right. There is no echo or audio bleed between the channels.

In the example below, the second line contains sentences spoken by two different speakers, yet the whole line is labelled "speaker 1". In reality, speaker 1 finished with "...what the website is", and the next sentence, starting with "Because there's like...", belongs to speaker 0.

[00:24:18.160 --> 00:24:24.400]  (speaker 1) XXXXXXXXX can do XXXXXXXXX these things. And then also once they do machine learning stuff,
--[ this line ]--> [00:24:24.400 --> 00:24:30.800]  (speaker 1) it's basically what the website is. Because there's like our capabilities include XXXXXXXXX,
[00:24:30.800 --> 00:24:40.720]  (speaker 0) site analysis, and then installing PyTorch. Well, I do remember one thing that he has

Is there any other information you might need to localise the bug?

@savchenko
Author

Waveform screenshot to check the separation:

[image: waveform screenshot]

@ggerganov ggerganov added the duplicate This issue or pull request already exists label Dec 2, 2022
@ggerganov
Owner

Yes, this is expected.
The implemented strategy is very basic and cannot be expected to work reliably in every case.
In this case it fails because a single text segment contains speech from both speakers, while the strategy assumes that only one speaker talks within each segment (#64 (comment)).
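
To illustrate the failure mode, here is a minimal Python sketch. It is not the actual whisper.cpp code. It assumes the strategy assigns one label per text segment by comparing the energy of the left and right channels over that segment; the function names, the `margin` threshold, and the mapping of left to speaker 0 and right to speaker 1 are all hypothetical choices made for the example.

```python
def channel_energy(samples, start, end):
    """Sum of absolute amplitudes over samples[start:end]."""
    return sum(abs(s) for s in samples[start:end])


def label_segment(left, right, start, end, margin=1.1):
    """Return one label for the whole segment [start, end).

    Assumption: left channel = speaker "0", right channel = speaker "1".
    `margin` is a hypothetical threshold; "?" means neither channel dominates.
    """
    e_left = channel_energy(left, start, end)
    e_right = channel_energy(right, start, end)
    if e_left > margin * e_right:
        return "0"
    if e_right > margin * e_left:
        return "1"
    return "?"


# Toy stereo signal: the right channel is active for the first 60 samples,
# then the left channel is active for the remaining 40.
left = [0.0] * 60 + [0.5] * 40
right = [0.5] * 60 + [0.0] * 40

# One label for the whole segment: the speaker with more total energy wins,
# so the second speaker's speech is attributed to the first.
whole = label_segment(left, right, 0, 100)

# Labelling each part separately recovers both speakers.
first_part = label_segment(left, right, 0, 60)
second_part = label_segment(left, right, 60, 100)

print(whole, first_part, second_part)
```

Under these assumptions, a segment that spans a speaker change still gets a single label, which matches the behaviour reported above. Avoiding it would require segment boundaries that coincide with the speaker change.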
