No Speech Detection #27

Open
ZachNagengast opened this issue Feb 16, 2024 · 0 comments
Labels
feature (New feature or request), good first issue (Good for newcomers), help wanted (Extra attention is needed)

Comments

@ZachNagengast
Contributor

No-speech detection can be done with logit filters on the first decoder loop, similar to how language detection works. However, this does not work when a prefill prompt (i.e. forced decoder tokens) is in use, so that case needs special handling. Ideally, there would be an option to ignore the prefill prompt for the first decoder loop in order to detect no speech. This costs one extra loop, but it may allow skipping the entire window, which helps when developers expect long stretches of silence in their input audio.
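As a rough illustration of the idea above, here is a minimal sketch that assumes the no-speech probability is the softmax probability of the no-speech token in the first-loop logits, as in the OpenAI implementation linked under References. The names `softmaxProbability`, `shouldSkipWindow`, and the `firstLoopLogits` closure are hypothetical and are not part of the WhisperKit API. The closure stands in for one decoder loop that is given only the start-of-transcript token instead of the prefill prompt.

```swift
import Foundation

/// Softmax probability of a single token, computed in a numerically stable way.
func softmaxProbability(of tokenIndex: Int, in logits: [Float]) -> Float {
    precondition(logits.indices.contains(tokenIndex), "token index out of range")
    let maxLogit = logits.max() ?? 0
    let denominator = logits.reduce(Float(0)) { $0 + exp($1 - maxLogit) }
    return exp(logits[tokenIndex] - maxLogit) / denominator
}

/// Hypothetical probe (assumption, not WhisperKit API): runs one decoder loop
/// with only the start-of-transcript token, ignoring any prefill prompt, and
/// reports whether the window looks silent enough to skip.
func shouldSkipWindow(
    firstLoopLogits: (_ inputTokens: [Int]) -> [Float],
    startOfTranscriptToken: Int,
    noSpeechToken: Int,
    noSpeechThreshold: Float?
) -> Bool {
    // A nil threshold disables detection, so the extra loop is never run.
    guard let threshold = noSpeechThreshold else { return false }
    let logits = firstLoopLogits([startOfTranscriptToken])
    return softmaxProbability(of: noSpeechToken, in: logits) > threshold
}

// Toy usage with a 4-token vocabulary where index 3 plays the no-speech token.
let skip = shouldSkipWindow(
    firstLoopLogits: { _ in [0, 0, 0, 10] },
    startOfTranscriptToken: 0,
    noSpeechToken: 3,
    noSpeechThreshold: 0.6
)
print(skip)
```

If the probe returns false, decoding would proceed as usual with the prefill prompt, so the only added cost is the single extra loop mentioned above.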

References

OpenAI implementation: https://github.com/openai/whisper/blob/ba3f3cd54b0e5b8ce1ab3de13e32122d0d5f98ab/whisper/decoding.py#L692-L693
WhisperKit inline TODO:

noSpeechProb: 0, // TODO: implement no speech prob

if let threshold = options.noSpeechThreshold,
   result.noSpeechProb > threshold
{
    needsFallback = false // silence
}
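To make the effect of the placeholder concrete, the sketch below isolates the check from the snippet above in a self-contained form. `OptionsSketch`, `ResultSketch`, and `resolveFallback` are hypothetical stand-ins, not WhisperKit types. It suggests that while `noSpeechProb` is hard-coded to 0, the silence branch cannot fire for any non-negative threshold.

```swift
// Hypothetical stand-ins for the real options and result types (assumptions).
struct OptionsSketch {
    var noSpeechThreshold: Float?
}

struct ResultSketch {
    var noSpeechProb: Float
}

/// Applies only the no-speech check from the issue to an existing fallback decision.
func resolveFallback(initial: Bool, result: ResultSketch, options: OptionsSketch) -> Bool {
    var needsFallback = initial
    if let threshold = options.noSpeechThreshold,
       result.noSpeechProb > threshold
    {
        needsFallback = false // silence
    }
    return needsFallback
}

let options = OptionsSketch(noSpeechThreshold: 0.6)

// With the placeholder value 0, the check never overrides the fallback decision.
let withPlaceholder = resolveFallback(initial: true, result: ResultSketch(noSpeechProb: 0), options: options)

// With a real probability above the threshold, the window is treated as silence.
let withDetection = resolveFallback(initial: true, result: ResultSketch(noSpeechProb: 0.9), options: options)

print(withPlaceholder, withDetection)
```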

@ZachNagengast added the help wanted, feature, and good first issue labels Feb 16, 2024
Projects
Status: In Progress
Development

No branches or pull requests

1 participant