No Speech Detection #27
Labels: feature, good first issue, help wanted
This can be done with logit filters on the first decoder loop, similar to language detection. However, this does not work when a prefill prompt (i.e. forced decoder tokens) is in use, so that case will need special handling. Ideally, there would be an option to ignore the prefill prompt for the first decoder loop in order to detect no speech. That costs one extra loop, but it may allow skipping the entire window, which helps when developers expect long stretches of silence in their input audio.
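As a rough illustration of the first-loop check, here is a minimal Python sketch. It assumes the logits from the first decoder step (run without the prefill prompt) are available as a plain list, and that the vocabulary has a dedicated no-speech token. The function names, the toy token index, and the 0.6 threshold are all hypothetical placeholders for this example, not WhisperKit API.

```python
import math

def no_speech_probability(first_step_logits, no_speech_token_id):
    """Softmax the first-step logits and return the no-speech token's probability."""
    # Subtract the max logit for numerical stability before exponentiating.
    max_logit = max(first_step_logits)
    exps = [math.exp(x - max_logit) for x in first_step_logits]
    return exps[no_speech_token_id] / sum(exps)

def should_skip_window(first_step_logits, no_speech_token_id, threshold=0.6):
    """Return True when the window looks silent enough to skip decoding.

    `threshold` is a hypothetical default chosen for illustration.
    """
    return no_speech_probability(first_step_logits, no_speech_token_id) > threshold

# Toy 4-token vocabulary; index 3 stands in for the no-speech token (assumption).
NO_SPEECH_ID = 3
silent_logits = [0.1, 0.2, 0.0, 5.0]   # no-speech token dominates
speech_logits = [4.0, 0.2, 0.0, -1.0]  # a text token dominates

skip_silent = should_skip_window(silent_logits, NO_SPEECH_ID)
skip_speech = should_skip_window(speech_logits, NO_SPEECH_ID)
```

With a prefill prompt, this check would have to run on a separate first pass that ignores the forced tokens, which is the extra loop mentioned above.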
References
OpenAI implementation: https://github.com/openai/whisper/blob/ba3f3cd54b0e5b8ce1ab3de13e32122d0d5f98ab/whisper/decoding.py#L692-L693
WhisperKit inline TODO:
WhisperKit/Sources/WhisperKit/Core/TextDecoder.swift
Line 497 in 228630c
WhisperKit/Sources/WhisperKit/Core/WhisperKit.swift
Lines 612 to 616 in 228630c