No Speech Detection #27

ZachNagengast · 2024-02-16T22:31:53Z

No-speech detection can be implemented with a logit filter on the first decoder loop, similar to how language is detected. However, this does not work when a prefill prompt (i.e. forced decoder tokens) is in use, so that case will need special handling. Ideally, there'd be an option to ignore the prefill prompt for the first decoder loop in order to detect no speech. That costs 1 extra loop, but it may allow skipping the entire window, which helps when developers expect long stretches of silence in their input audio.
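As a rough illustration of the idea above, the no-speech probability can be read from the first decoder loop by taking a softmax over that step's logits and picking out the entry for the no-speech token, which is what the OpenAI reference linked under References does. This is a minimal sketch, not WhisperKit's implementation: the function name `noSpeechProbability`, the plain `[Float]` logits array, and the `noSpeechTokenIndex` parameter are all assumptions made for the example.

```swift
import Foundation

/// Hypothetical helper (not part of WhisperKit): softmax probability of the
/// no-speech token, computed from the raw logits of the first decoder loop.
/// `logits` is assumed to hold one value per vocabulary entry.
func noSpeechProbability(logits: [Float], noSpeechTokenIndex: Int) -> Float {
    precondition(logits.indices.contains(noSpeechTokenIndex), "token index out of range")

    // Subtract the maximum logit before exponentiating for numerical stability.
    let maxLogit = logits.max()! // safe: the precondition guarantees a non-empty array

    var denominator: Float = 0
    for logit in logits {
        denominator += exp(logit - maxLogit)
    }
    return exp(logits[noSpeechTokenIndex] - maxLogit) / denominator
}
```

The value returned here is what would replace the hard-coded `noSpeechProb: 0`. With a prefill prompt, the logits would have to come from a separate loop that ignores the forced tokens, as described above.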

References

OpenAI implementation: https://github.com/openai/whisper/blob/ba3f3cd54b0e5b8ce1ab3de13e32122d0d5f98ab/whisper/decoding.py#L692-L693
WhisperKit inline todo:

WhisperKit/Sources/WhisperKit/Core/TextDecoder.swift

Line 497 in 228630c

noSpeechProb: 0, // TODO: implement no speech prob

WhisperKit/Sources/WhisperKit/Core/WhisperKit.swift

Lines 612 to 616 in 228630c

    
    if let threshold = options.noSpeechThreshold,
       result.noSpeechProb > threshold
    {
        needsFallback = false // silence
    }
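Once a real `noSpeechProb` is available, the threshold check could also decide whether a window is skipped outright. The sketch below is modeled on the OpenAI reference transcription loop, which treats a window as silence when the no-speech probability exceeds its threshold but cancels the skip if the average log probability is above the log-probability threshold. The function `shouldSkipWindow` and its parameter names are hypothetical and are not taken from WhisperKit.

```swift
/// Hypothetical helper (not part of WhisperKit): decide whether a decoded
/// window should be treated as silence and skipped.
func shouldSkipWindow(
    noSpeechProb: Float,
    avgLogProb: Float,
    noSpeechThreshold: Float?,
    logProbThreshold: Float?
) -> Bool {
    // Without a no-speech threshold, never skip.
    guard let speechThreshold = noSpeechThreshold else { return false }

    var skip = noSpeechProb > speechThreshold

    // Assumption borrowed from the OpenAI reference: a confident decode
    // overrides the silence decision.
    if let logThreshold = logProbThreshold, avgLogProb > logThreshold {
        skip = false
    }
    return skip
}
```

Keeping the decision in one small function makes it easy to reuse for both the fallback check and the window-skipping path.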

ZachNagengast added the help wanted, feature, and good first issue labels on Feb 16, 2024
