**Is your feature request related to a problem? Please describe.**

I've been experimenting with `examples/asr/asr_cache_aware_streaming/speech_to_text_cache_aware_streaming_infer.py`. One of the things I've noticed is that the "greedy_batched" strategy does not support partial hypotheses. We should add support for this. Right now, streaming of RNN-T models is horrendously slow because we must use the "greedy" strategy, which runs the decoder at batch size 1. The encoder basically isn't meaningfully contributing to the runtime; the decoder is the main slowdown.

FYI @artbataev.
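
To make the request concrete, here is a minimal, hypothetical sketch of the interface that partial-hypothesis support would enable. None of these names come from NeMo: `Hypothesis`, `toy_joint`, and `decode_step` are illustrative stand-ins for the real prediction network, joint, and decoding loop. The point is that each stream's hypothesis (emitted tokens plus decoder state) is carried across chunks, so one batched decoder call per chunk replaces B separate batch-size-1 calls. For brevity the toy emits at most one symbol per encoder frame, whereas real greedy RNN-T can emit several non-blank symbols per frame.

```python
import torch
from dataclasses import dataclass, field
from typing import List, Optional

BLANK = 0
VOCAB = 32

@dataclass
class Hypothesis:
    tokens: List[int] = field(default_factory=list)     # tokens emitted so far
    dec_state: Optional[torch.Tensor] = None            # decoder state carried across chunks

def toy_joint(enc_frames: torch.Tensor, dec_states: torch.Tensor) -> torch.Tensor:
    # Stand-in for prediction network + joint: per-stream logits, [B, VOCAB].
    return torch.randn(enc_frames.shape[0], VOCAB) + dec_states

def decode_step(enc_chunk: torch.Tensor, hyps: List[Hypothesis]) -> List[Hypothesis]:
    # Batched greedy step over [B, T, D] encoder output, resuming from `hyps`
    # instead of restarting each stream from scratch.
    states = torch.stack([h.dec_state if h.dec_state is not None
                          else torch.zeros(VOCAB) for h in hyps])
    for t in range(enc_chunk.shape[1]):
        labels = toy_joint(enc_chunk[:, t], states).argmax(dim=-1)  # [B]
        for b, lab in enumerate(labels.tolist()):
            if lab != BLANK:
                hyps[b].tokens.append(lab)
        # Fake state update: fold the chosen labels back into the state.
        states = states + torch.nn.functional.one_hot(labels, VOCAB).float()
    for b, h in enumerate(hyps):
        h.dec_state = states[b]
    return hyps

# Streaming loop: hypotheses persist across chunks, one per stream.
B, T, D = 4, 8, 16
hyps = [Hypothesis() for _ in range(B)]
for _ in range(3):                        # three audio chunks
    enc_chunk = torch.randn(B, T, D)      # stand-in for cache-aware encoder output
    hyps = decode_step(enc_chunk, hyps)
print([len(h.tokens) for h in hyps])
```

With this shape of API, the streaming script could keep one `Hypothesis` per utterance and decode all streams in a single batched call per chunk, which is exactly what the batch-size-1 "greedy" path prevents today.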