English text normalization utilization for Eager Streaming Mode #111

Open
atiorh opened this issue Apr 8, 2024 · 1 comment

Comments

@atiorh
Contributor

atiorh commented Apr 8, 2024

  • Eager Streaming Mode confirms the currently predicted text tokens only when at least one earlier (redundant) prediction agrees with them.
  • Whisper occasionally outputs tokens that differ only trivially (e.g. "gonna" vs "going to", "amortisation" vs "amortization") for almost identical audio input. Each such mismatch misses an opportunity to confirm predicted text tokens earlier and causes an unnecessary slowdown.
  • Memory and Latency Regression Tests #99 implements English Text Normalization, which can be integrated into the token confirmation logic in Eager Streaming Mode to avoid these slowdowns.
  • Note that this would not alter the predicted tokens themselves or the associated KV cache. It only relaxes the confirmation criterion so that near matches with a trivial string variation count as matches.
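
To make the proposal concrete, here is a minimal sketch of normalization-aware confirmation, written in Python for brevity rather than in the project's own language. Everything in it is an assumption made for illustration: the `SUBSTITUTIONS` table, `normalize_word`, and `confirmed_prefix` are hypothetical names, the comparison works on whitespace-split words instead of Whisper tokens, and the tiny table stands in for the much fuller normalizer from #99.

```python
import re

# Hypothetical, deliberately tiny substitution table; a real normalizer
# (such as the one from #99) would cover far more cases.
SUBSTITUTIONS = {
    "gonna": "going to",
    "amortisation": "amortization",
}


def normalize_word(word):
    """Lowercase, strip punctuation, and expand known variants into a list of words."""
    cleaned = re.sub(r"[^\w\s']", "", word.lower())
    return SUBSTITUTIONS.get(cleaned, cleaned).split()


def confirmed_prefix(previous, current):
    """Return the leading words of `current` that agree with `previous`
    after normalization. The returned words are the original, unnormalized
    predictions, so normalization affects only the comparison."""
    prev_norm = [piece for word in previous for piece in normalize_word(word)]
    confirmed = []
    position = 0
    for word in current:
        pieces = normalize_word(word)
        # Stop at the first word whose normalized form disagrees with history.
        if prev_norm[position:position + len(pieces)] != pieces:
            break
        confirmed.append(word)
        position += len(pieces)
    return confirmed


previous = "I'm gonna check the amortisation schedule".split()
current = "I'm going to check the amortization schedule today".split()
print(confirmed_prefix(previous, current))
```

With exact string matching, only the first word of `current` would be confirmed against `previous`; with the normalized comparison, every word up to "schedule" is confirmed, while the text handed onward is still the unmodified prediction.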
@ZachNagengast ZachNagengast linked a pull request May 7, 2024 that will close this issue
@ZachNagengast ZachNagengast removed a link to a pull request May 7, 2024
@ZachNagengast
Contributor

Utilities to help with this will be included in #120.
