Addressing Whisper STT issues #5929
Merge dev branch
Merge dev branch (oobabooga#5257)
Hi, I posted
Hi, that overhaul definitely contains some nice changes, especially removing the need for the speech-recognition dependency! If you'd like to polish it a little to make it "merge-ready", I propose you make a fork of the main webUI where we can collaborate on that.
@mamei16 is this PR ready for merging? I see that you managed to put the record button next to the Generate button. I had tried to do that myself in the past and failed, so thanks for that.
Yeah, I think so. I've also created another version based on @TimStrauven's overhaul, but due to some bug it barely recognizes anything (even though it should be more "correct" than this version, since it's closer to what openai is doing). Firefox still crashes after a number of transcriptions, but unfortunately the Gradio devs have yet to react to my issue in any way, so not much to do there :/
Even after the first Whisper STT issue following the big Gradio update was fixed (#5856), multiple others are still present. The purpose of this PR is to address the issues by either implementing workarounds, or ideally fixing the underlying problems, provided they don't originate in Gradio itself.
Issue 1: Chrome only (workaround found)
Reports: #5869, #5920, #5805
Description
Users report the following exception message after recording audio in Chrome:
audioop.error: not a whole number of frames
Looking into the issue, I noticed that the audio data obtained when recording audio with Firefox has the following format:
Where the first tuple item is the sample rate, and the second item is the array of samples. Notice that each sample consists of two values, which I imagine simply means the audio is recorded in stereo.
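To make this layout concrete, here is a small sketch using made-up sample values (they are not the data from the original report). It builds a `(sample_rate, samples)` tuple with two values per sample, and includes two hypothetical helpers, `channel_count` and `as_two_channels`, that are not part of the PR: the first reports how many values each sample carries, the second duplicates a flat one-value-per-sample array into two identical columns with `np.column_stack`. Whether this matches the PR's actual code is an assumption.

```python
import numpy as np

# Hypothetical values for illustration only; not the data from the PR.
sample_rate = 48000
stereo = np.array([[0, 0], [12, 11], [-7, -8]], dtype=np.int16)
audio = (sample_rate, stereo)  # (sample rate, array of samples)


def channel_count(samples: np.ndarray) -> int:
    """Return 1 for a flat array of samples, else the number of columns."""
    return 1 if samples.ndim == 1 else samples.shape[1]


def as_two_channels(samples: np.ndarray) -> np.ndarray:
    """Duplicate a flat array into two identical columns; pass 2-D input through."""
    if samples.ndim == 1:
        # Each sample s_i becomes the row [s_i, s_i].
        return np.column_stack((samples, samples))
    return samples


rate, data = audio
print(rate, data.shape, channel_count(data))
```

A flat array such as `np.array([3, 4, 5], dtype=np.int16)` would come back from `as_two_channels` with shape `(3, 2)`, so downstream code can treat both layouts the same way.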
Now compare this to the audio data obtained when using Chromium:
In this case, each sample consists of only a single value, perhaps suggesting that audio is recorded in mono instead of stereo. In any case, this different data format causes the aforementioned audioop.error. This PR provides a workaround by stacking the sample data column-wise if it is discovered that it is not already a nested numpy array.
Issue 2: Firefox only (Gradio Issue)
Reports: #5920
Description
It is reported that the UI in Firefox is sluggish and even causes the browser to crash after a number of recordings have been made. The source of the problem remains to be identified.
Here's what I have found out so far:
After stopping a recording in Firefox, it seems some JavaScript function is called over and over again indefinitely, each time triggering the error
Invalid URI. Load of media resource failed.
This leads to excessive CPU usage even when the web UI is idle after recording audio. It appears that after each recording, another asynchronous function starts calling the problematic function, leading to CPU usage increasing further with each recording, until the web UI becomes laggy and Firefox finally crashes.
Update: This error even occurs with a minimal POC Gradio program, so I have created an issue in Gradio's repo: gradio-app/gradio#8135.
Checklist: