Addressing Whisper STT issues #5929

mamei16 · 2024-04-24T19:50:38Z

Even after the first Whisper STT issue following the big Gradio update was fixed (#5856), multiple others are still present. The purpose of this PR is to address the issues by either implementing workarounds, or ideally fixing the underlying problems, provided they don't originate in Gradio itself.

Issue 1: Chrome only (workaround found)

Reports: #5869, #5920, #5805

Description

Users report the following exception message after recording audio in Chrome: audioop.error: not a whole number of frames.
Looking into the issue, I noticed that the audio data obtained when recording audio with Firefox has the following format:

(44100, array([[0, 0],
       [0, 0],
       [0, 0],
       ...,
       [0, 0],
       [0, 0],
       [0, 0]], dtype=int16))

Where the first tuple item is the sample rate, and the second item is the array of samples. Notice that each sample consists of two values, which I imagine simply means the audio is recorded in stereo.
Now compare this to the audio data obtained when using Chromium:

(44100, array([0, 0, 0, ..., 0, 0, 0], dtype=int16))

In this case, each sample consists of only a single value, which suggests that the audio is recorded in mono rather than stereo. In any case, this different data format causes the aforementioned audioop.error. This PR works around it by stacking the sample data column-wise whenever it is not already a two-dimensional NumPy array.
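The workaround described above could look roughly like the sketch below. This is not the PR's actual diff: the helper name `ensure_two_channels` and the zero-filled sample arrays are hypothetical, and it assumes that duplicating the mono channel is an acceptable way to obtain the two-column layout that Firefox produces.

```python
import numpy as np


def ensure_two_channels(samples: np.ndarray) -> np.ndarray:
    """Return samples with shape (n_frames, 2).

    Hypothetical helper: 1-D (mono) input is stacked column-wise with
    itself; input that is already 2-D is returned unchanged.
    """
    if samples.ndim == 1:
        # Chromium-style data: one value per sample, so duplicate the channel.
        return np.column_stack((samples, samples))
    return samples


# Chromium-style recording: (sample_rate, 1-D array of int16 samples)
chromium_rate, chromium_samples = 44100, np.zeros(8, dtype=np.int16)
# Firefox-style recording: (sample_rate, 2-D array with two values per sample)
firefox_rate, firefox_samples = 44100, np.zeros((8, 2), dtype=np.int16)

chromium_fixed = ensure_two_channels(chromium_samples)
firefox_fixed = ensure_two_channels(firefox_samples)

print(chromium_fixed.shape, firefox_fixed.shape)
```

After this step both browsers yield an array with the same layout and dtype, so whatever code consumes the recording only has to handle one format.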

Issue 2: Firefox only (Gradio Issue)

Reports: #5920

Description

It is reported that the UI in Firefox is sluggish and even causes the browser to crash after a number of recordings have been made. The source of the problem remains to be identified.

Here's what I have found out so far:
After stopping a recording in Firefox, it seems some JavaScript function is called over and over again indefinitely, each time triggering the error Invalid URI. Load of media resource failed. This leads to excessive CPU usage even when the web UI is idle after recording audio. It appears that after each recording, another asynchronous function starts calling the problematic function, leading to CPU usage increasing further with each recording, until the web UI becomes laggy and Firefox finally crashes.

Update: This error even occurs with a minimal POC Gradio program, so I have created an issue in Gradio's repo: gradio-app/gradio#8135.

Checklist:

I have read the Contributing guidelines.

TimStrauven · 2024-04-25T11:35:07Z

Hi, I posted Whisper STT overhaul #5563 a while ago, to address also some of the STT issues and moving away from the speechrecognition lib.
Code there might help to implement this one? (Still needs "audio.stop_recording" like you changed in the other PR, and needs to be extended for other architectures than only cuda and cpu)

mamei16 · 2024-04-25T21:02:08Z

> Hi, I posted Whisper STT overhaul #5563 a while ago, to address also some of the STT issues and moving away from the speechrecognition lib. Code there might help to implement this one? (Still needs "audio.stop_recording" like you changed in the other PR, and needs to be extended for other architectures than only cuda and cpu)

Hi, that overhaul definitely contains some nice changes, especially removing the need for the speech-recognition dependency! If you'd like to polish it a little to make it "merge-ready", I propose you make a fork of the main webUI where we can collaborate on that.

oobabooga · 2024-05-19T23:06:15Z

@mamei16 is this PR ready for merging? I see that you managed to put the record button next to the Generate button. I had tried to do that myself in the past and failed, so thanks for that.

mamei16 · 2024-05-21T15:46:24Z

> @mamei16 is this PR ready for merging? I see that you managed to put the record button next to the Generate button. I had tried to do that myself in the past and failed, so thanks for that.

Yeah, I think so. I've also created another version based on @TimStrauven's overhaul, but due to some bug it barely recognizes anything (even though it should be more "correct" than this version, since it's closer to what openai is doing).

Firefox still crashes after a number of transcriptions, but unfortunately the Gradio devs have yet to react to my issue in any way, so not much to do there :/

oobabooga added 30 commits December 14, 2023 22:39

Merge pull request oobabooga#4927 from oobabooga/dev

c3e0fcf

Merge dev branch

Merge pull request oobabooga#4937 from oobabooga/dev

443be39

Merge dev branch

Merge pull request oobabooga#4961 from oobabooga/dev

7be0983

Merge dev branch

Merge pull request oobabooga#4980 from oobabooga/dev

b28020a

Merge dev branch

Merge pull request oobabooga#4988 from oobabooga/dev

781367b

Merge dev branch

Merge pull request oobabooga#5002 from oobabooga/dev

71eb744

Merge dev branch

Merge pull request oobabooga#5005 from oobabooga/dev

5b791ca

Merge dev branch

Merge pull request oobabooga#5011 from oobabooga/dev

c1f78db

Merge dev branch

Merge pull request oobabooga#5012 from oobabooga/dev

489f4a2

Merge dev branch

Merge pull request oobabooga#5022 from oobabooga/dev

11288d1

Merge dev branch

Merge pull request oobabooga#5039 from oobabooga/dev

4b25acf

Merge dev branch

Merge pull request oobabooga#5073 from oobabooga/dev

af87609

Merge dev branch

Merge pull request oobabooga#5078 from oobabooga/dev

19d1374

Merge dev branch

Merge pull request oobabooga#5100 from oobabooga/dev

3fd7073

Merge dev branch

Merge pull request oobabooga#5132 from oobabooga/dev

3e3a66e

Merge dev branch

Merge pull request oobabooga#5152 from oobabooga/dev

3f28925

Merge dev branch

Merge pull request oobabooga#5163 from oobabooga/dev

c54d1da

Merge dev branch

Merge pull request oobabooga#5181 from oobabooga/dev

8ea3f31

Merge dev branch

Merge pull request oobabooga#5195 from oobabooga/dev

e169993

Merge dev branch

Merge pull request oobabooga#5199 from oobabooga/dev

ad1ff53

Merge dev branch

Merge pull request oobabooga#5220 from oobabooga/dev

2dc8db8

Merge dev branch

Merge pull request oobabooga#5253 from oobabooga/dev

61e4bfe

Merge dev branch

Merge pull request oobabooga#5266 from oobabooga/dev

d8c3a5b

Merge dev branch (oobabooga#5257)

Merge pull request oobabooga#5347 from oobabooga/dev

1343aa3

Merge dev branch

Merge pull request oobabooga#5348 from oobabooga/dev

837bd88

Merge dev branch

Merge pull request oobabooga#5379 from oobabooga/dev

e7a760e

Merge dev branch

Merge pull request oobabooga#5404 from oobabooga/dev

4f3fdf1

Merge dev branch

Merge pull request oobabooga#5452 from oobabooga/dev

a329db0

Merge dev branch

Merge pull request oobabooga#5453 from oobabooga/dev

0f134bf

Merge dev branch

Merge pull request oobabooga#5496 from oobabooga/dev

dc6adef

Merge dev branch

oobabooga and others added 18 commits February 14, 2024 11:32

Merge pull request oobabooga#5502 from oobabooga/dev

771c592

Merge dev branch

Merge pull request oobabooga#5530 from oobabooga/dev

dd46229

Merge dev branch

Merge pull request oobabooga#5534 from oobabooga/dev

7838075

Merge dev branch

Merge pull request oobabooga#5549 from oobabooga/dev

d6bb6e7

Merge dev branch

Merge pull request oobabooga#5574 from oobabooga/dev

ba85271

Merge dev branch

Merge pull request oobabooga#5617 from oobabooga/dev

60f3d87

Merge dev branch

Merge pull request oobabooga#5641 from oobabooga/dev

992affe

Merge dev branch

Merge pull request oobabooga#5655 from oobabooga/dev

aa0da07

Merge dev branch

Merge pull request oobabooga#5680 from oobabooga/dev

1934cb6

Merge dev branch

Merge pull request oobabooga#5716 from oobabooga/dev

7cf1402

Merge dev branch

Merge pull request oobabooga#5772 from oobabooga/dev

1a7c027

Merge dev branch

Merge pull request oobabooga#5810 from oobabooga/dev

5b91dbb

Merge dev branch

Merge pull request oobabooga#5822 from oobabooga/dev

65099dc

Merge dev branch

Merge pull request oobabooga#5823 from oobabooga/dev

91a7370

Merge dev branch

Merge pull request oobabooga#5848 from oobabooga/dev

26d822f

Merge dev branch

Merge pull request oobabooga#5887 from oobabooga/dev

a4b732c

Merge dev branch

Merge pull request oobabooga#5927 from oobabooga/dev

ad12236

Merge dev branch

add temporary workaround for problem when using Chromium

388eff0

add additional record button next to "Generate" button

8ad2b65

clarify workaround comment

de33a03

mamei16 marked this pull request as ready for review May 21, 2024 15:46
