Real-time transcription on Raspberry Pi 4 #166

ggerganov · 2022-11-21T16:50:15Z

ggerganov
Nov 21, 2022
Maintainer

It is possible to some extent to run Whisper in real-time mode on an embedded device such as the Raspberry Pi.
Below are a few examples + build instructions.

Real-time with 4 seconds step

./stream -m models/ggml-tiny.en.bin --step 4000 --length 8000 -c 0 -t 4 -ac 512

whisper-raspberry-2.mp4

Real-time with 7.5 seconds step

./stream -m models/ggml-tiny.en.bin --step 7680 --length 15360 -c 0 -t 3 -ac 512

whisper-raspberry-3.mp4
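For reference, the --step and --length values are given in milliseconds, and the capture logs later in this thread report a 16000 Hz sample rate, so each setting corresponds to a fixed number of audio samples. A minimal sketch of that arithmetic follows; the helper name ms_to_samples is made up for illustration, and the 16 kHz rate is taken from those logs rather than from the command line.

```python
SAMPLE_RATE_HZ = 16_000  # from the "sample rate: 16000" lines in the stream logs


def ms_to_samples(ms: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Convert a duration in milliseconds to a whole number of samples."""
    return ms * sample_rate_hz // 1000


# The two example invocations above: (label, --step, --length)
for label, step_ms, length_ms in [
    ("4 seconds step", 4000, 8000),
    ("7.5 seconds step", 7680, 15360),
]:
    print(label, ms_to_samples(step_ms), ms_to_samples(length_ms))
```

This matches the "processing 48000 samples (step = 3.0 sec ...)" line that stream prints for a 3000 ms step.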

Build instructions

git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
make -j stream
./models/download-ggml-model.sh tiny.en

More information

To speed up processing, the Encoder's context is reduced from the original 1500 down to 512 (using the -ac 512 flag). This makes it possible to run the above examples on a Raspberry Pi 4 Model B (2018) on 3 CPU threads using the tiny.en Whisper model. The attached microphone is from a USB camera, so the audio quality is not great.
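To get a rough feel for what -ac 512 changes, the sketch below computes the fraction of the original encoder context that remains. It also estimates how much audio 512 positions cover, under the assumption (taken from the Whisper architecture, not stated in this post) that the full 1500 positions span a 30-second window. The function names are illustrative only.

```python
FULL_AUDIO_CTX = 1500    # original encoder context, per the post above
FULL_WINDOW_SEC = 30.0   # assumption: 1500 positions span Whisper's 30 s window


def ctx_fraction(audio_ctx: int) -> float:
    """Fraction of the full encoder context that is kept."""
    return audio_ctx / FULL_AUDIO_CTX


def ctx_seconds(audio_ctx: int) -> float:
    """Approximate audio duration covered by the reduced context."""
    return audio_ctx * FULL_WINDOW_SEC / FULL_AUDIO_CTX


print(f"{ctx_fraction(512):.1%} of the full context")
print(f"about {ctx_seconds(512):.2f} s of audio")
```

Encoder cost does not necessarily scale linearly with the context size, so the fraction is only a rough indicator of the speed-up.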

More detailed discussion can be found in this issue: #7

Explanation of what the -ac flag does: #137

eternitybt · 2023-01-22T18:13:25Z

eternitybt
Jan 22, 2023

Fantastic work!! On my RPi 4 with Raspberry Pi OS Lite (bullseye) installed, I had to run

sudo apt install libsdl2-dev

or the compiler would complain about a missing "SDL.h" header file. Now it works like a charm.

1 reply

dragen1860 May 18, 2023

I tried many approaches but still could not install libsdl2-dev on Ubuntu 22.04. Finally, I tried building SDL2 from source, and it succeeded!

SethRobinson · 2023-04-21T05:07:46Z

SethRobinson
Apr 21, 2023

Thanks, worked pretty well on the Pi for me! (twitter clip) Yeah, I had to install libsdl2-dev as well, as @eternitybt mentioned. (For the mic, I used a ReSpeaker Mic Array v2.)

1 reply

salmanfarisvp Nov 6, 2023

Hi @SethRobinson, what changes do we need to make to use the ReSpeaker Mic Array? Thanks.

solarsamuel · 2023-05-14T23:41:25Z

solarsamuel
May 14, 2023

Made a video of the install here: https://youtu.be/caaKhWcfcCY

0 replies

Bucknalla · 2023-05-15T15:08:25Z

Bucknalla
May 15, 2023

Hi, which version of the Pi 4 are you using? Is there a minimum memory requirement? I'm getting an 'Illegal instruction (core dumped)' error when I try this on a 1GB Pi 4.

3 replies

Bucknalla May 15, 2023

I think this might be because I'm on the 64-bit version of the OS. It might be worth confirming whether these instructions are meant to be compiled for armv8 and not aarch64, which doesn't seem to be working right now for the Raspberry Pi 4, I believe?

eternitybt May 15, 2023

I'm running this on a 2GB Pi 4 with 64-bit Raspberry Pi OS Lite. Definitely make sure you are compiling this for the ARM architecture.

eternitybt May 15, 2023

Also, one should use the raspberry branch (after cloning the repository, type git checkout raspberry).

Antexo · 2023-07-07T19:24:09Z

Antexo
Jul 7, 2023

Can this be done on a Raspberry Pi 3B+? I want to use it for speech recognition.

2 replies

solarsamuel Jul 8, 2023

Give it a shot and let us know your results. As long as you have a USB mic and the Pi, it takes like 10 minutes to test. Probably would be best to start with the tiny model.

ilka1999 Nov 29, 2023

It works on my Pi 3B+ running Ubuntu 22.04.

nabontra · 2023-07-13T06:21:17Z

nabontra
Jul 13, 2023

Hello, I followed the build instructions on a Pi4 model B and am receiving this error: "fatal error: immintrin.h: No such file or directory" when attempting the make/build.

6 replies

nabontra Jul 13, 2023

Hi, I'm running Raspbian Bullseye 11 (64 bit). I have a 4GB ram pi. Thanks for reaching out!

solarsamuel Jul 14, 2023

Nick, are you getting the error when compiling (make) stream.cpp, or bench.cpp, or main.cpp, quantize.cpp or all of them? When did you clone the github whisper repository? I wonder if I can zip my version which works and send it to you without the models.

nabontra Jul 14, 2023

After git cloning the repo and changing directories, I receive the error when attempting the initial "make -j stream".

Here is the verbose error:

I whisper.cpp build info:
I UNAME_S: Linux
I UNAME_P: unknown
I UNAME_M: aarch64
I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -mcpu=native
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread -mcpu=native
I LDFLAGS:
I CC: cc (Raspbian 10.2.1-6+rpi1) 10.2.1 20210110
I CXX: g++ (Raspbian 10.2.1-6+rpi1) 10.2.1 20210110

cc -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -mcpu=native -c ggml.c -o ggml.o
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread -mcpu=native -c whisper.cpp -o whisper.o
ggml.c:296:10: fatal error: immintrin.h: No such file or directory
  296 | #include <immintrin.h>
      |          ^~~~~~~~~~~~~
compilation terminated.

solarsamuel Jul 14, 2023

Here's what I'm seeing. When did you download and install? Today? Are you downloading to your desktop?

The only difference I see is the +rpi1 here --> (Raspbian 10.2.1-6+rpi1)

When I tried to find immintrin.h:
pi@raspberrypi:/ $ sudo find / -name immintrin.h

This is what it gave me. What happens if you try this?
find: ‘/run/user/1000/gvfs’: Permission denied

I retried the install today and got it to work. Tried a different method this time and downloaded using the green code button --> zip folder at the top of the main page: https://github.com/ggerganov/whisper.cpp. Unzipped to the desktop then compiled using the make command and it worked.

pi@raspberrypi:~/Desktop/whisper.cpp-master $ make -j stream
I whisper.cpp build info:
I UNAME_S: Linux
I UNAME_P: unknown
I UNAME_M: aarch64
I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -mcpu=native
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread -mcpu=native
I LDFLAGS:
I CC: cc (Debian 10.2.1-6) 10.2.1 20210110
I CXX: g++ (Debian 10.2.1-6) 10.2.1 20210110

cc -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -mcpu=native -c ggml.c -o ggml.o
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread -mcpu=native -c whisper.cpp -o whisper.o
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread -mcpu=native examples/stream/stream.cpp examples/common.cpp examples/common-ggml.cpp examples/common-sdl.cpp ggml.o whisper.o -o stream `sdl2-config --cflags --libs`

nabontra Jul 14, 2023

Hi there, I originally git cloned just before my last update (around 24 hrs ago) and have been putting it in a Desktop folder. I tried a sudo find and received the same "Permission denied" message that you did.

I just tried the green button zip/unpack method and still received the immintrin.h error when running make, unfortunately.

I'm not tied to Raspbian, so maybe I'll try your Debian distro to see if the issue persists! The new Raspberry Pi imager tool removes the default root account, but I did make sure my account was root and still received the issue.

Update: I tried fresh installs of Debian 'Buster' and 'Bullseye', receiving the same error each time. I did some searching and the +rpi1 notation could be cross-compiler vs native compiler on my end. What image/distro did you flash your Pi with?

solarsamuel · 2023-07-18T16:31:13Z

solarsamuel
Jul 18, 2023

I typed the following in terminal: pi@raspberrypi:~ $ uname -a

Linux raspberrypi 6.1.29-v8+ #1652 SMP PREEMPT Wed May 24 14:46:55 BST 2023 aarch64 GNU/Linux

Also typed:

pi@raspberrypi:~ $ cat /etc/os-release

PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Used the Raspberry Pi Imager v1.7.4 with Pi OS 64 bit, Debian Bullseye Desktop

1 reply

nabontra Jul 23, 2023

Sorry for the delay, this notification didn't pop up for me! Thanks so much for the output, I went back and found a possible issue with the version of Pi Imager that I was using. It seems like it flashed a 32 bit Debian despite listing a 64 bit version, and I missed it. Trying the reflash now!

Edit: That solved it! Initially received an SDL error, but that was resolved by installing libsdl2-dev!
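The failing build log above reports UNAME_M: aarch64 together with a Raspbian (+rpi1) compiler, which is consistent with a 64-bit kernel running a 32-bit userland. A minimal sketch for spotting that mismatch before building is shown below. It assumes the system Python belongs to the same userland as the compiler, and the function name and messages are made up for illustration.

```python
import platform
import struct


def diagnose(machine: str, pointer_bits: int) -> str:
    """Classify the kernel/userland combination.

    machine      -- what `uname -m` reports (kernel architecture)
    pointer_bits -- pointer width of a userland program, in bits
    """
    kernel_64 = machine in ("aarch64", "arm64")
    if kernel_64 and pointer_bits == 64:
        return "64-bit kernel, 64-bit userland"
    if kernel_64 and pointer_bits == 32:
        return "64-bit kernel, 32-bit userland (mismatch)"
    return f"{machine}, {pointer_bits}-bit userland"


# struct.calcsize("P") is the size in bytes of a C pointer in this interpreter.
print(diagnose(platform.machine(), struct.calcsize("P") * 8))
```

If this reports a mismatch, reflashing with a 64-bit image, as was done above, is the fix that worked in this thread.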

zlw235789 · 2023-09-04T11:13:10Z

zlw235789
Sep 4, 2023

I am running on a Raspberry Pi 4B. I can record through ffmpeg, but stream produces no output:

ffmpeg -f pulse -i alsa_input.usb-C-Media_Electronics_Inc._USB_PnP_Sound_Device-00.analog-mono -ar 16000 -ac 1 recording.wav

root@a0f34bc2c254:/whisper-cpp/whisper.cpp# ./stream -m ./models/ggml-tiny.bin -t 6 --step 0 --length 30000 -vth 0.6
init: found 1 capture devices:
init: - Capture device #0: 'USB PnP Sound Device 模拟单声道'
init: attempt to open default capture device ...
init: obtained spec for input device (SDL Id = 2):
init: - sample rate: 16000
init: - format: 33056 (required: 33056)
init: - channels: 1 (required: 1)
init: - samples per frame: 1024
whisper_init_from_file_no_state: loading model from './models/ggml-tiny.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 384
whisper_model_load: n_text_head = 6
whisper_model_load: n_text_layer = 4
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 1
whisper_model_load: mem required = 201.00 MB (+ 3.00 MB per decoder)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx = 73.62 MB
whisper_model_load: model size = 73.54 MB
whisper_init_state: kv self size = 2.62 MB
whisper_init_state: kv cross size = 8.79 MB

main: processing 0 samples (step = 0.0 sec / len = 30.0 sec / keep = 0.0 sec), 6 threads, lang = en, task = transcribe, timestamps = 1 ...
main: using VAD, will transcribe on speech activity

[Start speaking]

0 replies

solarsamuel · 2023-09-04T17:25:56Z

solarsamuel
Sep 4, 2023

Did you try the default example from above?

./stream -m models/ggml-tiny.en.bin --step 4000 --length 8000 -c 0 -t 4 -ac 512

I wasn't able to get your code to work either.

./stream -m ./models/ggml-tiny.bin -t 6 --step 0 --length 30000 -vth 0.6

Try taking the default example and add -vth 0.6 to the end for the voice activation detector (VAD) like below. Worked well for me.

./stream -m models/ggml-tiny.en.bin --step 4000 --length 8000 -c 0 -t 4 -ac 512 -vth 0.6

Also, the line below works with 6 threads, which surprised me, because I thought the Raspberry Pi 4 could only go up to 4 threads since it has 4 cores. In the task manager, the CPU usage would sometimes climb to nearly 100% with either -t 4 or -t 6.

./stream -m models/ggml-tiny.en.bin --step 4000 --length 8000 -c 0 -t 6 -ac 512 -vth 0.6

I turned --step down to 0 like below and it worked once, then stopped working.

./stream -m models/ggml-tiny.en.bin --step 0 --length 8000 -c 0 -t 6 -ac 512 -vth 0.6

From what I've seen, upping the --step to 2000 works better and 4000 even better.
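Since the Pi 4 has four cores, asking for more threads than cores (as with -t 6 above) oversubscribes the CPU. Below is a small sketch for choosing a -t value that never exceeds the core count. The helper name pick_threads is hypothetical and not part of whisper.cpp.

```python
import os
from typing import Optional


def pick_threads(requested: int, cores: Optional[int] = None) -> int:
    """Clamp a requested thread count to the range [1, number of cores]."""
    if cores is None:
        cores = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, min(requested, cores))


print(pick_threads(6, cores=4))  # prints 4
```

The result can be passed straight to the -t option of the commands above.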

0 replies

bobqianic · 2023-09-05T07:06:11Z

bobqianic
Sep 5, 2023
Collaborator

The Raspberry Pi 4 is a bit slow, but some development boards equipped with the RK3588 chip have a 6 TOPS NPU. We should consider supporting these chips, as they could potentially enable "real" real-time transcription. @ggerganov

2 replies

solarsamuel Nov 19, 2023

Works well on Orange Pi 5 with the RK3588S chip. Video here: https://www.youtube.com/watch?v=qgF4_moXcYQ

bobqianic Dec 24, 2023
Collaborator

The most recent update can be found in #1557

micartey · 2023-11-19T18:13:28Z

micartey
Nov 19, 2023

"a bit slow"

Well, it takes several tens of seconds for a 3 second long wav file...
That is more than just a bit slow. This is, in fact, so slow, that this is neither real-time nor usable...

2 replies

solarsamuel Nov 19, 2023

See the top of this page. It is about real-time transcription on raspberry pi 4. Nothing to do with wav files.

This is real-time and usable.

micartey Nov 21, 2023

There are several types of "real time" but this is none of those. With or without files, it takes ages...

solarsamuel · 2023-12-12T01:03:34Z

solarsamuel
Dec 12, 2023

Whisper is working on the Raspberry Pi 5, up to the small model. Video here: https://youtu.be/W39teHesXus

4 replies

ggerganov Dec 13, 2023
Maintainer Author

Very nice demonstration! The Pi 5 looks very powerful.

The quantized models speed up only the Decoder, while the Encoder actually becomes a bit slower. So overall, we don't expect the quantized models to be faster; they have limited applications.
Hope to demonstrate soon!

solarsamuel Dec 14, 2023

Thank you @ggerganov. Just a heads up, I'm working on a Pi 5 voice assistant project below. It will turn GPIO outputs on and off via special phrases. I should have a video to share in the next few weeks.

https://github.com/solarsamuel/pi5_whisper_voice_assistant

I did a fresh whisper.cpp install today on my Pi 4 (testing for backward compatibility) and all the CPUs maxed out at 100% in GNOME System Monitor when I streamed with the tiny.en.bin model. This might be why there are some comments of frustration above from September. I noticed a few changes to stream.cpp, such as the wave file handling, but I'm not sure if this is the issue. Can you give the Pi 4 a shot with the latest whisper.cpp install and see how it runs? Does it max out for you?

ggerganov Dec 14, 2023
Maintainer Author

The new versions by default use 5 beams and 5 "best of" to match the reference Whisper implementation. This makes the decoding slower but more accurate. When you run on RPi, you might want to reduce these numbers and/or disable fallbacks altogether.

Looking forward to the voice assistant project!

solarsamuel Dec 22, 2023

I figured out why I had issues with my Pi 4 a few weeks ago. I had two instances of Whisper running at the same time; one started automatically at boot. Once I fixed this, it worked fine.

Here's a video of the voice assistant project running on a Raspberry Pi 5: https://youtu.be/jpW9foRIwv0

Use your voice and special phrases to turn outputs on and off. Turn on relays, buzzers, motors, lights, etc... Make your own special phrases. All speech-to-text is done with the Whisper C++ models on-device. IO is triggered with the GPIOD library. This is backwards compatible with Raspberry Pi 4.

scign · 2023-12-28T02:12:03Z

scign
Dec 28, 2023

I managed to get main working with wav files, but I am running out of memory with a large sound file. stream is looking like a promising alternative but I'm struggling with the implementation. I used snd-aloop which created two loopback devices. I'm assuming I can run stream with some parameters in one terminal and aplay with some parameters in another but I haven't been able to get it to work. Can someone advise me on what parameters I should be using to make this work, or if I need a different approach?

Output from aplay -l:

**** List of PLAYBACK Hardware Devices ****
card 0: Headphones [bcm2835 Headphones], device 0: bcm2835 Headphones [bcm2835 Headphones]
  Subdevices: 8/8
  Subdevice #0: subdevice #0
  Subdevice #1: subdevice #1
  Subdevice #2: subdevice #2
  Subdevice #3: subdevice #3
  Subdevice #4: subdevice #4
  Subdevice #5: subdevice #5
  Subdevice #6: subdevice #6
  Subdevice #7: subdevice #7
card 1: vc4hdmi [vc4-hdmi], device 0: MAI PCM i2s-hifi-0 [MAI PCM i2s-hifi-0]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 2: Loopback [Loopback], device 0: Loopback PCM [Loopback PCM]
  Subdevices: 8/8
  Subdevice #0: subdevice #0
  Subdevice #1: subdevice #1
  Subdevice #2: subdevice #2
  Subdevice #3: subdevice #3
  Subdevice #4: subdevice #4
  Subdevice #5: subdevice #5
  Subdevice #6: subdevice #6
  Subdevice #7: subdevice #7
card 2: Loopback [Loopback], device 1: Loopback PCM [Loopback PCM]
  Subdevices: 8/8
  Subdevice #0: subdevice #0
  Subdevice #1: subdevice #1
  Subdevice #2: subdevice #2
  Subdevice #3: subdevice #3
  Subdevice #4: subdevice #4
  Subdevice #5: subdevice #5
  Subdevice #6: subdevice #6
  Subdevice #7: subdevice #7

Output from arecord -l:

**** List of CAPTURE Hardware Devices ****
card 2: Loopback [Loopback], device 0: Loopback PCM [Loopback PCM]
  Subdevices: 8/8
  Subdevice #0: subdevice #0
  Subdevice #1: subdevice #1
  Subdevice #2: subdevice #2
  Subdevice #3: subdevice #3
  Subdevice #4: subdevice #4
  Subdevice #5: subdevice #5
  Subdevice #6: subdevice #6
  Subdevice #7: subdevice #7
card 2: Loopback [Loopback], device 1: Loopback PCM [Loopback PCM]
  Subdevices: 8/8
  Subdevice #0: subdevice #0
  Subdevice #1: subdevice #1
  Subdevice #2: subdevice #2
  Subdevice #3: subdevice #3
  Subdevice #4: subdevice #4
  Subdevice #5: subdevice #5
  Subdevice #6: subdevice #6
  Subdevice #7: subdevice #7

For example, if I use aplay -D hw:2,0,0 samples/jfk.wav, then what should I pass to stream to capture from that device?

I tried ./whisper.cpp/stream -m whisper.cpp/models/ggml-tiny.en.bin -c 0 and -c 1 and both showed the following:

init: found 2 capture devices:
init:    - Capture device #0: 'Loopback, Loopback PCM'
init:    - Capture device #1: 'Loopback, Loopback PCM (2)'
init: attempt to open capture device 0 : 'Loopback, Loopback PCM' ...                          <-- changes with -c 1
init: obtained spec for input device (SDL Id = 2):                                             <-- doesn't change with -c 1
init:     - sample rate:       16000
init:     - format:            33056 (required: 33056)
init:     - channels:          1 (required: 1)
init:     - samples per frame: 1024
whisper_init_from_file_with_params_no_state: loading model from 'whisper.cpp/models/ggml-tiny.en.bin'
[whisper model info...]

main: processing 48000 samples (step = 3.0 sec / len = 10.0 sec / keep = 0.2 sec), 4 threads, lang = en, task = transcribe, timestamps = 0 ...
main: n_new_line = 2, no_context = 1

[Start speaking]

But then no transcription, even using the sample jfk.wav.
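One thing to check with snd-aloop: audio played into one loopback device comes out on the capture side of the other device, with the same subdevice number. So audio sent to hw:2,0,0 should be captured from hw:2,1,0 rather than from hw:2,0,0. The sketch below only encodes that pairing. The function name is made up, and whether SDL's capture device #1 ('Loopback, Loopback PCM (2)') corresponds to hw:2,1 on this system is an assumption to verify.

```python
def loopback_capture_for(playback: str) -> str:
    """Return the snd-aloop capture device paired with a playback device.

    Expects an ALSA name of the form "hw:CARD,DEVICE,SUBDEVICE" where
    DEVICE is 0 or 1; the paired capture device is the other one.
    """
    prefix, _, rest = playback.partition(":")
    card, device, subdevice = rest.split(",")
    if device not in ("0", "1"):
        raise ValueError(f"not a loopback device number: {device}")
    paired = "1" if device == "0" else "0"
    return f"{prefix}:{card},{paired},{subdevice}"


print(loopback_capture_for("hw:2,0,0"))  # prints hw:2,1,0
```

Under that assumption, playing with aplay -D hw:2,0,0 and running stream with -c 1 would be the combination to try first.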

0 replies

hammeronthenet · 2024-01-12T11:22:41Z

hammeronthenet
Jan 12, 2024

Hi! Great job, really!

One question: is there any example, tutorial, or guide on how to implement the same thing using whisper.cpp inside a Python script?
Just to experiment with something more sophisticated, like using the result as input to an LLM, and so on.

Thank you.
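One simple way to drive whisper.cpp from Python is to call the compiled binary with subprocess and read its standard output. The sketch below assumes the main example has been built in the whisper.cpp directory and that the input is a 16 kHz WAV file. The function names and default paths are illustrative, and this is not an official binding.

```python
import subprocess
from typing import List


def build_main_command(model: str, wav: str, threads: int = 4,
                       binary: str = "./main") -> List[str]:
    """Assemble the argument list for whisper.cpp's main example."""
    # -m: model file, -f: input WAV, -t: threads, -nt: do not print timestamps
    return [binary, "-m", model, "-f", wav, "-t", str(threads), "-nt"]


def transcribe(model: str, wav: str, threads: int = 4) -> str:
    """Run the binary and return whatever it printed to stdout."""
    result = subprocess.run(build_main_command(model, wav, threads),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


# Show the command that would be executed for the bundled sample.
print(" ".join(build_main_command("models/ggml-tiny.en.bin", "samples/jfk.wav")))
```

Calling transcribe("models/ggml-tiny.en.bin", "samples/jfk.wav") from the whisper.cpp directory should return the transcribed text, which could then be passed to an LLM as a prompt. This covers file-based transcription only. Real-time capture would still need the stream example or a proper binding.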

0 replies

wyrmwood9 · 2024-02-22T06:07:33Z

wyrmwood9
Feb 22, 2024

I'm getting an error when compiling using make -j stream on a Raspberry Pi 5 running Pi OS Bookworm 12.2.0 (uname -a
Linux raspberrypi 6.1.0-rpi8-rpi-v8 #1 SMP PREEMPT Debian 1:6.1.73-1+rpt1 (2024-01-25) aarch64 GNU/Linux)

following the build instructions I get an error from make which results in no ./stream folder

ggml-quants.c: In function 'ggml_vec_dot_iq1_s_q8_K':
ggml-quants.c:9349:23: error: incompatible types when assigning to type 'int8x16x4_t' from type 'ggml_int8x16x4_t'
 9349 |                 q8b = ggml_vld1q_s8_x4(q8); q8 += 64;
      |                       ^~~~~~~~~~~~~~~~
make: *** [Makefile:326: ggml-quants.o] Error 1

any ideas?

thanks for any help!

2 replies

ggerganov Feb 22, 2024
Maintainer Author

Will try to fix this today, but unfortunately my RPi4 stopped working, so I don't have hardware to test on

wyrmwood9 Mar 8, 2024

Yay! now working. 🥇

xyh666168 · 2024-03-20T05:38:49Z

xyh666168
Mar 20, 2024

I'm getting an error when compiling using make -j stream on a Raspberry Pi 5 running Pi OS Bookworm 12.2.0 (uname -a Linux raspberrypi 6.1.0-rpi8-rpi-v8 #1 SMP PREEMPT Debian 1:6.1.73-1+rpt1 (2024-01-25) aarch64 GNU/Linux)

following the build instructions I get an error from make which results in no ./stream folder
ggml-quants.c: In function 'ggml_vec_dot_iq1_s_q8_K':
ggml-quants.c:9349:23: error: incompatible types when assigning to type 'int8x16x4_t' from type 'ggml_int8x16x4_t'
 9349 |                 q8b = ggml_vld1q_s8_x4(q8); q8 += 64;
      |                       ^~~~~~~~~~~~~~~~
make: *** [Makefile:326: ggml-quants.o] Error 1
any ideas?

thanks for any help!

Can you show how to fix this build error? Thanks!

0 replies

dirkarnez · 2024-04-05T17:33:31Z

dirkarnez
Apr 5, 2024

Hi @ggerganov, I was trying to run stream with exactly the same parameters, except that I use a non-English (Cantonese) model. It seems whisper.cpp does not produce any output in real time, although it works very well on a PC in non-real-time mode (with WAV files). I also set the language argument to auto and/or yue, and it still does not produce any output. Do you see the same behavior on your RPi? Thank you so much!

0 replies


Replies: 17 comments · 24 replies
