whisper2me

If you are annoyed by voice messages, whisper2me is the bot for you. Just start the bot on your machine and forward the messages you want to transcribe.

Prerequisites

The simplest way to use whisper2me is with Docker. You can install Docker on your machine by following the official Docker documentation.

Additionally you need the following:

- git, which can be installed with:
```
sudo apt install git
```
- The bot token, which can be obtained from the BotFather on Telegram by following the guide here
- Your Telegram user_id

Note

The code has been tested only on Ubuntu, and there is no guarantee that it will work on other operating systems. For CUDA it has been tested on an NVIDIA Orin AGX. If you plan to use this container on Windows, you can use WSL; see the installation steps here

Setup

Clone the repository on your machine with:

git clone https://github.com/Armaggheddon/whisper2me.git

Go inside the downloaded folder:
```
cd whisper2me
```

Inside the Dockerfile you have to edit the following lines:

ENV BOT_TOKEN=YOUR_BOT_TOKEN
ENV ADMIN_USER_ID=YOUR_ADMIN_ID

where YOUR_BOT_TOKEN and YOUR_ADMIN_ID are replaced with your own values, written as is, for example:

ENV BOT_TOKEN=0000000000:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
ENV ADMIN_USER_ID=000000000

By default the bot uses the smallest model, TINY. If the device running the bot is more capable, you may want to try bigger models. The available models are the ones provided by OpenAI and are (at the time of writing):
- TINY
- TINY_EN
- BASE
- BASE_EN
- SMALL
- SMALL_EN
- MEDIUM
- MEDIUM_EN
- LARGE_V1
- LARGE_V2
- LARGE_V3
- LARGE
To try different models, replace TINY with one of the above options in the Dockerfile:
```
# Available values are, defaults to TINY if misspelled:
# >TINY             >TINY_EN
# >BASE             >BASE_EN
# >SMALL            >SMALL_EN
# >MEDIUM           >MEDIUM_EN
# >LARGE_V1         >LARGE_V2
# >LARGE_V3         >LARGE
ENV MODEL_NAME=TINY
```
Refer to OpenAI's official Whisper paper, available here, for a performance comparison of the different models.
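The fallback mentioned in the Dockerfile comment (default to TINY when the value is not recognized) could look roughly like the sketch below. The right-hand values are the model identifiers accepted by the openai-whisper package; the names WHISPER_MODELS and resolve_model_name are hypothetical and not taken from the repository, and the case-insensitive matching is an assumption.

```python
# Hypothetical mapping from the MODEL_NAME values listed above to the
# identifiers used by the openai-whisper package.
WHISPER_MODELS = {
    "TINY": "tiny",
    "TINY_EN": "tiny.en",
    "BASE": "base",
    "BASE_EN": "base.en",
    "SMALL": "small",
    "SMALL_EN": "small.en",
    "MEDIUM": "medium",
    "MEDIUM_EN": "medium.en",
    "LARGE_V1": "large-v1",
    "LARGE_V2": "large-v2",
    "LARGE_V3": "large-v3",
    "LARGE": "large",
}


def resolve_model_name(value):
    """Return the whisper identifier for value, falling back to TINY."""
    # Assumption: matching ignores case and surrounding whitespace.
    key = (value or "").strip().upper()
    return WHISPER_MODELS.get(key, WHISPER_MODELS["TINY"])


print(resolve_model_name("MEDIUM"))  # prints medium
print(resolve_model_name("MEDUIM"))  # misspelled, prints tiny
```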
Build the docker image with:
```
docker build -t whisper2me .
```
After the image has been built you can see it with:
```
docker image list
```
and check for whisper2me:latest.
The bot allows the admin user to add and remove users without restarting the bot. To support this and keep the data persistent, the bot uses 2 files, allowed_users.txt and allowed_users.bak. They must be mounted inside the container so that any modification is also available on the host.
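As an illustration of why the two files go together, here is a minimal sketch of a save routine that copies the current list to allowed_users.bak before overwriting allowed_users.txt. The file format (one user_id per line), the role of the .bak file as a copy of the previous version, and the function names are all assumptions; the actual bot may handle this differently.

```python
import shutil
from pathlib import Path


def load_allowed_users(path):
    """Read one numeric user_id per line; a missing file means no users."""
    path = Path(path)
    if not path.exists():
        return set()
    return {int(line) for line in path.read_text().splitlines() if line.strip()}


def save_allowed_users(path, user_ids):
    """Copy the current file to <name>.bak, then write the new list."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    if path.exists():
        # allowed_users.txt -> allowed_users.bak (assumed backup behaviour)
        shutil.copyfile(path, path.with_suffix(".bak"))
    path.write_text("".join(f"{uid}\n" for uid in sorted(user_ids)))
```

Because both files live in the mounted persistent_data folder, changes made this way inside the container would also be visible on the host.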
Run the container with:
```
docker run -it --rm -v "$(pwd)"/persistent_data:/whisper2me/persistent_data -d whisper2me:latest
```
-d runs the container in detached mode.

To start the container automatically, see Docker's --restart policies here.
Replace --rm with --restart <YOUR_POLICY>, e.g. --restart unless-stopped

Tip

It is possible to override the options in the Dockerfile when using the run command by passing the same environment variables with --env, using the same variable name with the new value:
e.g., to use the medium model add --env MODEL_NAME=MEDIUM
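ENV lines in a Dockerfile only set default environment variables in the image, and --env replaces them for a single container, which is why no rebuild is needed. A minimal sketch of how a Python process could read these values is shown below; load_settings and its validation rules are hypothetical, not the bot's actual code.

```python
import os


def load_settings(env=os.environ):
    """Read the bot settings from environment variables (hypothetical helper)."""
    token = env.get("BOT_TOKEN", "")
    admin = env.get("ADMIN_USER_ID", "")
    # Assumption: an unedited placeholder is treated as a missing value.
    if not token or token == "YOUR_BOT_TOKEN":
        raise ValueError("BOT_TOKEN is not set")
    if not admin.isdigit():
        raise ValueError("ADMIN_USER_ID must be a numeric Telegram user_id")
    return {
        "bot_token": token,
        "admin_user_id": int(admin),
        # Falls back to TINY when MODEL_NAME is not provided at all.
        "model_name": env.get("MODEL_NAME", "TINY"),
    }
```

Passing a plain dict instead of os.environ makes the function easy to try out without touching the real environment.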

When the container starts the model is downloaded. Depending on your internet connection and the selected model, this might take a while. The model's weights are stored in persistent_data/model_cache.

CUDA Setup

On a Jetson platform, Docker is already installed with JetPack; use the NVIDIA L4T PyTorch image. On a discrete GPU (dGPU), nvidia-docker must be installed; follow NVIDIA's guide here and use the PyTorch image.

Note

The following steps have been tested on an NVIDIA Orin AGX running JetPack 5.1.2 with the NVIDIA L4T PyTorch r35.2.1-pth2.0-py3 image. On a dGPU the steps might be different.

Follow steps 1 and 2 of Setup (clone the repository and go inside the downloaded folder)
Run the container and mount the current directory with:
```
docker run -it --rm --runtime nvidia --gpus all -v "$(pwd)":/whisper2me nvcr.io/nvidia/pytorch:xx.xx-py3
```
replace pytorch:xx.xx-py3 with the version you downloaded
Once inside the container install ffmpeg with:
```
apt update && apt install ffmpeg -y
```
Install the python requirements with:
```
cd /whisper2me
pip install -r requirements_cuda.txt
```
If you get an error stating

ERROR: numba 0.58.1 has requirement numpy<1.27,>=1.22, but you'll have numpy 1.17.4 which is incompatible.

run the following command:
```
pip install -U numpy
```
and then re-run the pip install -r requirements_cuda.txt command.
When the installation has finished, press CTRL+P followed by CTRL+Q to detach from the running container
Get the container ID with:
```
docker container list
```
and copy the CONTAINER ID of the PyTorch container
Commit the changes to the container and save it with a new name with:
```
docker commit -p CONTAINER_ID whisper2me:latest
```
The changes to the base image are stored in the new image that will be named whisper2me:latest

The -p option pauses the container while the commit is being executed.
Check the new image with:
```
docker image list
```
Required arguments:
- Set the BOT_TOKEN and ADMIN_USER_ID with:
```
--env BOT_TOKEN=YOUR_BOT_TOKEN --env ADMIN_USER_ID=YOUR_ADMIN_ID
```
  replacing YOUR_BOT_TOKEN and YOUR_ADMIN_ID with your own values
Optional arguments:
- To use CUDA, defaults to False if not set or if the GPU is not detected by torch:
```
--env USE_CUDA=True
```
- Use fp16 instead of fp32; only applied if USE_CUDA is True and a GPU is detected:
```
--env USE_FP16=True
```
- Select the GPU that will be used for the model inference, defaults to 0:
```
--env DEVICE_ID=0
```
- Change the model used, defaults to TINY:
```
# >TINY             >TINY_EN
# >BASE             >BASE_EN
# >SMALL            >SMALL_EN
# >MEDIUM           >MEDIUM_EN
# >LARGE_V1         >LARGE_V2
# >LARGE_V3         >LARGE
--env MODEL_NAME=TINY
```
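The way these optional variables interact (CUDA only when requested and detected, fp16 only together with CUDA, GPU index from DEVICE_ID) can be summarized with the sketch below. It is not the bot's code: select_device and parse_bool are hypothetical names, the rule that only the text "true" (in any case) counts as true is an assumption, and cuda_available stands in for what torch.cuda.is_available() would report.

```python
def parse_bool(value):
    """Assumption: only the text 'true' (any case) enables an option."""
    return str(value).strip().lower() == "true"


def select_device(env, cuda_available):
    """Return (device, use_fp16) following the rules described above."""
    use_cuda = parse_bool(env.get("USE_CUDA", "False")) and cuda_available
    if not use_cuda:
        # fp16 is ignored when the model runs on the CPU.
        return "cpu", False
    device_id = int(env.get("DEVICE_ID", "0"))
    use_fp16 = parse_bool(env.get("USE_FP16", "False"))
    # "cuda:N" is the usual PyTorch device string for GPU number N.
    return f"cuda:{device_id}", use_fp16


print(select_device({"USE_CUDA": "True", "USE_FP16": "True"}, cuda_available=False))
# prints ('cpu', False)
```

In a real setup the device string would typically be handed to the model loader and the fp16 flag to the transcription call.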

Now you can run the bot using the GPU with:

docker run -it --rm --runtime nvidia --gpus all --env BOT_TOKEN=YOUR_BOT_TOKEN --env ADMIN_USER_ID=YOUR_ADMIN_ID --env USE_CUDA=True -v "$(pwd)":/whisper2me -d whisper2me:latest bash -c "cd /whisper2me && python3 src/main.py"

If, for example, you want to use the GPU with DEVICE_ID=3 and the large-v3 model in fp16:

docker run -it --rm --runtime nvidia --gpus all --env BOT_TOKEN=YOUR_BOT_TOKEN --env ADMIN_USER_ID=YOUR_ADMIN_ID --env MODEL_NAME=LARGE_V3 --env USE_CUDA=True --env DEVICE_ID=3 --env USE_FP16=True -v "$(pwd)":/whisper2me -d whisper2me:latest bash -c "cd /whisper2me && python3 src/main.py"

When the container starts, the model is downloaded. Depending on your internet connection and the selected model, this might take a while. Startup is also significantly slower than on CPU: in my tests the bot took up to 1 minute to become ready.

Usage

Once the bot is up and running, simply open the bot's chat and click the Start command.

To use the bot, simply forward or send an audio message. You will receive a confirmation message and, when the transcription/translation is ready, a new message with the content.

Additionally, when a NON-ADMIN user tries a command reserved to the ADMIN, the ADMIN is notified with a message containing the user_id and the command that the user sent.
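The admin check and notification described here could be sketched as follows. This is an assumption about the logic, not the bot's implementation: check_command and ADMIN_ONLY_COMMANDS are hypothetical names, the command set is copied from the Available commands section, and the wording of the notice is made up for the example.

```python
# Commands listed as admin-only in the "Available commands" section.
ADMIN_ONLY_COMMANDS = {
    "/language", "/task", "/users", "/add_user", "/remove_user", "/purge",
}


def check_command(user_id, command, admin_id):
    """Return (allowed, notice); notice is text for the admin or None."""
    if command not in ADMIN_ONLY_COMMANDS or user_id == admin_id:
        return True, None
    # Hypothetical wording; it carries the user_id and the command sent.
    notice = f"User {user_id} tried to use the admin command {command}"
    return False, notice


print(check_command(111, "/purge", admin_id=999))
# prints (False, 'User 111 tried to use the admin command /purge')
```

With pyTelegramBotAPI, the returned notice could then be delivered with something like bot.send_message(admin_id, notice).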

Available commands

The list of available commands depends on whether the user is an admin or not:

Commands available to all users:
- /start begins the conversation with the bot
- /info shows the current bot settings
- /help shows a list of available commands
Commands available only to the ADMIN:
- /language changes the model's target language; currently only the following are listed:
  - 🇺🇸 English
  - 🇫🇷 French
  - 🇩🇪 German
  - 🇮🇹 Italian
  - 🇪🇸 Spanish
- /task changes the model task to:
  - ✍ Transcribe, the input voice message is transcribed using the automatically detected language
  - 🗣 Translate, the input voice message is translated using the language selected with the /language command
- /users lists the users that are currently allowed to use the bot
- /add_user starts the interaction to allow a new user. You can either:
  - Send the user_id of the user you want to add
  - Forward a text message from the desired user so that the user_id is retrieved automatically (much simpler)
- /remove_user starts the interaction to remove a user. A list of currently allowed users is displayed; simply click the one you want to remove
- /purge removes all users from the allowed list. Requires a confirmation message that spells exactly YES
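Two of the interactions above rely on validating the admin's reply: /purge only proceeds on the exact text YES, and /add_user accepts a typed user_id. A minimal sketch of both checks, with hypothetical function names and an in-memory set standing in for the allowed list, might look like this:

```python
def confirm_purge(reply, allowed_users):
    """Clear the allowed list only when the reply is exactly 'YES'."""
    if reply != "YES":
        return False
    allowed_users.clear()
    return True


def parse_user_id(text):
    """Return the typed user_id as an int, or None if it is not a number."""
    text = text.strip()
    # isascii() rules out digit-like characters that int() cannot parse.
    if text.isascii() and text.isdigit():
        return int(text)
    return None


users = {111, 222}
print(confirm_purge("yes", users), users == {111, 222})  # prints False True
print(confirm_purge("YES", users), users)                # prints True set()
```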

How it works

whisper2me uses the following libraries:

- OpenAI's whisper model to perform the transcription/translation tasks
- pyTelegramBotAPI for the Telegram bot functionality

Note

Translation is only available when using a model that does not end with _EN
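The restriction exists because the _EN models are English-only. A small sketch of how the rule could be enforced is given below; the function names are hypothetical, and falling back to transcription when translation is unavailable is an assumption about reasonable behaviour, not something the repository documents. The task strings match the "transcribe" and "translate" values used by whisper.

```python
def supports_translation(model_name):
    """English-only models (names ending in _EN) cannot translate."""
    return not model_name.upper().endswith("_EN")


def effective_task(requested_task, model_name):
    """Assumed behaviour: fall back to transcription when translation is unavailable."""
    if requested_task == "translate" and not supports_translation(model_name):
        return "transcribe"
    return requested_task


print(effective_task("translate", "BASE_EN"))  # prints transcribe
print(effective_task("translate", "BASE"))     # prints translate
```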

The code can run on both ARM-64 and X64 architectures. It has been tested on:

- Raspberry Pi 3B with 1GB of RAM (using Raspberry Pi OS (64-bit) Lite): the only runnable model is TINY. Almost all of the Pi's available resources are used, and transcription runs approximately 6x slower than real time.
- Nvidia Orin AGX with 64GB of RAM (using JetPack 5.1.2): all models run without any issue. Using the LARGE_V3 model requires around 25-30 GB of combined RAM (both CPU and GPU). Execution is faster than real time.
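To get a feel for these numbers, processing time can be estimated as the audio duration multiplied by the slowdown factor. The helper below is only illustrative arithmetic: the 6x figure is the approximate Raspberry Pi 3B measurement quoted above, and the function name is hypothetical.

```python
def estimated_processing_seconds(audio_seconds, slowdown):
    """Estimate processing time; slowdown is processing time / audio length."""
    return audio_seconds * slowdown


# A 30-second voice message on the Raspberry Pi 3B (about 6x slower than real time)
print(estimated_processing_seconds(30, 6.0))  # prints 180.0
```

A slowdown below 1.0 corresponds to faster-than-real-time execution, as reported for the Orin AGX.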

Task list

/purge command not removing all users Fixed 😁

Add model cache to avoid redownloading the model every time the container is run. Fixed 😁
