
Finetune Whisper using LoRA for Cantonese and Mandarin

🤗 HF Repo • 🐱 GitHub Repo

Get Started

1. Setup Docker Environment

Switch to the docker folder and build the Docker GPU image for training:

cd docker
docker compose build

Once the build completes, run the following commands to start a Docker container and attach to it:

docker compose up -d
docker exec -it asr bash

2. Prepare Training Data

See the dataset_scripts folder for details.
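
As a rough illustration of what the preprocessing in those scripts amounts to, the sketch below loads audio at 16 kHz and converts each example into Whisper's log-Mel input features and token labels. The dataset ID, language tag, and column names here are assumptions for illustration; the actual scripts may differ.

# Sketch of Whisper-style data preparation (hypothetical dataset ID and
# column names; see dataset_scripts for the real pipeline).
from datasets import Audio, load_dataset
from transformers import WhisperProcessor

processor = WhisperProcessor.from_pretrained(
    "openai/whisper-base", language="zh", task="transcribe"  # language tag assumed
)

ds = load_dataset("mozilla-foundation/common_voice_11_0", "yue", split="train")
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))  # Whisper expects 16 kHz

def prepare(batch):
    audio = batch["audio"]
    # Log-Mel spectrogram features for the encoder input
    batch["input_features"] = processor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    # Tokenized transcript as decoder labels
    batch["labels"] = processor.tokenizer(batch["sentence"]).input_ids
    return batch

ds = ds.map(prepare, remove_columns=ds.column_names)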

3. Finetune Pretrained Model

# Finetuning
python finetune.py --model_id base --streaming True --train_batch_size 64 --gradient_accumulation_steps 2 --fp16 True
# LoRA Finetuning
python finetune_lora.py --model_id large-v2 --streaming True --train_batch_size 64 --gradient_accumulation_steps 2
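
For the LoRA run, finetune_lora.py follows the spirit of the PEFT int8 ASR recipe linked in the references. A minimal sketch of the core model setup, assuming the standard PEFT API and illustrative hyperparameters (the script's actual values may differ):

# Sketch of the LoRA setup (illustrative rank/alpha; see finetune_lora.py
# for the repo's actual configuration).
from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v2", load_in_8bit=True, device_map="auto"
)
model = prepare_model_for_kbit_training(model)  # stabilize int8 training

config = LoraConfig(
    r=32,                                 # rank of the low-rank update
    lora_alpha=64,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    bias="none",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the adapter weights are trainable

Only the adapter weights are updated (the ~15 M trainable parameters reported for whisper-large-v2-lora-cantonese below); the 8-bit base model stays frozen.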

4. Evaluate Performance

# Evaluation
python eval.py --model_name_or_path Oblivion208/whisper-tiny-cantonese --streaming True --batch_size 64
# LoRA Evaluation
python eval_lora.py --peft_model_id Oblivion208/whisper-large-v2-lora-mix --streaming True --batch_size 64
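
For LoRA checkpoints, evaluation loads the base model together with the adapter and scores transcripts by character error rate. A rough sketch of that flow using the 🤗 evaluate library (the decoding details here are assumptions; eval_lora.py may structure this differently):

# Sketch of LoRA evaluation: attach the adapter and compute CER.
import evaluate
from peft import PeftModel
from transformers import WhisperForConditionalGeneration, WhisperProcessor

base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2")
model = PeftModel.from_pretrained(base, "Oblivion208/whisper-large-v2-lora-mix")
processor = WhisperProcessor.from_pretrained("openai/whisper-large-v2")
cer_metric = evaluate.load("cer")  # requires the jiwer package

def transcribe(input_features):
    predicted_ids = model.generate(input_features=input_features)
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)

# After collecting predictions and references over the test set:
# cer = 100 * cer_metric.compute(predictions=predictions, references=references)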

Note: Setting --streaming to False caches acoustic features on the local disk, which speeds up finetuning but dramatically increases disk usage (roughly three times the size of the raw audio files).
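
In 🤗 Datasets terms, the trade-off looks roughly like this (illustrative only; dataset ID assumed):

from datasets import load_dataset

# streaming=True yields an IterableDataset: features are computed on the fly
# during training, and nothing is written to disk.
ds = load_dataset("mozilla-foundation/common_voice_11_0", "yue",
                  split="train", streaming=True)

# streaming=False: any .map() over the dataset (like the prepare step above)
# writes extracted features to Arrow cache files, fast to reuse but large.
ds = load_dataset("mozilla-foundation/common_voice_11_0", "yue", split="train")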

Approximate Performance Evaluation

The following models were all trained and evaluated on a single RTX 3090 GPU rented via Vast.ai.

Cantonese Test Results Comparison

MDCC

| Model name | Parameters | Finetune Steps | Time Spent | Training Loss | Validation Loss | CER % | Finetuned Model |
| --- | --- | --- | --- | --- | --- | --- | --- |
| whisper-tiny-cantonese | 39 M | 3200 | 4h 34m | 0.0485 | 0.771 | 11.10 | Link |
| whisper-base-cantonese | 74 M | 7200 | 13h 32m | 0.0186 | 0.477 | 7.66 | Link |
| whisper-small-cantonese | 244 M | 3600 | 6h 38m | 0.0266 | 0.137 | 6.16 | Link |
| whisper-small-lora-cantonese | 3.5 M | 8000 | 21h 27m | 0.0687 | 0.382 | 7.40 | Link |
| whisper-large-v2-lora-cantonese | 15 M | 10000 | 33h 40m | 0.0046 | 0.277 | 3.77 | Link |

Common Voice Corpus 11.0

| Model name | Original CER % | w/o Finetune CER % | Jointly Finetune CER % |
| --- | --- | --- | --- |
| whisper-tiny-cantonese | 124.03 | 66.85 | 35.87 |
| whisper-base-cantonese | 78.24 | 61.42 | 16.73 |
| whisper-small-cantonese | 52.83 | 31.23 | / |
| whisper-small-lora-cantonese | 37.53 | 19.38 | 14.73 |
| whisper-large-v2-lora-cantonese | 37.53 | 19.38 | 9.63 |

Requirements

  • Transformers
  • Accelerate
  • Datasets
  • PEFT
  • bitsandbytes
  • librosa
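
Assuming the standard lowercase PyPI package names, these can be installed with:

pip install transformers accelerate datasets peft bitsandbytes librosa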

References

  1. https://github.com/openai/whisper
  2. https://huggingface.co/blog/fine-tune-whisper
  3. https://huggingface.co/docs/peft/task_guides/int8-asr
  4. https://huggingface.co/alvanlii/whisper-largev2-cantonese-peft-lora