SPT - Wav2Lip model

This project builds on the Wav2Lip model and adds a spatial transformer network to improve the quality of lip synchronisation.

Demo Colab notebook

Setting up the dataset

Our model was trained on the VoxCeleb2 dataset.

Preprocess the dataset for fast training

We recommend preprocessing the dataset first using our script.

python --data_root data_root/main --preprocessed_root lrs2_preprocessed/

To train the model, filelists/train.txt and filelists/val.txt are needed, where each line in a .txt file names a folder containing the frames and audio of one video.

Example: filelists/train.txt


Dataset folder structure:

├── video1
│   ├── 0.jpg 
│   ├── 1.jpg 
│   ├── ...
│   ├── 120.jpg
│   └── audio.wav
├── video2
│   ├── 0.jpg 
│   ├── 1.jpg 
│   ├── ...
│   ├── 110.jpg
│   └── audio.wav
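
As a hedged illustration of the file-list format described above, the sketch below scans a preprocessed dataset root laid out like the tree above and writes one folder name per line to train.txt and val.txt. The function name `write_filelists`, the 90/10 split, and the fixed seed are assumptions made for illustration and are not part of this repository. Whether each line should be a bare folder name or a longer relative path depends on how the training script resolves it against --data_root.

```python
import random
from pathlib import Path


def write_filelists(data_root, out_dir="filelists", val_fraction=0.1, seed=0):
    """Write train.txt / val.txt listing one video folder per line.

    Hypothetical helper: the split ratio and seed are illustrative choices.
    """
    root = Path(data_root)
    # Keep only folders that hold extracted frames and an audio.wav file.
    videos = sorted(
        d.name for d in root.iterdir()
        if d.is_dir() and (d / "audio.wav").is_file() and any(d.glob("*.jpg"))
    )
    # Shuffle deterministically so the split is reproducible.
    random.Random(seed).shuffle(videos)

    # Reserve at least one folder for validation when there are two or more.
    n_val = max(1, int(len(videos) * val_fraction)) if len(videos) > 1 else 0
    val, train = videos[:n_val], videos[n_val:]

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "train.txt").write_text("".join(f"{v}\n" for v in train))
    (out / "val.txt").write_text("".join(f"{v}\n" for v in val))
    return train, val
```

Calling `write_filelists("lrs2_preprocessed/")` would create both files under filelists/. Folders without audio.wav or without any .jpg frames are skipped.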

Training script

You can either train the model without the additional visual quality discriminator (< 1 day of training) or use the discriminator (~2 days). For the former, run:

python3 --data_root dataset/ --checkpoint_dir <folder_to_save_checkpoints> --syncnet_checkpoint_path <path_to_expert_disc_checkpoint> --disc_checkpoint_path <path_to_perceptual_disc_checkpoint>

Running the model (inference)

python --checkpoint_path <ckpt> --face <video.mp4> --audio <an-audio-source> 

The result will be saved to results/result_voice.mp4


Evaluation README


To use DeepFaceLab, follow the instructions provided below.

StyleGAN emotion modifier

First, install the required dependencies by running these commands in the following order:

  • pip install tensorflow-gpu
  • pip install nvidia-tensorflow[horovod]
  • pip install tensorboard==2.10.0

These packages are needed because TensorFlow 1 is now deprecated and, in many environments, can no longer be downloaded or installed from PyPI.

Then, please download the following weights:

Then download the fine-tuned ResNet weights from pbaylies listed below:

Save the downloaded weights into "stylegan/data".

To modify images, first encode them into latent space by providing the directory of the target photo(s) to the following command:

python image_encoder --src_dir [PATH_TO_DIR]

After the images are encoded, use the emotion modifier script to alter your image's emotion. The --coeff argument takes a numerical value: positive numbers move the face towards happy, whilst negative numbers move it towards sad.

The --face_name argument takes in the target photo name that you want the emotion to be altered for. Only pass in the image name with no extension or path.
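
The naming rule above can be sketched in Python: given a path to the original photo, only the file stem is passed as --face_name. The helper name `face_name_from_path` is hypothetical and is not part of this repository.

```python
from pathlib import PurePath


def face_name_from_path(image_path):
    """Return the image name without directory or extension (hypothetical helper)."""
    return PurePath(image_path).stem


print(face_name_from_path("photos/demo1.jpg"))  # prints "demo1"
```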

For example, to alter an encoded image called "demo1.jpg" so that the face smiles, the command would be:

python --face_name demo1 --coeff 0.7

The generated result will be written to "faces/generated_img".
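
For intuition about the signed --coeff value: encoder-based StyleGAN editing tools commonly move an encoded latent vector along a learned attribute direction. The sketch below assumes this script follows that approach, which this README does not confirm. `shift_latent`, the toy vectors, and the single "smile" direction are illustrative assumptions. Plain Python lists keep the example dependency-free; real latents would normally be NumPy arrays.

```python
def shift_latent(latent, direction, coeff):
    """Move a latent vector along an attribute direction.

    Assumed behaviour: a positive coeff pushes towards "happy",
    a negative coeff towards "sad", and 0 leaves the latent unchanged.
    """
    if len(latent) != len(direction):
        raise ValueError("latent and direction must have the same length")
    return [w + coeff * d for w, d in zip(latent, direction)]


# Toy 4-dimensional example; real StyleGAN latents are much larger.
latent = [0.5, -1.0, 0.0, 2.0]
smile_direction = [1.0, 0.0, -1.0, 0.5]

happier = shift_latent(latent, smile_direction, 0.7)   # towards happy
sadder = shift_latent(latent, smile_direction, -0.7)   # towards sad
```

Under this assumption, larger absolute values of --coeff produce a stronger change, and values of opposite sign move the face in opposite directions along the same axis.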

Thanks to the following repository for their implementation: