TalkingFaceGeneration-with-Emotion/TalkingFaceGeneration at main · UoA-CARES-Student/TalkingFaceGeneration-with-Emotion


SPT - Wav2Lip model

This model builds on Wav2Lip and adds a spatial transformer network to improve the quality of lip synchronisation.

Demo Colab notebook

Face detection pre-trained model should be downloaded to face_detection/detection/sfd/s3fd.pth.
Weights of the expert discriminator: pre-trained model
Weights of the visual quality discriminator trained in a GAN setup: pre-trained model
Spt-Wav2Lip pre-trained model
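
Before training or inference, it can help to confirm that the downloaded weights are where the scripts expect them. The following is a minimal sketch, not part of the repository: the function name `missing_weights` is hypothetical, and only the face detection path is taken from this README. Paths for the other checkpoints are not fixed by the README, so add them yourself if you store them in a known location.

```python
from pathlib import Path

# Location stated in this README for the face detection model.
S3FD_WEIGHTS = Path("face_detection/detection/sfd/s3fd.pth")


def missing_weights(repo_root, required=(S3FD_WEIGHTS,)):
    """Return the required weight files that are absent under repo_root.

    `required` holds paths relative to the repository root; extend it with
    the checkpoints you downloaded (their locations are your own choice).
    """
    root = Path(repo_root)
    return [str(p) for p in required if not (root / p).is_file()]


if __name__ == "__main__":
    missing = missing_weights(".")
    if missing:
        print("Missing pre-trained weights:", ", ".join(missing))
    else:
        print("All listed weights are in place.")
```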

Setting up the dataset

Our model was trained using the VoxCeleb2 dataset.

Preprocess the dataset for fast training

We recommend preprocessing the dataset first using our script:

python preprocess.py --data_root data_root/main --preprocessed_root lrs2_preprocessed/

To train the model, filelists/train.txt and filelists/val.txt are needed. Each line in these .txt files names a folder containing the frames and audio of one video.

Example: filelists/train.txt

dataset/video1
dataset/video2
...
dataset/video10

where the dataset folder has the following structure:

dataset
├── video1
│   ├── 0.jpg 
│   ├── 1.jpg 
│   ├── ...
│   ├── 120.jpg
│   └── audio.wav
├── video2
│   ├── 0.jpg 
│   ├── 1.jpg 
│   ├── ...
│   ├── 110.jpg
│   └── audio.wav
...
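
The file lists can be generated from a folder laid out as above. This is a minimal sketch under stated assumptions: the function names, the 10% validation split, and the fixed seed are hypothetical choices, not something the repository prescribes. Lines are written in the same form as the example above (for instance dataset/video1); whether train.py expects them relative to --data_root is not stated here, so adjust if needed.

```python
import random
from pathlib import Path


def list_video_folders(dataset_root):
    """Return folders that contain audio.wav and at least one .jpg frame."""
    root = Path(dataset_root)
    folders = []
    for d in sorted(p for p in root.iterdir() if p.is_dir()):
        has_audio = (d / "audio.wav").is_file()
        has_frames = any(d.glob("*.jpg"))
        if has_audio and has_frames:
            folders.append(d.as_posix())
    return folders


def write_filelists(dataset_root, out_dir="filelists", val_fraction=0.1, seed=0):
    """Shuffle the video folders and write train.txt and val.txt."""
    folders = list_video_folders(dataset_root)
    random.Random(seed).shuffle(folders)
    # Keep at least one validation folder when there is more than one video.
    n_val = max(1, int(len(folders) * val_fraction)) if len(folders) > 1 else 0
    val, train = folders[:n_val], folders[n_val:]
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "train.txt").write_text("".join(f"{f}\n" for f in train))
    (out / "val.txt").write_text("".join(f"{f}\n" for f in val))
    return train, val


# Example call (assumes a ./dataset folder exists):
# train, val = write_filelists("dataset")
```

Folders that lack audio.wav or frames are skipped, so incomplete videos do not end up in either list.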

Training script

You can either train the model without the additional visual quality discriminator (< 1 day of training) or use the discriminator (~2 days). For the former, run:

python3 train.py --data_root dataset/ --checkpoint_dir <folder_to_save_checkpoints> --syncnet_checkpoint_path <path_to_expert_disc_checkpoint> --disc_checkpoint_path <path_to_perceptual_disc_checkpoint>

Running the model (inference)

python inference.py --checkpoint_path <ckpt> --face <video.mp4> --audio <an-audio-source>

The result will be saved to results/result_voice.mp4
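
If you call inference from another script, a thin wrapper can catch missing input files before the model is loaded. The sketch below is illustrative only: `build_inference_command` is a hypothetical helper, and it uses just the three flags and the output path shown above.

```python
import sys
from pathlib import Path

# Output location stated in this README.
RESULT_PATH = Path("results/result_voice.mp4")


def build_inference_command(checkpoint, face, audio, python=sys.executable):
    """Return the inference.py argument list after checking the inputs exist."""
    for label, path in (("checkpoint", checkpoint), ("face", face), ("audio", audio)):
        if not Path(path).is_file():
            raise FileNotFoundError(f"{label} file not found: {path}")
    return [
        python, "inference.py",
        "--checkpoint_path", str(checkpoint),
        "--face", str(face),
        "--audio", str(audio),
    ]


# Example call (run from the repository root, with your own file names):
# import subprocess
# subprocess.run(build_inference_command("ckpt.pth", "video.mp4", "speech.wav"), check=True)
# print("Result expected at", RESULT_PATH)
```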

Evaluation

Evaluation README

DeepFaceLab

To use DeepFaceLab, follow the instructions linked below.

Main guide (specifically, I used the scripts found in the DeepFaceLab_Linux repository)
Google Colab notebook

StyleGAN emotion modifier

First, install the required dependencies by running these commands in the following order:

pip install tensorflow-gpu
pip install nvidia-tensorflow[horovod]
pip install tensorboard==2.10.0

This is necessary because TensorFlow 1 is now deprecated and, in many environments, no longer available or downloadable (see https://stackoverflow.com/questions/73215696/did-colab-suspend-tensorflow-1-x). In addition, it is no longer available on PyPI.

Then, please download the following weights:

https://drive.google.com/uc?id=1N2-m9qszOeVC9Tq77WxsLnuWwOedQiD2
https://drive.google.com/uc?id=1MEGjdvVpUsu1jB4zrXZN7Y4kBBOzizDQ

Place these weights into "stylegan/weights"

Then download the fine-tuned ResNet weights from pbaylies, linked below, and place them into "stylegan/data":

https://drive.google.com/file/d/1EhaXKv2deh1l_R9mh1uebVyKkRvSXH3r/view

To modify images, first encode them into latent space by passing the directory of the target photo(s) to the encoder:

python image_encoder.py --src_dir [PATH_TO_DIR]

After the images are encoded, use "emotion_inference.py" to alter the emotion of an image. The --coeff argument takes a numerical value: positive numbers shift the face towards happy, whilst negative numbers shift it towards sad.

The --face_name argument takes in the target photo name that you want the emotion to be altered for. Only pass in the image name with no extension or path.

For example, to alter an encoded image called "demo1.jpg" so that the face smiles, the command would be:

python emotion_inference.py --face_name demo1 --coeff 0.7

The generated result will be written to "faces/generated_img".
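
Because --face_name must be the image name without extension or path, it can be derived from the file path rather than typed by hand. The sketch below also builds one command per coefficient, which is handy for comparing several emotion strengths. The helper names, the photos/ folder, and the coefficient values are hypothetical; only the two flags come from this README.

```python
from pathlib import Path


def face_name_from_path(image_path):
    """Strip the directory and extension, as --face_name requires."""
    return Path(image_path).stem


def emotion_commands(image_path, coeffs):
    """Build one emotion_inference.py command per coefficient."""
    name = face_name_from_path(image_path)
    return [
        ["python", "emotion_inference.py", "--face_name", name, "--coeff", str(c)]
        for c in coeffs
    ]


# Negative values lean towards sad, positive values towards happy.
for cmd in emotion_commands("photos/demo1.jpg", [-0.7, 0.0, 0.7]):
    print(" ".join(cmd))
```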

Thanks to the following repository for their implementation:

https://github.com/pbaylies/stylegan-encoder
