Adopting the Wav2Lip model, a Spatial Transformer Network (STN) was added to improve the quality of lip synchronisation.
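For reference, here is a minimal sketch of a generic Spatial Transformer block (illustrative only; the layer sizes and placement are assumptions, not the exact module used in this repository):

```python
# Generic Spatial Transformer Network block (PyTorch sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class STN(nn.Module):
    def __init__(self, in_channels):
        super().__init__()
        # Localization network: predicts a 2x3 affine transform from the input.
        self.localization = nn.Sequential(
            nn.Conv2d(in_channels, 8, kernel_size=7), nn.MaxPool2d(2), nn.ReLU(True),
            nn.Conv2d(8, 10, kernel_size=5), nn.MaxPool2d(2), nn.ReLU(True),
            nn.AdaptiveAvgPool2d(4),
        )
        self.fc_loc = nn.Sequential(
            nn.Linear(10 * 4 * 4, 32), nn.ReLU(True), nn.Linear(32, 6)
        )
        # Start from the identity transform so training is stable.
        self.fc_loc[2].weight.data.zero_()
        self.fc_loc[2].bias.data.copy_(
            torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float)
        )

    def forward(self, x):
        # Predict the affine parameters, then resample the input accordingly.
        theta = self.fc_loc(self.localization(x).flatten(1)).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)
```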
- Face detection pre-trained model, which should be downloaded to face_detection/detection/sfd/s3fd.pth
- Weights of the expert discriminator pre-trained model
- Weights of the visual quality discriminator pre-trained model (trained in a GAN setup)
- Spt-Wav2Lip pre-trained model
Our model was trained on the VoxCeleb2 dataset.
We recommend preprocessing the dataset first using our script:
python preprocess.py --data_root data_root/main --preprocessed_root lrs2_preprocessed/
To train the model, filelists/train.txt & filelists/val.txt are needed, where each line in the .txt file is the path to a folder containing the frames and audio of one video (a minimal sketch for generating these filelists is shown after the folder structure below). Example filelists/train.txt:
dataset/video1
dataset/video2
...
dataset/video10
where the dataset folder structure is:
dataset
├── video1
│ ├── 0.jpg
│ ├── 1.jpg
│ ├── ...
│ ├── 120.jpg
│ └── audio.wav
├── video2
│ ├── 0.jpg
│ ├── 1.jpg
│ ├── ...
│ ├── 110.jpg
│ └── audio.wav
...
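As referenced above, here is a minimal sketch for generating the filelists from this folder structure (the helper and the 90/10 split are assumptions, not repository conventions):

```python
# Build filelists/train.txt and filelists/val.txt from the dataset folder above.
import os
import random

video_dirs = sorted(
    os.path.join("dataset", d)
    for d in os.listdir("dataset")
    if os.path.isdir(os.path.join("dataset", d))
)
random.shuffle(video_dirs)

split = int(0.9 * len(video_dirs))  # assumed 90/10 train/val split
os.makedirs("filelists", exist_ok=True)
with open("filelists/train.txt", "w") as f:
    f.write("\n".join(video_dirs[:split]))
with open("filelists/val.txt", "w") as f:
    f.write("\n".join(video_dirs[split:]))
```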
You can either train the model without the additional visual quality discriminator (< 1 day of training) or use the discriminator (~2 days). For the former, run:
python3 train.py --data_root dataset/ --checkpoint_dir <folder_to_save_checkpoints> --syncnet_checkpoint_path <path_to_expert_disc_checkpoint> --disc_checkpoint_path <path_to_perceptual_disc_checkpoint>
To lip-sync a video to an audio source with a trained checkpoint, run:
python inference.py --checkpoint_path <ckpt> --face <video.mp4> --audio <an-audio-source>
The result will be saved to results/result_voice.mp4
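For example, with a trained checkpoint (the file names here are hypothetical):
python inference.py --checkpoint_path checkpoints/spt_wav2lip.pth --face demo_video.mp4 --audio demo_audio.wav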
To use DeepFaceLab, simply follow the instructions provided below.
- Main guide (specifically, I used the scripts found in the DeepFaceLab_Linux repository)
- Google Colab notebook
- pip install tensorflow-gpu
- pip install nvidia-tensorflow[horovod]
- pip install tensorboard==2.10.0
These packages are used because TensorFlow 1 is now deprecated and, for many environments, no longer available or downloadable (see https://stackoverflow.com/questions/73215696/did-colab-suspend-tensorflow-1-x). In addition, it is no longer available on PyPI.
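A quick sanity check that the install worked and the GPU is visible (a minimal sketch, assuming the packages above installed cleanly):

```python
# Verify the TensorFlow build and GPU visibility.
import tensorflow as tf

print(tf.__version__)              # nvidia-tensorflow ships a 1.15.x build
print(tf.test.is_gpu_available())  # should print True if the GPU is usable
```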
- https://drive.google.com/uc?id=1N2-m9qszOeVC9Tq77WxsLnuWwOedQiD2
- https://drive.google.com/uc?id=1MEGjdvVpUsu1jB4zrXZN7Y4kBBOzizDQ
Place these weights into "stylegan/weights", and place the photo(s) you want to edit into "stylegan/data".
To modify images, first encode them into latent space, providing the directory of the target photo(s), using the following command:
python image_encoder.py --src_dir [PATH_TO_DIR]
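For example, if the photos were placed in "stylegan/data" as described above:
python image_encoder.py --src_dir stylegan/data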
After the images are encoded, use "emotion_inference.py" to alter your image's emotion. The --coeff argument takes a numerical input, where positive numbers push the expression towards happy and negative numbers towards sad.
The --face_name argument takes the name of the target photo whose emotion you want to alter. Only pass in the image name, with no extension or path.
For example, suppose I want to alter an image I encoded called "demo1.jpg", changing the face to a smile. The command would be:
python emotion_inference.py --face_name demo1 --coeff 0.7
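Similarly, a negative coefficient pushes the same face towards sad, e.g.:
python emotion_inference.py --face_name demo1 --coeff -0.7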
The generated result will be output to "faces/generated_img".