
Diffusion Models with Applications in Face Reenactment and Talking-Face Synthesis

Preparation

  • Clone the repo and its submodules:
git clone --recurse-submodules -j4 https://github.com/GiannisPikoulis/dsml-thesis
cd dsml-thesis
  • A suitable conda environment named ldm can be created and activated with:
conda env create -f environment.yaml
conda activate ldm
cd talking_face/external/av_hubert/fairseq/
pip install --editable ./

First Stage

To train first-stage autoencoders, follow the instructions in the Taming Transformers repository. We recommend using a VQGAN as the first-stage model.
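As a sketch, first-stage VQGAN training in the Taming Transformers repository is launched through its own main.py; the config name below is a hypothetical placeholder — substitute whichever VQGAN config matches your data:

```shell
# Inside the Taming Transformers repository checkout.
# configs/custom_vqgan.yaml is a placeholder for your VQGAN config;
# -t enables training, and the trailing comma in --gpus is required
# (it makes PyTorch Lightning parse "0," as a list of GPU indices).
python main.py --base configs/custom_vqgan.yaml -t True --gpus 0,
```

The resulting autoencoder checkpoint is then referenced from the LDM config as the first-stage model.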

LDM Training

In both the face-reenactment and talking-face generation scenarios, LDM training can be launched as follows:

CUDA_VISIBLE_DEVICES=<GPU_ID> python main.py --base configs/latent-diffusion/<config_spec>.yaml -t --gpus 0,
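For example, a single-GPU run on GPU 0 would look like the following; the config name is a hypothetical illustration — substitute the actual file under configs/latent-diffusion/ for your scenario:

```shell
# Restrict the process to GPU 0 and train with that device.
# "reenactment.yaml" is a placeholder config name, not a file
# guaranteed to exist in the repository.
CUDA_VISIBLE_DEVICES=0 python main.py \
    --base configs/latent-diffusion/reenactment.yaml \
    -t --gpus 0,
```

Note that `--gpus 0,` refers to the device visible to the process (index 0 after `CUDA_VISIBLE_DEVICES` remapping), not the physical GPU ID.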

Citation

If you use our code, or your research benefits from this repository, please consider citing the following:

@misc{pikoulis2023photorealistic,
      title={Photorealistic and Identity-Preserving Image-Based Emotion Manipulation with Latent Diffusion Models}, 
      author={Ioannis Pikoulis and Panagiotis P. Filntisis and Petros Maragos},
      year={2023},
      eprint={2308.03183},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgements

Contact

For questions, feel free to open an issue.