PPG-GradVC

An any-to-many voice conversion model based on the Grad-TTS architecture and phonetic posteriorgrams (PPGs) from an SSL-based phoneme recognizer.

Setup

Using Python >= 3.6 and <= 3.9:

Follow the instructions at https://pytorch.org/get-started/locally/ to install PyTorch for your setup, then run:

pip install -r requirements.txt

Prepare your multilingual corpora, then fill in the filelists (each line is formatted as <wavfile_path>|<speaker_num>|).
A pretrained HiFi-GAN vocoder is located at ./hifigan/g_00875000; you can continue training from it or train a new one.
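
The filelist format above can be generated with a short script. The following is a minimal sketch, not part of this repository: it assumes a hypothetical corpus layout of one subdirectory per speaker, assigns speaker numbers from 0 in sorted directory order, and keeps the trailing `|` exactly as shown in the format string. The names `build_filelist` and `write_filelist` are illustrative only, and the numbering your training setup expects may differ.

```python
from pathlib import Path


def build_filelist(corpus_root, extension=".wav"):
    """Return lines formatted as <wavfile_path>|<speaker_num>|.

    Assumption (hypothetical layout): corpus_root/<speaker_dir>/.../<utterance>.wav
    Speaker numbers are assigned by sorted directory name, starting at 0.
    """
    root = Path(corpus_root)
    speaker_dirs = sorted(p for p in root.iterdir() if p.is_dir())
    lines = []
    for speaker_num, speaker_dir in enumerate(speaker_dirs):
        # Collect every matching audio file below the speaker directory.
        for wav_path in sorted(speaker_dir.rglob(f"*{extension}")):
            lines.append(f"{wav_path.as_posix()}|{speaker_num}|")
    return lines


def write_filelist(lines, out_path):
    """Write one filelist entry per line, UTF-8 encoded."""
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

For example, `write_filelist(build_filelist("/your/dataset/wavs"), "train.txt")` would write one entry per audio file; both paths here are placeholders.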

Extract PPGs

python preprocess_ppg.py --sr 16000 --in_dir /your/dataset/flacs --out_dir /your/dataset/ppgs

Inference

python inference.py -f infer_for_test.txt -c ./logs/grad_300.pt -t 100

This is NOT my original research. The code was mostly copied from work by Li Jingyi et al. at NERCMS, Wuhan University. I am immensely grateful for their efforts.
