MooseNet PLDA

MooseNet: A Trainable Metric for Synthesized Speech with a PLDA Module, on arXiv.
Accepted to Speech Synthesis Workshop 12, 2023, Grenoble
Presentation slides

MooseNet PLDA

MooseNet is a trainable metric for synthesized speech. We experimented with self-supervised learning (SSL) neural network (NN) models and a probabilistic linear discriminant analysis (PLDA) module. See the MooseNet-PLDA paper.

Installation

# Optional: remove the old environment before reinstalling
conda deactivate; rm -rf env
# Create a new conda environment and install the moosenet package in editable mode
conda env create --prefix ./env -f environment.yml \
  && conda activate ./env \
  && pip install -e ".[dev]"
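
After installation, it can help to confirm that the prefix environment is the one currently active before running anything. The snippet below is a minimal sketch, not part of the repository: `env_is_active` is a hypothetical helper, and it assumes that `conda activate ./env` sets `CONDA_PREFIX` to the absolute path of `./env` and that the check is run from the repository root.

```shell
# Hypothetical helper (not part of the repository): succeeds only if the
# currently active conda environment is the prefix environment given as $1.
env_is_active() {
  [ -n "${CONDA_PREFIX:-}" ] && [ "$CONDA_PREFIX" = "$1" ]
}

# Assumes the current directory is the repository root.
if env_is_active "$PWD/env"; then
  echo "conda env ./env is active"
else
  echo "conda env ./env is NOT active; run: conda activate ./env"
fi
```

If the repository path contains symbolic links, the two paths may differ even when the environment is active. Another quick check, assuming the editable install exposes a module named `moosenet`, is to try importing it from Python.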

Reproducing the Experiments

The commands for fine-tuning SSL models (XLS-R and wav2vec 2.0) into the MooseNet NN on the English data from the main track are in ./main.sh.
The commands for fine-tuning the MooseNet NN on the main-track data and the Chinese set from the out-of-domain (OOD) track are in ./ood.sh.
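
Because these scripts collect the experiment commands, it may be useful to review them before launching a long fine-tuning run. The following is a small sketch under that assumption: `preview_commands` is a hypothetical helper, not part of the repository, that prints a script with comment lines and blank lines removed.

```shell
# Hypothetical helper (not part of the repository): print the lines of the
# script given as $1, skipping blank lines and lines that start with '#'.
preview_commands() {
  grep -v -E '^[[:space:]]*(#|$)' "$1"
}

# Preview both experiment scripts if they exist in the current directory.
for script in ./main.sh ./ood.sh; do
  if [ -f "$script" ]; then
    echo "== $script =="
    preview_commands "$script"
  fi
done
```

Individual commands can then be copied from the preview and run one at a time inside the activated environment.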
