This repository contains the code for the main part of my master's thesis in Data Science & Engineering at Politecnico di Torino.

alessandropec/data_driven_ai_voice_cloning

Data driven AI voice cloning

This repository implements the main part of my master's thesis in Data Science & Engineering. It is divided into two parts:

  1. Speaker Encoder
     1. models: ECAPA-TDNN, WavLM series
     2. data: VoxCeleb1, private dataset
  2. Text-to-speech
     1. model: FastSpeech2 (Microsoft implementation)
     2. data: LibriTTS

These two parts are then integrated into a multi-speaker text-to-speech model, ZeroShotFastSpeech2, which can clone unseen voices from about 5 seconds of reference audio.
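
As a rough illustration of the integration idea, and not the code in this repository, the sketch below shows one common way to condition a TTS model on a speaker: a fixed-size speaker embedding is computed from the reference audio and added to every phoneme-level hidden state. Everything here is an assumption made for illustration: `toy_speaker_encoder` is a hypothetical stand-in for ECAPA-TDNN or WavLM, `condition_on_speaker` is a hypothetical helper, the 16 kHz sample rate and the 4-dimensional embedding are arbitrary, and adding the embedding to the hidden states is only one possible conditioning scheme.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def toy_speaker_encoder(waveform, dim=4):
    """Hypothetical stand-in for a real speaker encoder.

    Maps a variable-length waveform to a fixed-size, unit-length vector
    by averaging `dim` consecutive chunks of samples.
    """
    chunk = max(1, len(waveform) // dim)
    emb = [sum(waveform[i * chunk:(i + 1) * chunk]) / chunk for i in range(dim)]
    return l2_normalize(emb)


def condition_on_speaker(encoder_states, speaker_embedding):
    """Add the same speaker embedding to every phoneme-level hidden state."""
    return [
        [h + s for h, s in zip(state, speaker_embedding)]
        for state in encoder_states
    ]


# About 5 seconds of fake reference audio at an assumed 16 kHz sample rate.
random.seed(0)
reference_audio = [random.uniform(-1, 1) for _ in range(5 * 16000)]
speaker_embedding = toy_speaker_encoder(reference_audio)

# Three dummy phoneme states standing in for the TTS encoder output.
phoneme_states = [[0.0] * 4 for _ in range(3)]
conditioned = condition_on_speaker(phoneme_states, speaker_embedding)
```

In a real system the speaker embedding would typically pass through a learned projection so that it matches the hidden size of the TTS encoder before being combined with its states.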