emotion voice cloning #1251

Open
song616818 opened this issue Sep 18, 2023 · 1 comment

Comments

@song616818

I am using the method from this repo, https://github.com/innnky/emotional-vits, to try to implement emotional voice cloning. I fine-tuned the pretrained synthesizer on a small dataset of about 24 speakers with 100 audio clips each. Each speaker's 100 clips are divided into roughly four or five categories, with the same text in each category but spoken with different emotions. I run inference with the fine-tuned synthesizer and the pretrained encoder and vocoder, but the results are not very good. Does anyone know what the problem is, or how the model should be trained?

@MassEast

I am not sure about the quality either. With the provided samples, I can generate reasonably good speech. With my own samples (e.g., recorded through the UI), I have not been able to produce any usable output.
