How is it possible to add new SpeechT5 models? #41

Open
virtualrobotix opened this issue Dec 10, 2023 · 2 comments

Comments

@virtualrobotix

How can I change the text-to-speech model? Is it possible to use another .bin, such as voxpopuli for Italian, or one we trained ourselves? I tried adding the voxpopuli.bin file to the public directory, but the app stops working and shows no information in the debug output.
The main difference is that the original models are about 2 KB, while the voxpopuli one is about 500 MB.

@kasumi-1
Contributor

Hi!

SpeechT5 requires x-vector speaker embeddings. There is a list of ones extracted from CMU ARCTIC here: https://huggingface.co/datasets/Xenova/cmu-arctic-xvectors-extracted/tree/main

I haven't generated these before, but I think you can use https://huggingface.co/pyannote/embedding to create them.
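
As a rough sanity check before loading a file, here is a minimal sketch. It assumes the embedding files are raw float32 arrays of 512 values, which would match the roughly 2 KB size mentioned in the question (512 × 4 bytes = 2048 bytes). The function name and the dimension constant are illustrative and not taken from this project:

```typescript
// Assumed x-vector size for SpeechT5 speaker embeddings (an assumption, not read from this repo).
const EMBEDDING_DIM = 512;
const BYTES_PER_FLOAT32 = 4;

// Hypothetical helper: interpret a raw .bin buffer as a speaker embedding,
// rejecting files whose size does not match (for example a 500 MB model checkpoint).
function parseSpeakerEmbedding(buffer: ArrayBuffer): Float32Array {
  const expectedBytes = EMBEDDING_DIM * BYTES_PER_FLOAT32; // 2048
  if (buffer.byteLength !== expectedBytes) {
    throw new Error(
      `Expected ${expectedBytes} bytes, got ${buffer.byteLength}; ` +
        "this does not look like a speaker embedding file."
    );
  }
  return new Float32Array(buffer);
}
```

If that assumption holds, a 500 MB voxpopuli.bin would be a full model checkpoint rather than a speaker embedding, so it could not simply be dropped in place of one of the small files.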

@kustomzone

@virtualrobotix I haven't looked into it yet, but generate_paths.js looks to be misconfigured (at least on Windows).
You can bypass it by editing src\paths.ts and adding your model.bin path like so:

speechT5SpeakerEmbeddingsList = ['speecht5_speaker_embeddings/speecht5_tts/pytorch_model.bin'];

Use quotes, since each entry is a string. You'll actually see it loading on the right-hand side of the screen the first time it runs.
It's likely an array so that you can comma-separate different voice models, and they'll each show up in the UI.
I just started testing, and it's slow on my PC using the CPU, but it works!
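
A sketch of what a multi-entry version of that array might look like, assuming my guess about comma-separated entries is right. The second path and the `voiceLabel` helper are hypothetical placeholders, not something from the repo:

```typescript
// Variable name as in the line above; the second entry is a made-up example path.
const speechT5SpeakerEmbeddingsList: string[] = [
  "speecht5_speaker_embeddings/speecht5_tts/pytorch_model.bin",
  "speecht5_speaker_embeddings/my_custom_voice.bin", // hypothetical
];

// Hypothetical helper: derive a display label from a path
// (the file name without its .bin extension).
function voiceLabel(path: string): string {
  const fileName = path.split(/[\\/]/).pop() ?? path;
  return fileName.replace(/\.bin$/, "");
}

const labels = speechT5SpeakerEmbeddingsList.map(voiceLabel);
```

How the app actually names each voice in the UI may differ; the label helper is only there to illustrate one entry per voice.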
