
Fairseq voice cloning #3142

Closed
Poccapx opened this issue Nov 5, 2023 · 13 comments · Fixed by eginhard/coqui-tts#11 · May be fixed by #3500
Labels
bug Something isn't working

Comments

Poccapx commented Nov 5, 2023

Describe the bug

There seems to be an issue with activating voice cloning in Coqui when using Fairseq models. The --speaker_wav argument works fine on identical text with the XTTS model, but with Fairseq it seems to be ignored. I have tried both .wav and .mp3, different lengths, file locations/names, with and without CUDA, and several languages. There are no errors, just always the same generic male voice. Is this a known issue with voice cloning and Fairseq on Windows' command line, or is something wrong with my setup?

To Reproduce

No response

Expected behavior

No response

Logs

No response

Environment

Windows, tts.exe

Additional context

No response

Poccapx added the bug label Nov 5, 2023
erogol (Member) commented Nov 8, 2023

Can you give us code to reproduce the problem?

Poccapx (Author) commented Nov 8, 2023

Just running with any Fairseq model normally, the same way as with XTTS (which clones just fine, version 2 included): tts.exe --use_cuda true --model_name tts_models/[lang]/fairseq/vits --text "Testing voice cloning with Fairseq on Windows." --speaker_wav Test.wav --out_path Fairseq.wav

erogol (Member) commented Nov 8, 2023

Poccapx (Author) commented Nov 8, 2023

The thing is that running tts.exe --use_cuda true --model_name tts_models/multilingual/multi-dataset/xtts_v2 --language_idx [lang] --text "Testing voice cloning with XTTS on Windows." --speaker_wav Test.wav --out_path XTTS.wav clones the voice perfectly fine. The problem is with Fairseq models, where the argument --speaker_wav seems to be ignored, using the generic male voice.

Sharrnah commented

@Poccapx
For non-voice-cloning models, you need to run the resulting TTS audio through a voice conversion model. See
https://github.com/coqui-ai/TTS#voice-conversion-models

XTTS is a voice-cloning model which does this on its own (and actually can't run without a cloning audio file).
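The two-step pipeline described above (synthesize with a Fairseq model, then convert the speaker with FreeVC) can also be driven from the Python API. A minimal sketch, assuming the coqui `TTS.api.TTS` class; the language code, file names, and output paths are illustrative:

```python
def clone_via_freevc(text, speaker_wav, out_path,
                     tts_model="tts_models/eng/fairseq/vits",
                     vc_model="voice_conversion_models/multilingual/vctk/freevc24"):
    """Synthesize with a non-cloning Fairseq model, then convert the speaker."""
    from TTS.api import TTS  # imported lazily so the sketch stays self-contained

    # Step 1: synthesize in the model's default (generic) voice.
    TTS(tts_model).tts_to_file(text=text, file_path="fairseq_raw.wav")

    # Step 2: convert the synthesized speech toward the target speaker.
    TTS(vc_model).voice_conversion_to_file(
        source_wav="fairseq_raw.wav",  # the speech to convert
        target_wav=speaker_wav,        # reference recording of the target voice
        file_path=out_path,            # where the converted result is written
    )
```

Note that both models are downloaded on first use, so the first call takes a while.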

Poccapx (Author) commented Nov 10, 2023

That’s very important, thank you. Is there a list of voice conversion models to use with --model_name "<language>/<dataset>/<model_name>"?

Sharrnah commented

Pretty sure there is currently only one official one, and that is voice_conversion_models/multilingual/vctk/freevc24.

(Not sure if you have to leave the first part "voice_conversion_models" out of the --model_name argument, as I am not using the CLI.)

You can find a list of all models here: https://github.com/coqui-ai/TTS/blob/dev/TTS/.models.json#L924

Poccapx (Author) commented Nov 10, 2023

Right! In the command tts --out_path output/path/speech.wav --model_name "<language>/<dataset>/<model_name>" --source_wav <path/to/speaker/wav> --target_wav <path/to/reference/wav>, what is the difference between the arguments --out_path and --target_wav?

Sharrnah commented

--source_wav is the speech audio you want to convert.
--target_wav is the speech you want the source_wav to be converted into.
--out_path is where the finished converted audio is written.
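To make the three arguments concrete, here is the full CLI invocation built as an argument list; the file names are placeholders, and this snippet only assembles and prints the command rather than running the tts CLI:

```python
# Illustrative freevc24 voice-conversion invocation; file names are placeholders.
freevc_cmd = [
    "tts",
    "--model_name", "voice_conversion_models/multilingual/vctk/freevc24",
    "--source_wav", "fairseq_output.wav",   # the speech audio to convert
    "--target_wav", "reference_voice.wav",  # the voice it should be converted into
    "--out_path", "converted.wav",          # where the finished audio is written
]
print(" ".join(freevc_cmd))
# To actually execute it: subprocess.run(freevc_cmd, check=True)
```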

Poccapx (Author) commented Nov 12, 2023

Using voice_conversion_models/multilingual/vctk/freevc24 on top of a Fairseq output has worked. The voice cloning quality is nowhere near that of XTTS, but at least that way it's possible to switch from the default male voice to a female one. For Fairseq as a non-voice-cloning model, is the --speaker_wav argument always pointless, or are there instances where it is used with Fairseq? It is present in these two examples, which got me thinking that something was wrong with my initial setup.

Sharrnah commented Nov 14, 2023

Sorry for the late reply.

For the first example link, it's because the tts_with_vc_to_file() function does the voice conversion internally already (that's what the "with_vc" part of the function name means).

About your second example, I actually have no idea. I would guess it has to do with the encoder model and not with the TTS model, but that's just a guess. So maybe I was wrong and you can somehow convert speakers using some vocoder models. I haven't found anything in the documentation about it, so maybe ask about it in the discussions: https://github.com/coqui-ai/TTS/discussions

I hate to advertise, but in case you want, you can give my application Whispering Tiger a try. It has multiple TTS plugins (including Coqui TTS), and together with the RVC plugin and an RVCv2 model you can have probably the best voice conversion currently available. (It's currently Windows-only, though.)
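For completeness, the tts_with_vc_to_file() path mentioned above can be sketched from the Python API as well. This is a minimal sketch assuming the coqui `TTS.api.TTS` class; the model name and file paths are illustrative:

```python
def clone_in_one_call(text, speaker_wav, out_path,
                      tts_model="tts_models/eng/fairseq/vits"):
    """Run TTS and FreeVC voice conversion in a single call."""
    from TTS.api import TTS  # imported lazily so the sketch stays self-contained

    # tts_with_vc_to_file() synthesizes the text and then converts the
    # result toward speaker_wav internally (the "with_vc" part of the name),
    # so no intermediate file handling is needed.
    TTS(tts_model).tts_with_vc_to_file(
        text=text,
        speaker_wav=speaker_wav,  # reference recording of the target voice
        file_path=out_path,       # where the converted audio is written
    )
```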

erogol (Member) commented Nov 28, 2023

should be fixed by now.

erogol closed this as completed Nov 28, 2023
Nanshanelectrician commented

Can you try this? https://tts.readthedocs.io/en/latest/inference.html#example-voice-cloning-by-a-single-speaker-tts-model-combining-with-the-voice-conversion-model

AFAIR the terminal does not support TTS with VC.

UnboundLocalError: cannot access local variable 'dataset' where it is not associated with a value
