This solution is based on converting whole sentences to WAV files before sending them back to Amica, so the main downside is added delay, especially for longer sentences. To lower the latency, Amica can further split sentences at commas, but this hurts audio cohesion a bit (unnaturally long pauses at commas, and each part of the sentence comes out in a slightly different tone). There is also a bug in the XTTS API where it writes an incorrect sample rate in the WAV header, so the played voice is slower and lower-pitched than it should be.
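As a stopgap for the header bug, the sample-rate field in the WAV header can be patched client-side before playback. This is only a sketch: it assumes the canonical 44-byte PCM RIFF header layout, and the `correct_rate` of 24000 Hz is an assumption based on XTTS's nominal output rate, so verify it against your model.

```python
import struct

def fix_wav_sample_rate(wav_bytes: bytes, correct_rate: int = 24000) -> bytes:
    """Patch the SampleRate and ByteRate fields of a canonical PCM WAV header.

    Assumes the standard 44-byte RIFF/WAVE layout:
      offset 22: NumChannels (u16), 24: SampleRate (u32),
      28: ByteRate (u32), 34: BitsPerSample (u16).
    24000 Hz is assumed to be XTTS's real output rate (an assumption; check your setup).
    """
    if wav_bytes[:4] != b"RIFF" or wav_bytes[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    channels = struct.unpack_from("<H", wav_bytes, 22)[0]
    bits = struct.unpack_from("<H", wav_bytes, 34)[0]
    byte_rate = correct_rate * channels * bits // 8
    patched = bytearray(wav_bytes)
    struct.pack_into("<I", patched, 24, correct_rate)  # SampleRate
    struct.pack_into("<I", patched, 28, byte_rate)     # ByteRate = rate * channels * bits/8
    return bytes(patched)
```

Applying this to the response bytes before handing them to the audio element avoids the slowed-down, low-pitched playback without touching the server.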
Because of that, I'm currently working on a dedicated streaming server for Amica using XTTS, which converts audio live and sends samples as they are generated. I already have a working solution with low latency (independent of sentence length), proper lip sync, and text progression. The code is still very hacky, with all configuration hardcoded, so I will probably need a week or two before I can share it for testing.
Using https://github.com/daswer123/xtts-api-server is one option for XTTS support, but having looked at the code, it relies on the local filesystem to share voice files between the client and server.
There is also https://github.com/coqui-ai/xtts-streaming-server, though I'm not sure how it would be integrated.