TTS API improvements #2086

Closed
wants to merge 9 commits into from

Conversation

Contributor

@blob42 blob42 commented Apr 20, 2024

Description

Improvements to the Coqui TTS API/backend.

  • tts coqui xtts_v2 not working without speaker_idx  #2073: Allow passing speaker_id to models
  • Add optional language parameter to TTS endpoint/schema
  • [ ] TTS Info endpoint: List available models, speakers and languages (will start new PR for this one)
  • update swagger documentation
  • define tts models with config files
  • updated docs

Notes for Reviewers

Signed commits

  • Yes, I signed my commits.

netlify bot commented Apr 20, 2024

Deploy Preview for localai canceled.

Name Link
🔨 Latest commit b2361dc
🔍 Latest deploy log https://app.netlify.com/sites/localai/deploys/6632c4cbace1270008899180

# List available 🐸TTS models
print(TTS().list_models())
print(TTS().list_models().list_models())
Owner

this looks like a leftover, or is it wanted?

Contributor Author

I was not sure, I will remove it then.

I am planning to include an endpoint to list models/speakers in this PR.

Owner

mudler commented Apr 20, 2024

I don't see how the changeset can fix #2073 - is there something missing in the PR?

Contributor Author

blob42 commented Apr 22, 2024

@mudler I haven't pushed those changes yet. I will remove the draft status when I am done.

Contributor Author

blob42 commented Apr 22, 2024

@mudler I am trying to understand where and when the Go gRPC server -> TTS service is used. Is this a work in progress?

@@ -66,7 +66,19 @@ def LoadModel(self, request, context):

     def TTS(self, request, context):
         try:
-            self.tts.tts_to_file(text=request.text, speaker_wav=self.AudioPath, language=COQUI_LANGUAGE, file_path=request.dst)
+            # if model is multilingual add language from request or env as fallback
+            lang = request.Lang or COQUI_LANGUAGE
Contributor Author

Can I add a new Lang field to the protobuf definition? It would be optional.

Collaborator

If the language is truly independent of the model, voice, and input text, I see no reason not to have a Language parameter. Personally, I would prefer to spell it out rather than name it Lang.

Contributor Author

Agreed, it is better to have a clearly defined parameter.

Does it make sense to keep the COQUI_LANGUAGE env var? What use case does it serve?
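
For reference, the fallback being discussed in this thread could be sketched roughly as below. This is only an illustration, not the PR's code: `resolve_language` is a hypothetical helper, and it assumes the request carries an optional language string while COQUI_LANGUAGE remains a server-wide default.

```python
import os


def resolve_language(request_language, env=None):
    """Pick the language for a TTS call (hypothetical helper).

    Precedence: explicit request value first, then the COQUI_LANGUAGE
    environment variable. Returns None when neither is set, so a
    single-language model can simply omit the argument.
    """
    env = os.environ if env is None else env
    # Empty strings count as "not set", mirroring `request.Lang or COQUI_LANGUAGE`.
    return request_language or env.get("COQUI_LANGUAGE") or None


# The request value wins over the environment default.
print(resolve_language("fr", env={"COQUI_LANGUAGE": "de"}))
# With no request value, the environment default is used.
print(resolve_language("", env={"COQUI_LANGUAGE": "de"}))
```

Seen this way, the env var would act only as a default for clients that never send a language.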

@blob42 blob42 force-pushed the tts_api branch 2 times, most recently from 55251d3 to 66e1cd4 Compare April 23, 2024 13:15
@blob42 blob42 marked this pull request as ready for review April 23, 2024 13:16
Contributor Author

blob42 commented Apr 23, 2024

I didn't push the swagger docs; it gave me a lot of changes.

A quick way to test the language-switching capability with multilingual models is something like this:

Without specifying lang:

The voice uses an English accent.

curl -L http://localai:8080/tts \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer 2708b7c21129e408899d5a38e6d1af8d" \
    -d '{
        "backend": "coqui",
        "input": "Bonjour Madame ! Comment allez-vous ?",
        "model": "tts_models/multilingual/multi-dataset/xtts_v2",
        "voice": "Ana Florence"
    }' | aplay -D pipewire -

With lang:

The proper language accent is used.

curl -L http://localai:8080/tts \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer 2708b7c21129e408899d5a38e6d1af8d" \
    -d '{
        "backend": "coqui",
        "input": "Bonjour Madame ! Comment allez-vous ?",
        "model": "tts_models/multilingual/multi-dataset/xtts_v2",
        "voice": "Ana Florence",
        "lang": "fr"
    }' | aplay -D pipewire -
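
The same request can also be built from a script. The sketch below is a minimal illustration using only the Python standard library; `build_tts_request` is a hypothetical helper, and the JSON field names (including `lang`) are taken from the curl examples above, so they may differ from what is finally merged.

```python
import json
import urllib.request


def build_tts_request(base_url, text, model, voice, language=None, token=None):
    """Build (but do not send) a POST request for the /tts endpoint."""
    payload = {
        "backend": "coqui",
        "input": text,
        "model": model,
        "voice": voice,
    }
    # Only include "lang" when a language was explicitly chosen.
    if language:
        payload["lang"] = language

    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"

    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_tts_request(
    "http://localai:8080",
    "Bonjour Madame ! Comment allez-vous ?",
    "tts_models/multilingual/multi-dataset/xtts_v2",
    "Ana Florence",
    language="fr",
)
print(req.get_method(), req.full_url)
print(json.loads(req.data))
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen` and writing the response body to a file or an audio player, as the curl pipeline does with aplay.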

@blob42 blob42 marked this pull request as draft April 26, 2024 00:12
@blob42 blob42 force-pushed the tts_api branch 5 times, most recently from 3ce3154 to 970de10 Compare April 26, 2024 01:59
@blob42 blob42 changed the title Coqui TTS API improvements TTS API improvements Apr 26, 2024
@blob42 blob42 marked this pull request as ready for review April 29, 2024 15:49
Contributor Author

blob42 commented Apr 29, 2024

Quick update regarding the TTS Info endpoint: I am dropping this feature from this PR, as it would involve too many changes that are out of scope.

Context:

The goal is to make it possible to query available models, speakers, or other types of information, depending on the backend.

My first attempt was to add a gRPC service, TTSInfoRequest, to query the backend. I found out along the way that the backend gRPC service is loaded at the same time as the model, whereas Info requests might not send any model information.

My proposal is to allow a backend's gRPC service to be spawned without a model, and to add a service called Info() or Query that backends can use to send arbitrary information. A model could be loaded later using the same spawned service, or the service could be torn down and a new one started for the designated model.

I will start a PR or Discussion for this proposal.
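
To make the proposal more concrete, here is a rough sketch of the lifecycle in plain Python, with no gRPC involved. Everything in it is hypothetical: `SketchBackend`, its method names, and the placeholder model list are illustrative assumptions, not LocalAI code.

```python
class SketchBackend:
    """Hypothetical backend that can be started without a model."""

    # Placeholder catalogue; a real backend would discover this itself.
    AVAILABLE_MODELS = ("example/model-a", "example/model-b")

    def __init__(self):
        self.model = None  # no model is loaded at spawn time

    def info(self):
        """Answer a query without requiring a loaded model."""
        return {
            "model_loaded": self.model is not None,
            "model": self.model,
            "models": list(self.AVAILABLE_MODELS),
        }

    def load_model(self, name):
        """Load a model later, reusing the already-spawned service."""
        if name not in self.AVAILABLE_MODELS:
            raise ValueError(f"unknown model: {name}")
        self.model = name

    def tts(self, text):
        """Synthesis still requires a model; this returns a stand-in string."""
        if self.model is None:
            raise RuntimeError("no model loaded; call load_model() first")
        return f"<audio for {text!r} using {self.model}>"


backend = SketchBackend()
print(backend.info())          # works before any model is loaded
backend.load_model("example/model-a")
print(backend.tts("hello"))    # works only after load_model()
```

The alternative mentioned above, tearing the service down and starting a new one per model, would keep each process single-purpose at the cost of a restart for every model switch.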

@blob42 blob42 force-pushed the tts_api branch 2 times, most recently from 1a2d0cb to fa6e144 Compare April 29, 2024 16:18
Signed-off-by: blob42 <contact@blob42.xyz>
Signed-off-by: blob42 <contact@blob42.xyz>
Signed-off-by: blob42 <contact@blob42.xyz>
Signed-off-by: blob42 <contact@blob42.xyz>
core/schema/localai.go (outdated, resolved)
core/backend/tts.go (outdated, resolved)
Owner

mudler commented Apr 29, 2024

overall looks good, thanks! just a few nits/open questions above

blob42 added 5 commits May 2, 2024 00:40
Signed-off-by: blob42 <contact@blob42.xyz>
Signed-off-by: blob42 <contact@blob42.xyz>
- consolidate TTS options under `tts` config entry

Signed-off-by: blob42 <contact@blob42.xyz>
Signed-off-by: blob42 <contact@blob42.xyz>
Signed-off-by: blob42 <contact@blob42.xyz>
@blob42 blob42 closed this May 13, 2024
@blob42 blob42 deleted the tts_api branch May 13, 2024 07:15
@blob42 blob42 mentioned this pull request May 13, 2024
6 tasks