Integration of Turn-Taking Models into Nemo Framework for Enhanced Realistic Conversations #9150

rodrigoGA · 2024-05-09T01:09:16Z

Since NeMo is a language-focused framework, I was wondering whether support for turn-taking models is on the roadmap, or whether it is otherwise possible to work with them.

There is a lot of literature on this, and based on the current state of the art, I believe these models are indispensable for achieving realistic conversations. They predict whose turn it is in the conversation, which allows for more natural interactions. Specifically:

- It lets the bot decide when to respond to the user. This timing is dynamic: the bot can keep waiting while the user is thinking, or respond immediately after they have finished a sentence.
- It can detect when the user intends to interrupt what the bot is saying.
- It makes backchannels possible, such as saying "yeah" or "uh-huh" while the user is speaking, which has been shown to lead to longer conversations with the user.
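
To make the three behaviours above concrete, here is a minimal sketch of how a bot could act on a turn-taking model's predictions. Everything in it is an assumption for illustration: the `TurnState` fields, the `decide` function, and the thresholds are hypothetical and are not the API of VoiceActivityProjection, NeMo, or Riva. It only assumes that some model provides a probability that the user will hold the floor next and a probability that a backchannel is appropriate.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    KEEP_LISTENING = "keep_listening"   # stay silent, the user still has the floor
    RESPOND = "respond"                 # take the turn and start the bot's reply
    BACKCHANNEL = "backchannel"         # emit a short "yeah" / "uh-huh"
    KEEP_SPEAKING = "keep_speaking"     # the bot continues its current utterance
    STOP_SPEAKING = "stop_speaking"     # yield the floor to an interrupting user


@dataclass
class TurnState:
    # Hypothetical model outputs for the current audio frame.
    p_user_next: float     # probability that the user holds the floor next
    p_backchannel: float   # probability that a backchannel fits right now
    user_speaking: bool    # voice activity detected for the user
    bot_speaking: bool     # the bot is currently playing audio


def decide(state: TurnState,
           take_threshold: float = 0.6,
           backchannel_threshold: float = 0.7) -> Action:
    """Map turn-taking predictions to one action (illustrative thresholds)."""
    if state.bot_speaking:
        # Interruption handling: yield if the user is likely to take the turn.
        if state.p_user_next >= take_threshold:
            return Action.STOP_SPEAKING
        return Action.KEEP_SPEAKING

    if state.user_speaking:
        # The user has the floor: at most offer a backchannel.
        if state.p_backchannel >= backchannel_threshold:
            return Action.BACKCHANNEL
        return Action.KEEP_LISTENING

    # Silence: respond only if the model expects the bot to speak next,
    # otherwise keep waiting (the user may just be thinking).
    p_bot_next = 1.0 - state.p_user_next
    if p_bot_next >= take_threshold:
        return Action.RESPOND
    return Action.KEEP_LISTENING
```

For example, during a pause where the model still expects the user to continue (`p_user_next=0.7`), `decide` returns `Action.KEEP_LISTENING` instead of responding after a fixed silence timeout, which is the dynamic response timing described in the first point.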

My questions are as follows:

Is this on the roadmap?
There are several open-source models that implement this, for example https://github.com/ErikEkstedt/VoiceActivityProjection, which is a simple PyTorch model.
- Can this model be converted to a NeMo model?
- We also have a commercial version of NVIDIA Riva. If we convert the model to NeMo, could we then convert it to Riva, deploy it on the Riva server, and consume it with Riva clients?

Thank you and regards.

rodrigoGA assigned okuchaiev May 9, 2024
