Since NeMo is a language-focused framework, I was wondering whether turn-taking models are on the roadmap, or whether it is already possible to work with them.
There is a lot of literature on this, and according to the current state of the art, I believe these kinds of models are indispensable for achieving realistic conversations. These models are based on predicting whose turn it is in the conversation, which allows for more realistic interactions. Specifically:
It decides when to respond to the user, and that timing is dynamic: it can wait while the user is pausing to think, or respond immediately after they have finished a sentence.
It can detect when the user intends to interrupt what the bot is saying.
It also makes backchannels possible, such as saying "yeah" or "uh-huh" while the user is speaking, which has been reported to lead to longer conversations with users.
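To make the three behaviours above concrete, here is a minimal sketch of the decision layer that could sit on top of a turn-taking model. It is plain Python and does not use any NeMo or Riva API. The `Frame` fields, the `decide` function, and all thresholds are hypothetical names and values chosen for illustration; a real model would supply the probabilities.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    HOLD = "hold"                # keep the current state (listening or speaking)
    RESPOND = "respond"          # take the turn and answer the user
    BACKCHANNEL = "backchannel"  # say "yeah" / "uh-huh" without taking the turn
    YIELD = "yield"              # stop speaking because the user is interrupting


@dataclass
class Frame:
    # Hypothetical per-frame outputs of an upstream turn-taking model.
    p_user_turn_end: float  # probability that the user has finished their turn
    p_user_speaking: float  # probability that the user is speaking right now
    silence_ms: int         # time since the user's last detected speech
    bot_speaking: bool      # whether the bot is currently playing audio


def decide(frame, end_threshold=0.8, speech_threshold=0.6,
           backchannel_band=(0.4, 0.8), max_silence_ms=1500):
    """Map model outputs to a turn-taking action (thresholds are assumptions)."""
    # Interruption: the user starts talking while the bot is speaking.
    if frame.bot_speaking:
        if frame.p_user_speaking >= speech_threshold:
            return Action.YIELD
        return Action.HOLD

    # Dynamic response timing: answer right away when the model is confident
    # that the turn is over, otherwise tolerate a thinking pause up to a limit.
    if frame.p_user_turn_end >= end_threshold:
        return Action.RESPOND
    if frame.silence_ms >= max_silence_ms:
        return Action.RESPOND

    # Backchannel: the user is still talking, but the model sees a likely
    # phrase boundary that is not a full end of turn.
    low, high = backchannel_band
    if frame.p_user_speaking >= speech_threshold and low <= frame.p_user_turn_end < high:
        return Action.BACKCHANNEL

    return Action.HOLD
```

For example, `decide(Frame(0.2, 0.1, 400, False))` holds during a short thinking pause, while the same frame with `silence_ms=2000` triggers a response. In practice the thresholds would need tuning per language and use case.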
In our case, we also have a commercial version of NVIDIA Riva. If we converted such a model to NeMo, could we then convert it to Riva? Could we deploy it on the Riva server, and would it be consumable by Riva clients?
Thank you and regards.