Investigate a better speech-to-text module #23

gcampax · 2019-04-18T04:37:41Z

Bing Speech is good but not exceptional quality, and the streaming API is going down, leaving only the REST API which has no feedback while the user speaks and is also quite slow.

We should investigate alternative software, such as Mozilla's deepspeech. If necessary, we can run our own servers, which is probably a good idea for privacy anyway.

12people · 2022-09-25T20:43:25Z

Open AI just released its high-quality speech recognition model under the MIT license: https://github.com/openai/whisper .

It offers several model sizes with different system requirements, but all potentially usable offline. The ideal solution might be for Almond to ship with a smaller model (or feature a setting allowing users to choose to download an offline model) and run a server with a larger model.

gcampax added enhancement help wanted labels Apr 18, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Investigate a better speech-to-text module #23

Investigate a better speech-to-text module #23

gcampax commented Apr 18, 2019

12people commented Sep 25, 2022

Investigate a better speech-to-text module #23

Investigate a better speech-to-text module #23

Comments

gcampax commented Apr 18, 2019

12people commented Sep 25, 2022