UBOS-tech/UBOS-template-Speech-To-Text

Ubos - End-to-End Software Development Platform

Speech To Text

The Speech to Text template uses speech recognition and natural language processing to generate text transcriptions from uploaded audio files or YouTube videos. It lets users extract meaningful information from audio-visual content with little effort.

Node-RED Flows

This Node-RED flow transcribes video or audio to text and then uses your custom prompt to draw a conclusion from the transcribed text.

  • HTTP Input Node (`/convertSpeech`): This node listens for incoming HTTP POST requests at the `/convertSpeech` endpoint. It expects the request to include an audio file or a YouTube video URL, along with an OpenAI API key.
  • Function Node (`check type`): This function node determines whether the user has provided an audio file or a YouTube video URL. If a URL is provided, it sets up the necessary parameters for downloading the audio from the video. If a file is provided, it prepares the payload for the OpenAI API request.
  • YouTube-YTDL Node: If a YouTube video URL is provided, this node downloads the audio from the video.
  • HTTP Request Node (to OpenAI): If an audio file is provided, this node sends a POST request to the OpenAI API (`https://api.openai.com/v1/audio/transcriptions`) with the audio file and the necessary headers, including the API key.
  • Function Node (`response`): This function node processes the response from the OpenAI API. If the response status code is 200 (successful), it extracts the transcribed text from the response payload and assigns it to `msg.payload`. If there's an error, it constructs an error message and assigns it to `msg.payload`.
  • HTTP Response Node: This node sends the final response back to the client, containing either the transcribed text or an error message.
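
The routing and response handling described in the list above can be sketched in Python. This is only an illustration of the logic, not the flow's actual function-node code (which runs as JavaScript inside Node-RED). The field names `url` and `file`, the helper names `check_type` and `build_response`, and the wording of the error message are assumptions made for this example.

```python
from typing import Any


def check_type(payload: dict[str, Any]) -> str:
    """Decide which branch of the flow should handle the request.

    Assumes hypothetical field names: "url" for a YouTube link and
    "file" for uploaded audio bytes.
    """
    if payload.get("url"):
        return "youtube"  # audio has to be downloaded from the video first
    if payload.get("file"):
        return "file"  # audio can go straight to the transcription request
    raise ValueError("request must include an audio file or a YouTube URL")


def build_response(status_code: int, body: dict[str, Any]) -> dict[str, Any]:
    """Mirror the `response` node: transcribed text on 200, an error message otherwise."""
    if status_code == 200:
        return {"payload": body.get("text", "")}
    # OpenAI error bodies usually look like {"error": {"message": "..."}}
    error = body.get("error")
    message = error.get("message") if isinstance(error, dict) else None
    return {
        "payload": f"Transcription failed ({status_code}): {message or 'unknown error'}"
    }
```

Keeping the two decisions in separate functions matches the flow's layout, where `check type` and `response` are separate nodes.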

Key Features

Audio/Video Upload

The template features a user-friendly interface that allows users to upload audio files in popular formats such as WAV, MP3, FLAC, or provide YouTube video URLs. The uploading process is straightforward and intuitive, ensuring a smooth user experience.

Text Transcription with Whisper

At the heart of this template lies the powerful Whisper AI model from OpenAI, specifically designed for speech recognition and transcription tasks. Whisper employs advanced machine learning techniques to accurately transcribe audio content into textual form, capturing the spoken words with high fidelity.

Multiple Language Support

To cater to diverse linguistic needs, the template offers support for multiple languages, allowing users to transcribe audio in various languages and dialects. The available language options are regularly updated to ensure wide coverage and accuracy.

API Integration

To leverage the Whisper AI model, users need to obtain an OpenAI API key. The template provides clear instructions and guidance on how to acquire and utilize the API key effectively, ensuring secure and seamless integration with the transcription service.
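
As a rough sketch of how the API key is used, the Python snippet below builds (but does not send) a request to the `https://api.openai.com/v1/audio/transcriptions` endpoint. The key travels in the `Authorization` header as a bearer token, and the audio is sent as multipart form data. The helper name `build_transcription_request` is hypothetical, and the template's own HTTP Request node may assemble the request differently.

```python
import urllib.request
import uuid

OPENAI_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_transcription_request(
    api_key: str, filename: str, audio: bytes, model: str = "whisper-1"
) -> urllib.request.Request:
    """Build a multipart POST request for the transcription endpoint without sending it."""
    boundary = uuid.uuid4().hex  # random separator between form parts
    body = b"".join([
        # Part 1: the model name
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="model"\r\n\r\n',
        model.encode() + b"\r\n",
        # Part 2: the audio file
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio + b"\r\n",
        # Closing boundary
        f"--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        OPENAI_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send the request, pass the result to `urllib.request.urlopen`. Reading the key from an environment variable (for example `OPENAI_API_KEY`) rather than hard-coding it keeps it out of source control.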

Fast Processing

Thanks to Whisper's efficient processing capabilities, users can expect quick turnaround times for transcribing audio files or videos. This feature ensures a smooth and responsive user experience, minimizing wait times and enabling users to access textual insights from audio-visual content promptly.

Accuracy and Reliability

Whisper is trained on a large and varied speech dataset, which helps it transcribe speech reliably across domains, accents, and noise conditions. Users can expect transcriptions that closely follow the spoken content.

Customization Options

The template offers various customization options, allowing users to fine-tune the output according to their specific requirements. This includes adjusting parameters such as language settings, formatting options, and specific areas of focus, ensuring that the transcriptions align with the user's needs.
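
One way such options could map onto the transcription request is sketched below in Python. The README does not show the template's actual settings, so treat this mapping as an assumption. The `language` and `response_format` fields are parameters of the OpenAI transcription endpoint; the helper name `transcription_options` is hypothetical.

```python
from typing import Optional

# Output formats accepted by the transcription endpoint for whisper-1
RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def transcription_options(
    language: Optional[str] = None, response_format: str = "json"
) -> dict[str, str]:
    """Return form fields for a transcription request, validating user choices."""
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    fields = {"model": "whisper-1", "response_format": response_format}
    if language is not None:
        # The endpoint takes a two-letter ISO-639-1 code such as "en" or "uk"
        if len(language) != 2 or not language.isalpha():
            raise ValueError("language should be an ISO-639-1 code such as 'en'")
        fields["language"] = language.lower()
    return fields
```

For example, `transcription_options(language="uk", response_format="srt")` requests Ukrainian-language subtitles in SRT format.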

Benefits

The Speech-to-Text template empowers users with a robust and efficient solution for extracting valuable textual information from audio-visual content. Whether for accessibility purposes, content analysis, data mining, or simply capturing spoken words in written form, this template offers a comprehensive and user-friendly experience. By leveraging the advanced capabilities of the Whisper AI model, users can unlock the hidden potential of their audio-visual data and transform it into actionable and meaningful textual information.
