Transcribe audio files with Azure Cognitive Services

flumi3/speech-to-text

Speech To Text

This simple Python project converts the audio in a file into searchable text using cloud computing resources from Azure Cognitive Services.

Requirements

  • Python 3
  • Instance of Azure Speech Service
  • Recommended audio format:
    • type: WAV (required)
    • precision: 16-bit
    • sample rate: 8 kHz or 16 kHz
    • channel: mono

Getting started

Set up the Azure Speech service

  1. Create a free Azure subscription
  2. Create a free instance of the Speech service (5 audio hours per month)

Prepare the audio

The recognition expects WAV audio by default: mono PCM, 16-bit, sampled at 16 kHz or 8 kHz. You can convert audio in other formats with this Online Audio Converter.
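Before uploading, it can help to confirm that a file already matches the format described above. The following is a minimal sketch using only Python's standard `wave` module; the function name `check_wav` is hypothetical and is not part of this repository.

```python
import wave


def check_wav(path):
    """Return a list of format problems; an empty list means the file
    matches the recommended format (mono, 16-bit, 8 kHz or 16 kHz).

    Hypothetical helper for illustration, not part of this project.
    """
    problems = []
    # wave.open raises wave.Error if the file is not an uncompressed PCM WAV.
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample
        sample_rate = wav.getframerate()   # samples per second

    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if sample_width != 2:
        problems.append(f"expected 16-bit samples, found {sample_width * 8}-bit")
    if sample_rate not in (8000, 16000):
        problems.append(f"expected 8 kHz or 16 kHz, found {sample_rate} Hz")
    return problems
```

Calling `check_wav("my_audio.wav")` (the file name is a placeholder) returns an empty list when the file needs no conversion.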

Set up the environment

  1. Create a virtual environment for installing the dependencies

    python3 -m venv venv
  2. Activate the virtual environment

    # Linux / macOS
    source venv/bin/activate
    
    # Windows
    .\venv\Scripts\activate
  3. Install dependencies

    pip install -r requirements.txt
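
If you are unsure whether the virtual environment is active before installing, a quick check from Python is possible. This is a small sketch, not part of the project; `in_virtualenv` is a hypothetical name. It relies on the fact that environments created with `python3 -m venv` make `sys.prefix` differ from `sys.base_prefix`.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-style environment.

    Hypothetical helper for illustration only.
    """
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```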

Provide your configuration

  1. Get the API key and region of your Speech service resource
  2. Enter the API key and region into env_sample.txt
  3. Enter the input path, output path, and language of your audio file into env_sample.txt
  4. Rename env_sample.txt to .env
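
To catch a missing value before running the transcription, the configuration file can be read and checked. The sketch below is an illustration under assumptions: it parses simple `KEY=value` lines with the standard library only, and the key names in `EXAMPLE_REQUIRED` are placeholders. The real names are the ones listed in env_sample.txt, and the project may load the file differently.

```python
def load_env(path=".env"):
    """Parse simple KEY=value lines into a dict.

    Hypothetical helper; skips blank lines, comments, and lines without '='.
    """
    settings = {}
    with open(path, encoding="utf-8") as env_file:
        for raw_line in env_file:
            line = raw_line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop surrounding whitespace and optional quotes around the value.
            settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


# Placeholder names, NOT taken from the repository: replace them with the
# keys that actually appear in env_sample.txt.
EXAMPLE_REQUIRED = ("API_KEY", "REGION", "INPUT_PATH", "OUTPUT_PATH", "LANGUAGE")


def missing_settings(settings, required=EXAMPLE_REQUIRED):
    """Return the required names that are absent or empty."""
    return [name for name in required if not settings.get(name)]
```

For example, `missing_settings(load_env())` returns the names that still need a value.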

Run the transcription

    python3 transcription.py
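
For readers curious about what a file transcription with the Azure Speech SDK typically involves, here is a rough sketch. It is not the contents of transcription.py, and the function name and parameters are hypothetical. It assumes the `azure-cognitiveservices-speech` package is installed and uses continuous recognition, since a single-shot call stops after the first utterance.

```python
import threading


def transcribe_file(key, region, wav_path, language="en-US"):
    """Transcribe a WAV file and return the recognized text.

    Hypothetical sketch; all parameter names are assumptions.
    """
    # Imported here so the module loads even where the SDK is not installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    speech_config.speech_recognition_language = language
    audio_config = speechsdk.audio.AudioConfig(filename=wav_path)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )

    phrases = []
    done = threading.Event()

    def on_recognized(evt):
        # Fired once per finalized utterance.
        if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
            phrases.append(evt.result.text)

    recognizer.recognized.connect(on_recognized)
    # Stop waiting when the audio ends or the service cancels the session.
    recognizer.session_stopped.connect(lambda evt: done.set())
    recognizer.canceled.connect(lambda evt: done.set())

    recognizer.start_continuous_recognition()
    done.wait()
    recognizer.stop_continuous_recognition()
    return " ".join(phrases)
```

The returned string could then be written to the configured output path.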
