
Create a transcript for an audio/video file using the GCP Cloud Speech API


rajulonline/speech_to_text


Transcribe an audio/video file using the Google Cloud Speech API

  • Create a GCP account.
  • Enable the Cloud Speech API.
  • Create an API key, save it, and export it as an environment variable in your shell: export API_KEY=XXXXXX
  • Create a Cloud Storage bucket and upload your mono (single-channel) .flac files.
  • Make the files accessible through a public link. (Not ideal, but it's OK for learning purposes.)


  • Create a transcript for the audio/video file:

    • If your file is an .mp4, first convert it to a mono (single-channel) .flac file.
    • Stereo files can be converted to mono with ffmpeg (download and install it first). The -ac 1 option sets a single audio channel:

      ffmpeg -i location_of_your_file/audio_file.flac -ac 1 mono.flac

    • Open the .flac file in QuickTime Player and note its sample rate.
    • Create a request.json file, setting sampleRateHertz to that sample rate:

      {
        "config": {
          "encoding": "FLAC",
          "sampleRateHertz": 44100,
          "languageCode": "en-US"
        },
        "audio": {
          "uri": "gs://your_app_api/mono.flac"
        }
      }

    • If the audio is longer than 1 minute, use the speech:longrunningrecognize endpoint; otherwise use speech:recognize. For example:

      curl -s -X POST -H "Content-Type: application/json" --data-binary @request.json "https://speech.googleapis.com/v1/speech:longrunningrecognize?key=${API_KEY}"

    • The long-running call returns an operation name. Substitute it for OPERATION_NAME, poll until the operation reports "done": true, and save the transcript to output.txt (requires jq):

      curl -s "https://speech.googleapis.com/v1/operations/OPERATION_NAME?key=${API_KEY}" | jq -r '.response.results[].alternatives[].transcript' > output.txt
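
The conversion and sample-rate steps above can also be scripted. This is a minimal Python sketch that assumes ffmpeg and its companion tool ffprobe are on your PATH; it swaps in ffprobe for QuickTime Player to read the sample rate, adds -vn to drop any video stream from an .mp4 source, and uses hypothetical helper names.

```python
import json
import subprocess

# Sketch only: assumes ffmpeg and ffprobe are installed; helper names are hypothetical.


def ffmpeg_mono_flac_cmd(src: str, dst: str = "mono.flac") -> list[str]:
    """Build the ffmpeg command that converts src to a single-channel FLAC file."""
    # -y overwrites dst, -vn drops any video stream, -ac 1 downmixes to mono.
    return ["ffmpeg", "-y", "-i", src, "-vn", "-ac", "1", dst]


def ffprobe_cmd(path: str) -> list[str]:
    """Build an ffprobe command that prints the first audio stream's info as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "a:0",
        "-show_entries", "stream=sample_rate,channels",
        "-of", "json", path,
    ]


def parse_audio_info(ffprobe_json: str) -> tuple[int, int]:
    """Return (sample_rate_hz, channels) from ffprobe's JSON output."""
    stream = json.loads(ffprobe_json)["streams"][0]
    # ffprobe reports sample_rate as a string such as "44100".
    return int(stream["sample_rate"]), int(stream["channels"])


def convert_and_probe(src: str, dst: str = "mono.flac") -> tuple[int, int]:
    """Convert src to mono FLAC, then return the new file's (sample rate, channels)."""
    subprocess.run(ffmpeg_mono_flac_cmd(src, dst), check=True)
    out = subprocess.run(ffprobe_cmd(dst), check=True, capture_output=True, text=True)
    return parse_audio_info(out.stdout)
```

The first value returned by convert_and_probe is what goes into sampleRateHertz; the second should be 1 after conversion.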
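
Building request.json and picking the endpoint can be sketched in Python as follows. This assumes a FLAC file already uploaded to your bucket and reuses the placeholders from the steps above; build_request, choose_endpoint, and write_request_json are hypothetical names, and the 60-second cutoff mirrors the 1-minute rule above.

```python
import json

# Sketch only: base URL taken from the curl commands above; helper names are hypothetical.
SPEECH_BASE = "https://speech.googleapis.com/v1"


def build_request(gcs_uri: str, sample_rate_hz: int, language_code: str = "en-US") -> dict:
    """Return the request body for a FLAC file stored in Cloud Storage."""
    return {
        "config": {
            "encoding": "FLAC",
            "sampleRateHertz": sample_rate_hz,
            "languageCode": language_code,
        },
        "audio": {"uri": gcs_uri},
    }


def choose_endpoint(duration_seconds: float, api_key: str) -> str:
    """Use speech:recognize for up to 1 minute of audio, longrunningrecognize beyond that."""
    method = "longrunningrecognize" if duration_seconds > 60 else "recognize"
    return f"{SPEECH_BASE}/speech:{method}?key={api_key}"


def write_request_json(body: dict, path: str = "request.json") -> None:
    """Save the body so curl can send it with --data-binary @request.json."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(body, f, indent=2)
```

For example, build_request("gs://your_app_api/mono.flac", 44100) builds an equivalent request to the request.json example, and choose_endpoint(90, os.environ["API_KEY"]) selects the long-running endpoint.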
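
Submitting the long-running request and polling the operation can also be done with only the Python standard library. This minimal sketch assumes request.json and API_KEY are set up as in the steps above; post_json, get_json, wait_for_operation, and extract_transcripts are hypothetical names, and extract_transcripts mirrors the jq filter .response.results[].alternatives[].transcript.

```python
import json
import time
import urllib.request

# Sketch only: endpoints taken from the curl commands above; helper names are hypothetical.
SPEECH_BASE = "https://speech.googleapis.com/v1"


def post_json(url: str, body: dict) -> dict:
    """POST a JSON body and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def get_json(url: str) -> dict:
    """GET a URL and return the decoded JSON response."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def wait_for_operation(name: str, api_key: str, interval_s: float = 10.0) -> dict:
    """Poll operations/NAME until it reports done, then return the final operation."""
    url = f"{SPEECH_BASE}/operations/{name}?key={api_key}"
    while True:
        op = get_json(url)
        if op.get("done"):
            if "error" in op:
                raise RuntimeError(op["error"])
            return op
        time.sleep(interval_s)


def extract_transcripts(payload: dict) -> list[str]:
    """Collect every alternative's transcript, like the jq filter above.

    Accepts a finished operation ("response" wrapper) or a synchronous
    speech:recognize reply, where "results" sits at the top level.
    """
    results = payload.get("response", payload).get("results", [])
    return [
        alt["transcript"]
        for result in results
        for alt in result.get("alternatives", [])
        if "transcript" in alt
    ]
```

To use it, load request.json with json.load, pass it to post_json together with f"{SPEECH_BASE}/speech:longrunningrecognize?key={api_key}", call wait_for_operation on the returned "name", and write "\n".join(extract_transcripts(done_op)) to output.txt.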
