Skip to content

The ChatGPT Voice Assistant uses a Raspberry Pi (or desktop) to enable spoken conversation with OpenAI large language models. This implementation listens to speech, processes the conversation through the OpenAI service, and responds back. Like Apple Siri, Amazon Alex, Google Nest Home, Mi XiaoAi etc.

License

Notifications You must be signed in to change notification settings

jackwuwei/gptspeaker

Repository files navigation

ChatGPT Voice Assistant

中文

  • The ChatGPT Voice Assistant uses a Raspberry Pi (or desktop) to enable spoken conversation with OpenAI large language models. This implementation listens to speech, processes the conversation through the OpenAI service, and responds back. Like Apple Siri, Amazon Alex, Google Nest Home, Mi XiaoAi etc.
  • This project is written in python which supports Linux/Raspbian, macOS, and Windows.

Features

  • Supports real-time voice dialogue. After ChatGPT returns a sentence, you can hear the voice instead of waiting for all ChatGPT replies before starting the voice synthesis.
  • Support continuous dialogue, save the history of all ChatGPT current conversations. When the ChatGPT conversation is larger than 4096 tokens (gpt-3.5-turbo), the early conversation history will be discarded.
  • Support local wake word, use it just like Siri.

Voice Assistant Speaker

GPT Speaker

  • Hardware
  • Software
    • Azure Cognitive Speech Services
      • Free tier: 5 audio hours per month and 1 concurrent request.
      • Free $200 credit: With a new Azure account that can be used during the first 30 days.
    • OpenAI
      • $0.002 / 1K tokens / ~750 words: ChatGPT (gpt-3.5-turbo)
      • Free $18 credit: With a new OpenAI account that can be used during your first 90 days.

Setup

  • You will need an instance of Azure Cognitive Services and an OpenAI account. You can run the software on nearly any platform, but let's start with a Raspberry Pi.

Raspberry Pi

1. OS

  1. Insert an SD card into your PC.
  2. Go to https://www.raspberrypi.com/software/ then download and run the Raspberry Pi Imager.
  3. Click Choose OS and select the Raspberry Pi OS (64-bit) or Ubuntu 22.04.2 LTS (64-bit) .
  4. Click Choose Storage, select the SD card.
  5. Click Write and wait for the imaging to complete.
  6. Put the SD card into your Raspberry Pi and connect a keyboard, mouse, and monitor.
  7. Complete the initial setup, making sure to configure Wi-Fi.

2. USB Speaker/Microphone

  1. Plug in the USB speaker/microphone if you have not already.
  2. On the Raspberry PI OS desktop, right-click on the volume icon in the top-right of the screen and make sure the USB device is selected.
  3. Right-click on the microphone icon in the top-right of the screen and make sure the USB device is selected.

Azure

The conversational speaker uses Azure Cognitive Service for speech-to-text and text-to-speech. Below are the steps to create an Azure account and an instance of Azure Cognitive Services.

1. Azure Account

  1. In a web browser, navigate to https://aka.ms/friendbot/azure and click on Try Azure for Free.
  2. Click on Start Free to start creating a free Azure account.
  3. Sign in with your Microsoft or GitHub account.
  4. After signing in, you will be prompted to enter some information.

    NOTE: Even though this is a free account, Azure still requires credit card information. You will not be charged unless you change settings later.

  5. After your account setup is complete, navigate to https://aka.ms/friendbot/azureportal.

2. Azure Cognitive Services

  1. Sign into your account at https://aka.ms/friendbot/azureportal.
  2. In the search bar at the top, enter Cognitive Services. Under Marketplace select Cognitive Services. (It may take a few seconds to populate.)
  3. Verify the correct subscription is selected. Under Resource Group select Create New. Enter a resource group name (e.g. conv-speak-rg).
  4. Select a region and a name for your instance of Azure Cognitive Services (e.g. my-conv-speak-cog-001).

    NOTE: EastUS, WestEurope, or SoutheastAsia are recommended, as those regions tend to support the greatest number of features.

  5. Click on Review + Create. After validation passes, click Create.
  6. When deployment has completed you can click Go to resource to view your Azure Cognitive Services resource.
  7. On the left side navigation bar, under Resourse Management, select Keys and Endpoint.
  8. Copy either of the two Cognitive Services keys. Save this key in a secure location for later.

Windows 11 users: If the application is stalling when calling the text-to-speech API, make sure you have applied all current security updates (link).

OpenAI

The conversational speaker uses OpenAI's models to hold a friendly conversation. Below are the steps to create a new account and access the AI models. Supports OpenAI official API or Azure OpenAI API, just choose one.

1. OpenAI Account

  1. In a web browser, navigate to https://aka.ms/maker/openai. Click Sign up.

    NOTE: can use a Google account, Microsoft account, or email to create a new account.

  2. Complete the sign-up process (e.g., create a password, verify your email, etc.).

    NOTE: If you are new to OpenAI, please review the usage guidelines (https://beta.openai.com/docs/usage-guidelines).

  3. In the top-right corner click on your account. Click on View API keys.
  4. Click + Create new secret key. Copy the generated key and save it in a secure location for later.

If you are curious to play with the large language models directly, check out the https://platform.openai.com/playground?mode=chat at the top of the page after logging in to https://aka.ms/maker/openai.

2. Azure OpenAI Account

Choose between OpenAI official account or Azure OpenAI account

  1. Create an Azure Account
    • If you don't have an Azure account, go to the Azure official website to sign up for an account. Azure offers a free account option, and new users can get a certain amount of free credits for testing and learning.
  2. Apply for Access
    • On the Azure OpenAI service page, click the "Apply for Access" button. This will take you to the application page where you need to fill in some necessary information, including your company name, use case, etc.
  3. Configure and Use
    • Once you have access, you can create a new OpenAI service resource in the Azure portal. After creation, you can get the API key and start using the Azure OpenAI service following the official documentation.

The Code

1. Code Configuration

  1. The Python Speech SDK package is available for Windows (x64 and x86), Mac x64 (macOS X version 10.14 or later), Mac arm64 (macOS version 11.0 or later), and Linux
  2. On the Raspberry Pi or your PC, open a command-line terminal.
  3. On Ubuntu or Debian, run the following commands for the installation of required packages:
    sudo apt-get update
    sudo apt-get install libssl-dev libasound2
  4. On Ubuntu 22.04 LTS it is also required to download and install the latest libssl1.1 package e.g. from http://security.ubuntu.com/ubuntu/pool/main/o/openssl/.
  5. Clone the repo.
    git clone https://github.com/jackwuwei/gptspeaker.git
  6. Set your API keys: Replace config.json {AzureCognitiveServices.Key}and {AzureCognitiveServices.Region} with your OpenAI API key and {OpenAI.Key} with your OpenAI API key.
    {
         "AzureCognitiveServices": {
            "Key": "AzureCognitiveServicesKey", 
            "Region": "AzureCognitiveServicesRegion",
        },
    
        "OpenAI": {
            "Key": "OpenAIKey", 
        },
    
        // Just choose one of the two OpenAI above
         "AzureOpenAI": 
         {
            "Key": "", // Key 1 or Key 2
            "api_version": "2024-02-01",
            "Endpoint": "", // Endpoint
            "Model": "" // Azure AI Studio deployment name 
      }
    }
  7. Install requirements
    pip3 -r install requirements.txt
  8. Run the code
    python3 gptspeaker.py

2. (Optional) Create a custom wake phrase

The code base has a default wake phrase ("Hey GPT") already, which I suggest you use first. If you want to create your own (free!) custom wake word, then follow the steps below.

  1. Create a custom keyword model using the directions here: https://aka.ms/hackster/microsoft/wakeword.
  2. Download the model, extract the .table file and copy it to source root directory.
  3. Update config.json file to include your wake phrase file in the build.
    "AzureCognitiveServices": {
       "WakePhraseModel": "xxx.table",
       "WakeWord": "xxx",
    }
  4. Rebuild and run the project to use your custom wake word.

About

The ChatGPT Voice Assistant uses a Raspberry Pi (or desktop) to enable spoken conversation with OpenAI large language models. This implementation listens to speech, processes the conversation through the OpenAI service, and responds back. Like Apple Siri, Amazon Alex, Google Nest Home, Mi XiaoAi etc.

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages