Whisper-Stream 🌬️

This project aims to provide applications, pipelines and common code for fast automatic speech recognition, transcription and translation, using open-source models based on OpenAI's Whisper project.

Installation

Consuming this project via pip

pip install "whisper-stream[{feature},...] @ git+https://github.com/ksquarekumar/whisper-stream.git@main"
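The requirement string above can also be assembled programmatically, for example in a setup script. The following is a minimal sketch: the function name `whisper_stream_requirement` and the extras `feature_a` and `feature_b` are hypothetical placeholders, not names defined by this project.

```shell
#!/usr/bin/env bash
# Build a direct-reference requirement string for whisper-stream.
# First argument: git ref (branch or tag). Remaining arguments: extras.
# NOTE: the extras used below are placeholders; the real names are in the project manifest.
whisper_stream_requirement() {
  local ref="$1"; shift
  local extras
  # Join the remaining arguments with commas.
  extras="$(IFS=,; echo "$*")"
  if [ -n "$extras" ]; then
    printf 'whisper-stream[%s] @ git+https://github.com/ksquarekumar/whisper-stream.git@%s\n' "$extras" "$ref"
  else
    printf 'whisper-stream @ git+https://github.com/ksquarekumar/whisper-stream.git@%s\n' "$ref"
  fi
}

# Print the requirement instead of installing it, so the sketch has no side effects.
whisper_stream_requirement main feature_a feature_b
```

To install, pass the result to pip in quotes, for example `pip install "$(whisper_stream_requirement main feature_a)"`.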

Development

This project uses pyenv, mamba and poetry to manage environments, dependencies and building wheels.

To build artifacts correctly, this project also relies on some poetry plugins:

poetry-multiproject-plugin

poetry-conda

For the available extras/features, refer to the [tool.poetry.extras] section of the project manifest (pyproject.toml).
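One way to list the available extras without opening the manifest is to read the keys under `[tool.poetry.extras]`. This is a rough sketch and not a full TOML parser. The helper name `list_poetry_extras` is hypothetical, and it assumes unquoted keys and a section header written exactly as `[tool.poetry.extras]` on its own line.

```shell
#!/usr/bin/env bash
# Print the keys defined under [tool.poetry.extras] in a manifest.
# Assumption: unquoted keys, one "key = [...]" entry starting per line.
list_poetry_extras() {
  awk '
    # Any section header toggles whether we are inside the extras table.
    /^\[/ { in_extras = ($0 == "[tool.poetry.extras]"); next }
    # Inside the table, print the key part of each "key = value" line.
    in_extras && /^[ \t]*[A-Za-z0-9_.-]+[ \t]*=/ {
      split($0, kv, "=")
      gsub(/[ \t]/, "", kv[1])
      print kv[1]
    }
  ' "${1:-pyproject.toml}"
}

# Only run against the manifest if it exists in the current directory.
if [ -f pyproject.toml ]; then
  list_poetry_extras pyproject.toml
fi
```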

Step by Step installation.

1. Clone this repo.

git clone https://github.com/ksquarekumar/whisper-stream.git

1.1. Install pyenv.

curl https://pyenv.run | bash

2. Install a mambaforge environment with pyenv.

pyenv install mambaforge-22.9.0-3 && pyenv shell mambaforge-22.9.0-3 && mamba activate base
mamba install poetry
mamba update --name base --update-all
exec "$SHELL"
poetry self add poetry-conda poetry-multiproject-plugin
poetry self update

2.1. Optionally, set this mambaforge install (base) as the default global interpreter in pyenv.

pyenv global mambaforge-22.9.0-3
exec "$SHELL"
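Before moving on to step 3, it can help to confirm that the tools from steps 1 and 2 are actually on PATH in the new shell. A minimal sketch follows; the helper name `missing_tools` is hypothetical.

```shell
#!/usr/bin/env bash
# Print the name of every given command that cannot be found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

missing="$(missing_tools pyenv mamba poetry)"
if [ -n "$missing" ]; then
  # Left unquoted on purpose so the names print on a single line.
  echo "not found on PATH:" $missing
else
  echo "pyenv, mamba and poetry are all available"
fi
```

If a tool is reported as missing right after installation, restarting the shell as shown above is usually the first thing to try.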

3. Create a project environment (named: whisper_py311) from the existing conda.yml manifest.

mamba env create -f conda.yml && mamba activate whisper_py311

4. Point poetry at the environment's Python and install the project dependencies into a project-local virtual environment.

mamba activate whisper_py311
poetry env use "$(which python)"
poetry install -E "[list of features,..]"
  • For development installs you probably want both the dev and test groups, which a plain poetry install already includes.

  • For non-development installs you probably want to exclude the dev and test groups, so install with:

poetry install --only main
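The two install modes above, combined with the `-E` flag from step 4, can be sketched as a small helper that prints the command to run. The function name `poetry_install_command`, the mode names `dev` and `main`, and the extra `feature_a` are hypothetical, so check that the resulting combination behaves as expected with your poetry version.

```shell
#!/usr/bin/env bash
# Print the poetry install command for a given mode and list of extras.
# Modes (names are assumptions of this sketch):
#   dev  -> plain install, including the dev and test groups
#   main -> runtime dependencies only
poetry_install_command() {
  local mode="$1"; shift
  local cmd="poetry install"
  case "$mode" in
    dev) ;;
    main) cmd="$cmd --only main" ;;
    *) echo "unknown mode: $mode (expected dev or main)" >&2; return 1 ;;
  esac
  # Add one -E flag per requested extra.
  local extra
  for extra in "$@"; do
    cmd="$cmd -E $extra"
  done
  echo "$cmd"
}

# Dry run: show the commands without executing them.
poetry_install_command dev feature_a
poetry_install_command main
```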

5. Optionally, set up the project-local commit and git hooks.

pre-commit install --install-hooks

TL;DR Version for CI Builds

Assumes the source is already present on the system.

  • Within the system Python, for containers:
pip install -r projects/{feature_set}/requirements.txt
pip install ".[{feature_set_extras},...]"
  • With conda as the system's environment manager:
conda install mamba
mamba env update -f conda.yml
pip install -r projects/{feature_set}/requirements.txt
pip install ".[{feature_set_extras},...]"
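In a CI job it can be useful to fail early when the requirements file for a feature set is missing, and to preview the commands before running them. The sketch below wraps the two pip steps. The function name `ci_install` and the `DRY_RUN` variable are assumptions of this example, not part of the project.

```shell
#!/usr/bin/env bash
# Install one feature set from a source checkout.
# Arguments: <feature_set> <extras>, where extras is a comma-separated list.
# Set DRY_RUN to any non-empty value to print the commands instead of running them.
ci_install() {
  local feature_set="$1"
  local extras="$2"
  local req="projects/${feature_set}/requirements.txt"
  if [ ! -f "$req" ]; then
    echo "missing requirements file: $req" >&2
    return 1
  fi
  # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is set and non-empty.
  ${DRY_RUN:+echo} pip install -r "$req"
  ${DRY_RUN:+echo} pip install ".[${extras}]"
}
```

From the repository root, `DRY_RUN=1 ci_install {feature_set} {feature_set_extras}` prints the two pip commands. Without `DRY_RUN` they are executed.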

Feature-Sets

License

Some JAX modules are partially vendored from whisper-jax.

whisper-stream is distributed under the terms of the Apache-2.0 license.