
Python bindings (C-style API) #9

Open
ArtyomZemlyak opened this issue Oct 1, 2022 · 63 comments
Labels: build (Build related issues), enhancement (New feature or request)

Comments

@ArtyomZemlyak

Good day everyone!
I'm thinking about bindings for Python.

So far, I'm interested in four pieces of functionality:

  1. Encoder processing
  2. Decoder processing
  3. Transcription of audio (feed audio bytes, get text)
  4. Same as 3, plus timestamps for all words (feed audio bytes, get text plus the time of each word). Of course, it may be too early to think about word timestamps, since even the Python implementation doesn't handle them well yet.

Perhaps in the near future I will try to take on this task, but I have no experience with Python bindings. So if there are craftsmen who can do it quickly (if it can be done quickly... 😃), that would be cool!
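To make the wishlist above concrete, here is a rough sketch of what such a Python interface might look like (all names here are hypothetical, just for illustration, not an actual API):

```python
# Hypothetical sketch of the desired Python API (names are made up,
# not part of any real binding).
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    t0: float  # start time in seconds
    t1: float  # end time in seconds

class WhisperModel:
    def __init__(self, model_path: str):
        self.model_path = model_path  # would load the ggml model here

    def encode(self, mel):  # 1. encoder processing
        raise NotImplementedError

    def decode(self, tokens, audio_ctx):  # 2. decoder processing
        raise NotImplementedError

    def transcribe(self, audio_bytes: bytes) -> str:  # 3. audio -> text
        raise NotImplementedError

    def transcribe_with_times(self, audio_bytes: bytes) -> list[Word]:  # 4.
        raise NotImplementedError
```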

@ArtyomZemlyak
Author

Here's a workaround:

Building

main: ggml.o main.o
	g++ -shared -Wl,-soname,main.so -o main.so main.o ggml.o
	g++ -pthread -o main ggml.o main.o
	./main -h

ggml.o: ggml.c ggml.h
	gcc -O3 -mavx -mavx2 -mfma -mf16c -c -fPIC ggml.c -o ggml.o
	gcc -shared -Wl,-soname,ggml.so -o ggml.so ggml.o

main.o: main.cpp ggml.h
	g++ -pthread -O3 -std=c++11 -c -fPIC main.cpp -o main.o

Run main

import ctypes
import pathlib


if __name__ == "__main__":
    # Load the shared library into ctypes
    libname = pathlib.Path().absolute() / "main.so"
    whisper = ctypes.CDLL(libname)

    whisper.main.restype = None
    whisper.main.argtypes = ctypes.c_int, ctypes.POINTER(ctypes.c_char_p)

    # argv[0] is conventionally the program name; main() skips it when parsing
    args = (ctypes.c_char_p * 10)(
        b"./main",
        b"-nt",
        b"--language", b"ru",
        b"-t", b"8",
        b"-m", b"../models/ggml-model-tiny.bin",
        b"-f", b"../audio/cuker1.wav"
    )
    whisper.main(len(args), args)

And it works!

@ArtyomZemlyak
Author

But exposing specific functions is already more difficult:

  • The model needs to be loaded at the C++ level
  • There must be a way to access its encode/decode methods
  • The whole process with the loaded model should run in parallel with the Python code

It might be worth considering running Python and C++ in different threads/processes and sharing information between them when needed.
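A minimal sketch of that idea, using only the standard library (here `fake_transcribe` is just a stand-in for the real native call loaded via ctypes): run inference in a worker process and exchange data over queues.

```python
# Illustrative sketch: run a (stand-in for the) C++ inference call in a
# separate process and talk to it from Python via queues.
import multiprocessing as mp

def fake_transcribe(audio: bytes) -> str:
    # stand-in for the real whisper.cpp call loaded via ctypes
    return f"transcribed {len(audio)} bytes"

def worker(requests: mp.Queue, results: mp.Queue) -> None:
    # the model would be loaded once here, then reused for every request
    while True:
        audio = requests.get()
        if audio is None:          # sentinel: shut down
            break
        results.put(fake_transcribe(audio))

if __name__ == "__main__":
    requests, results = mp.Queue(), mp.Queue()
    p = mp.Process(target=worker, args=(requests, results))
    p.start()
    requests.put(b"\x00" * 32000)  # send audio bytes
    print(results.get())           # prints "transcribed 32000 bytes"
    requests.put(None)             # tell the worker to exit
    p.join()
```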

@ggerganov
Owner

Thank you very much for your interest in the project!

I think we first need a proper C-style wrapper of the model loading, encode/decode functionality, and sampling strategies. After that, we can easily create Python and other language bindings. I've done similar work in my 'ggwave' project.

I agree that the encode and decode functionality should be exposed through the API as you suggested. It would give more flexibility to the users of the library/bindings.

@aichr

aichr commented Oct 4, 2022

@ArtyomZemlyak First you reinvent the pytorch functions in c, then you want python bindings around them. Isn't the end result the same as what we have in pytorch?

@ggerganov
Owner

The initial API is now available on master:

https://github.com/ggerganov/whisper.cpp/blob/master/whisper.h

The first part allows more fine-grained control over the inference and also allows the user to implement their own sampling strategy using the predicted probabilities for each token.

The second part of the API includes methods for full inference - you simply provide the audio samples and choose the sampling parameters.

Most likely the API will change with time, but this is a good starting point.

@ggerganov ggerganov added the build and enhancement labels Oct 5, 2022
@richardburleigh

richardburleigh commented Oct 9, 2022

This is as far as I got trying to get the API working in Python.

It loads the model successfully, but gets a segmentation fault on whisper_full.

Any ideas?

import ctypes
import pathlib

if __name__ == "__main__":
    libname = pathlib.Path().absolute() / "whisper.so"
    whisper = ctypes.CDLL(libname)
    modelpath = b"models/ggml-medium.bin"
    model = whisper.whisper_init(modelpath)
    params = whisper.whisper_full_default_params(b"WHISPER_DECODE_GREEDY")
    w = open('samples/jfk.wav', "rb").read()
    result = whisper.whisper_full(model, params, w, b"16000")
    # Segmentation fault
    

Edit - Got some debugging info from gdb but it didn't help much:
0x00007ffff67916c6 in log_mel_spectrogram(float const*, int, int, int, int, int, int, whisper_filters const&, whisper_mel&)

@ggerganov
Owner

ggerganov commented Oct 9, 2022

Here is one way to achieve this:

# build shared libwhisper.so
gcc -O3 -std=c11   -pthread -mavx -mavx2 -mfma -mf16c -fPIC -c ggml.c
g++ -O3 -std=c++11 -pthread --shared -fPIC -static-libstdc++ whisper.cpp ggml.o -o libwhisper.so

Use it from Python like this:

import ctypes
import pathlib

# this is needed to read the WAV file properly
from scipy.io import wavfile

libname     = "libwhisper.so"
fname_model = "models/ggml-tiny.en.bin"
fname_wav   = "samples/jfk.wav"

# this needs to match the C struct in whisper.h
class WhisperFullParams(ctypes.Structure):
    _fields_ = [
        ("strategy",             ctypes.c_int),
        ("n_threads",            ctypes.c_int),
        ("offset_ms",            ctypes.c_int),
        ("translate",            ctypes.c_bool),
        ("no_context",           ctypes.c_bool),
        ("print_special_tokens", ctypes.c_bool),
        ("print_progress",       ctypes.c_bool),
        ("print_realtime",       ctypes.c_bool),
        ("print_timestamps",     ctypes.c_bool),
        ("language",             ctypes.c_char_p),
        ("greedy",               ctypes.c_int * 1),
    ]

if __name__ == "__main__":
    # load library and model
    libname = pathlib.Path().absolute() / libname
    whisper = ctypes.CDLL(libname)

    # tell Python what are the return types of the functions
    whisper.whisper_init.restype                  = ctypes.c_void_p
    whisper.whisper_full_default_params.restype   = WhisperFullParams
    whisper.whisper_full_get_segment_text.restype = ctypes.c_char_p

    # initialize whisper.cpp context
    ctx = whisper.whisper_init(fname_model.encode("utf-8"))

    # get default whisper parameters and adjust as needed
    params = whisper.whisper_full_default_params(0)
    params.print_realtime = True
    params.print_progress = False

    # load WAV file
    samplerate, data = wavfile.read(fname_wav)

    # convert to 32-bit float
    data = data.astype('float32')/32768.0

    # run the inference
    result = whisper.whisper_full(ctypes.c_void_p(ctx), params, data.ctypes.data_as(ctypes.POINTER(ctypes.c_float)), len(data))
    if result != 0:
        print("Error: {}".format(result))
        exit(1)

    # print results from Python
    print("\nResults from Python:\n")
    n_segments = whisper.whisper_full_n_segments(ctypes.c_void_p(ctx))
    for i in range(n_segments):
        t0  = whisper.whisper_full_get_segment_t0(ctypes.c_void_p(ctx), i)
        t1  = whisper.whisper_full_get_segment_t1(ctypes.c_void_p(ctx), i)
        txt = whisper.whisper_full_get_segment_text(ctypes.c_void_p(ctx), i)

        print(f"{t0/1000.0:.3f} - {t1/1000.0:.3f} : {txt.decode('utf-8')}")

    # free the memory
    whisper.whisper_free(ctypes.c_void_p(ctx))
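As a side note, if the scipy dependency is unwanted, a 16-bit mono PCM WAV can also be loaded with just the standard library (a sketch; `load_wav_f32` is not part of any binding):

```python
# Alternative WAV loading without scipy: stdlib only.
# Assumes a mono, 16-bit PCM WAV (whisper.cpp expects 16 kHz audio).
import wave
import array

def load_wav_f32(path: str) -> list:
    with wave.open(path, "rb") as f:
        assert f.getsampwidth() == 2, "expected 16-bit PCM"
        assert f.getnchannels() == 1, "expected mono audio"
        raw = f.readframes(f.getnframes())
    pcm = array.array("h", raw)          # signed 16-bit samples
    return [s / 32768.0 for s in pcm]    # normalize to [-1, 1)
```

The returned samples can then be handed to `whisper_full` via `(ctypes.c_float * len(samples))(*samples)`.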

@ggerganov ggerganov pinned this issue Oct 9, 2022
@richardburleigh

richardburleigh commented Oct 9, 2022

Thank you @ggerganov - really appreciate your work!

Still getting a seg fault with your code, but I'll assume it's a me problem:

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
log_mel_spectrogram (samples=<optimized out>, n_samples=<optimized out>, sample_rate=<optimized out>, fft_size=<optimized out>, fft_step=<optimized out>, n_mel=80, n_threads=<optimized out>, filters=..., mel=...) at whisper.cpp:1977
1977	    mel.data.resize(mel.n_mel*mel.n_len);
(gdb) bt
#0  log_mel_spectrogram (samples=<optimized out>, n_samples=<optimized out>, sample_rate=<optimized out>, fft_size=<optimized out>, fft_step=<optimized out>, n_mel=80, n_threads=<optimized out>, filters=..., mel=...) at whisper.cpp:1977
#1  0x00007fffc28d24c7 in whisper_pcm_to_mel (ctx=0x560d7680, samples=0x7fffb3345010, n_samples=176000, n_threads=4) at whisper.cpp:2101
#2  0x00007fffc28d4113 in whisper_full (ctx=0x560d7680, params=..., samples=<optimized out>, n_samples=<optimized out>) at whisper.cpp:2316

@richardburleigh

Got a segfault in the same place on an Intel 12th gen CPU and M1 Macbook with no changes to the above Python script. Anyone else tried it?

Were you using the same codebase as master @ggerganov ?

@ggerganov
Copy link
Owner

Yeah, the ctx pointer wasn't being passed properly. I've updated the python script above. Give it another try - I think it should work now.

@pachacamac

Could you possibly make a binding to the stream program as well? Would be super cool to be able to register a callback once user speech is done and silence/non-speech is detected so the final text can be processed within python. This would allow for some really cool speech assistant like hacks.
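The silence-detection half of that idea can be sketched independently of whisper.cpp. Below is a toy, energy-based segmenter (`split_on_silence` is illustrative, not part of any API); each returned range would be what you hand to the transcription call before firing the callback:

```python
# Toy energy-based utterance segmenter: returns sample ranges containing
# speech, separated by silence. Illustrative only -- a real assistant
# would feed each chunk to whisper_full and invoke a callback per result.
def split_on_silence(samples, frame=160, threshold=0.01, min_silence_frames=3):
    """Return a list of (start, end) sample ranges containing speech."""
    segments, start, silent_run = [], None, 0
    for i in range(0, len(samples), frame):
        window = samples[i:i + frame]
        energy = sum(s * s for s in window) / max(len(window), 1)
        if energy >= threshold:
            if start is None:
                start = i          # speech begins
            silent_run = 0
        elif start is not None:
            silent_run += 1        # count consecutive silent frames
            if silent_run >= min_silence_frames:
                segments.append((start, i))  # utterance is done
                start, silent_run = None, 0
    if start is not None:
        segments.append((start, len(samples)))
    return segments
```

A real implementation would use a proper VAD rather than a raw energy threshold, but the callback structure would be the same.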

@richardburleigh

Could you possibly make a binding to the stream program as well? Would be super cool to be able to register a callback once user speech is done and silence/non-speech is detected so the final text can be processed within python. This would allow for some really cool speech assistant like hacks.

You can easily modify this script to use Whisper.cpp instead of DeepSpeech.

@richardburleigh

richardburleigh commented Oct 16, 2022

@pachacamac I made a hacked-together fork of Buzz which uses whisper.cpp.

It's buggy and thrown together, but it works.

Just make sure you build the shared library as libwhisper.so and put it in the project directory. There's no install package, so you'll need to run main.py directly.

Edit: I also made a simple stand-alone script using Whisper.cpp + Auditok (to detect voices)

@ggerganov
Owner

Breaking changes in the C API in the last commit: e30cf83

@ggerganov ggerganov changed the title Python bindings Python bindings (C-style API) Oct 22, 2022
@chidiwilliams
Contributor

I seem to be having some trouble making a shared lib on Windows (#9 (comment) works great on UNIX).

Using:

gcc -O3 -std=c11   -pthread -mavx -mavx2 -mfma -mf16c -fPIC -c ggml.c -o ggml.o
g++ -O3 -std=c++11 -pthread --shared -fPIC -static-libstdc++ -DWHISPER_SHARED -DWHISPER_BUILD whisper.cpp ggml.o -o libwhisper.so

And calling from Python as:

whisper_cpp = ctypes.CDLL("libwhisper.so")

# Calling any one of the functions errors
whisper_cpp.whisper_init('path/to/model.bin'.encode('utf-8'))
whisper_cpp.whisper_lang_id('en'.encode('utf-8'))

I get:

Windows fatal exception: access violation

Current thread 0x00002b30 (most recent call first):
  File "C:\Users\willi\Documents\src\buzz\whispercpp_test.py", line 17 in <module>
...
Windows fatal exception: stack overflow

Current thread 0x00002b30 (most recent call first):
  File "C:\Users\willi\Documents\src\buzz\whispercpp_test.py", line 17 in <module>
...

Ref: chidiwilliams/buzz#131

@chidiwilliams
Contributor

chidiwilliams commented Nov 3, 2022

@ggerganov thanks for all your help so far. I seem to be having an issue with the Python binding (similar to the one you posted above; not the Windows issue).

class WhisperFullParams(ctypes.Structure):
    _fields_ = [
        ("strategy",             ctypes.c_int),
        ("n_threads",            ctypes.c_int),
        ("offset_ms",            ctypes.c_int),
        ("translate",            ctypes.c_bool),
        ("no_context",           ctypes.c_bool),
        ("print_special_tokens", ctypes.c_bool),
        ("print_progress",       ctypes.c_bool),
        ("print_realtime",       ctypes.c_bool),
        ("print_timestamps",     ctypes.c_bool),
        ("language",             ctypes.c_char_p),
        ("greedy",               ctypes.c_int * 1),
    ]


model_path = 'ggml-model-whisper-tiny.bin'
audio_path = './whisper.cpp/samples/jfk.wav'
libname = './whisper.cpp/libwhisper.dylib'

whisper_cpp = ctypes.CDLL(
    str(pathlib.Path().absolute() / libname))

whisper_cpp.whisper_init.restype = ctypes.c_void_p
whisper_cpp.whisper_full_default_params.restype = WhisperFullParams
whisper_cpp.whisper_full_get_segment_text.restype = ctypes.c_char_p

ctx = whisper_cpp.whisper_init(model_path.encode('utf-8'))

params = whisper_cpp.whisper_full_default_params(0)
params.print_realtime = True
params.print_progress = True


samplerate, audio = wavfile.read(audio_path)
audio = audio.astype('float32')/32768.0


result = whisper_cpp.whisper_full(
    ctypes.c_void_p(ctx), params, audio.ctypes.data_as(
        ctypes.POINTER(ctypes.c_float)), len(audio))
if result != 0:
    raise Exception(f'Error from whisper.cpp: {result}')


n_segments = whisper_cpp.whisper_full_n_segments(
    ctypes.c_void_p(ctx))
print(f'n_segments: {n_segments}')

Prints:

whisper_model_load: loading model from 'ggml-model-whisper-tiny.bin'
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head  = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 384
whisper_model_load: n_text_head   = 6
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1
whisper_model_load: type          = 1
whisper_model_load: mem_required  = 476.00 MB
whisper_model_load: adding 1608 extra tokens
whisper_model_load: ggml ctx size =  73.58 MB
whisper_model_load: memory size =    11.41 MB
whisper_model_load: model size  =    73.54 MB
176000, length of samples
log_mel_spectrogram: n_samples = 176000, n_len = 1100
log_mel_spectrogram: recording length: 11.000000 s
length of spectrogram is less than 1s
n_segments: 0

I added an extra log line to show that whisper_full exits because the length of the spectrogram is less than 1 s. I see the same issue with other audio files, as well as when I read the audio using whisper.audio.load_audio.

@ggerganov
Owner

The WhisperFullParams struct has been updated since I posted, so you have to match the new struct in whisper.h.
Ideally, the Python bindings should be automatically generated from the C API in order to avoid this kind of issue.
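As a toy illustration of that suggestion, struct members could be scraped from a header and mapped to ctypes types (a fragile, regex-based sketch; real generators such as ctypesgen use a proper C parser):

```python
# Fragile sketch of auto-generating ctypes _fields_ from a C struct
# definition. Only handles simple "type name;" members; a real generator
# (e.g. ctypesgen) parses the C properly.
import re
import ctypes

C_TO_CTYPES = {
    "int": ctypes.c_int,
    "bool": ctypes.c_bool,
    "float": ctypes.c_float,
    "const char *": ctypes.c_char_p,
}

def fields_from_struct(source: str):
    """Turn simple 'type name;' struct members into a ctypes _fields_ list."""
    fields = []
    for m in re.finditer(r"^\s*(const char \*|int|bool|float)\s*(\w+)\s*;",
                         source, re.MULTILINE):
        fields.append((m.group(2), C_TO_CTYPES[m.group(1)]))
    return fields

# hypothetical, simplified excerpt of the struct in whisper.h
header = """
struct whisper_full_params {
    int strategy;
    int n_threads;
    bool translate;
    const char * language;
};
"""
fields = fields_from_struct(header)  # e.g. ("strategy", ctypes.c_int), ...
```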

@chidiwilliams
Contributor

Of course. Thanks a lot!

@thakurudit

Of course. Thanks a lot!

@chidiwilliams
Did it work for you?

@chidiwilliams
Contributor

@thakurudit Yes, it did. I use ctypesgen to generate bindings for Buzz.

@limdongjin
Here is the binding https://github.com/aarnphm/whispercpp cc @ggerganov

@mrmachine

Here is the binding https://github.com/aarnphm/whispercpp cc @ggerganov

How can I make this work? I've cloned this whisper.cpp repo and run make main and make stream. I've made a virtualenv and installed whispercpp. When I try to run the stream.py example, I get:

Traceback (most recent call last):
  File "stream.py", line 44, in <module>
    default=w.api.SAMPLE_RATE,
  File "/Users/tailee/Projects/whisper.cpp/venv/lib/python3.8/site-packages/whispercpp/utils.py", line 144, in __getattr__
    self._module = self._load()
  File "/Users/tailee/Projects/whisper.cpp/venv/lib/python3.8/site-packages/whispercpp/utils.py", line 122, in _load
    module = importlib.import_module(self.__name__)
  File "/Users/tailee/.pyenv/versions/3.8.16/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 657, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 556, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1166, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: dlopen(/Users/tailee/Projects/whisper.cpp/venv/lib/python3.8/site-packages/whispercpp/api_cpp2py_export.so, 0x0002): symbol not found in flat namespace '_PyCMethod_New'

Do I need to make and install some shared libraries somewhere? If so, I could not find any instructions for this in this thread or the whisper.cpp or whispercpp docs.

@aarnphm
Contributor

aarnphm commented Apr 8, 2023

Here is the binding aarnphm/whispercpp cc @ggerganov

How can I make this work? I've cloned this whisper.cpp repo and run make main and make stream. I've made a virtualenv and installed whispercpp. When I try to run the stream.py example, I get:


Hey there, let's bring this to the main repo to avoid polluting this thread.

@janhuenermann

janhuenermann commented Apr 16, 2023

Hey everyone, I also created simple Python bindings using pybind11. In case anyone is interested, you can install them:

pip install git+https://github.com/janhuenermann/whisper.cpp.git@pybind#subdirectory=bindings/python 

To transcribe audio, run:

import pywhisper
pywhisper.init(model_path="./models/ggml-base.en.bin")
audio_pcmf32_16khz_numpy = ...
transcription = pywhisper.transcribe(audio_pcmf32_16khz_numpy)
print(transcription)
# [(0.0, 11.0, ' And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.')]

For more details, I have a readme and simple example here: https://github.com/janhuenermann/whisper.cpp/tree/pybind/bindings/python

If this is of interest to more people, I'm happy to open a PR.

@DoodleBears

Another one using pybind11: pywhispercpp

@pajowu
Contributor

pajowu commented Apr 26, 2023

And another one using pybind11, which I created for @transcribee: https://github.com/pajowu/whispercppy . I also published it to PyPI (with wheels and everything): https://pypi.org/project/whispercppy/

This one combines the bindings that @aarnphm created with build tooling taken from (and heavily modified from) @janhuenermann's fork. This should give you easy installation (wheels where possible, and otherwise only cmake and clang/gcc as dependencies) and good bindings. It can even yield generated paragraphs asynchronously while they are transcribed, as shown in https://github.com/transcribee/transcribee/blob/aaaa373fa90024bad6e4053b469ab7352b5c503c/worker/transcribee_worker/whisper_transcribe.py#L165

anandijain pushed a commit to anandijain/whisper.cpp that referenced this issue Apr 28, 2023
@carloscdias

Most Python bindings I found in the last week were outdated or broke with the current API, so I made a project (https://github.com/carloscdias/whisper-cpp-python) following the same pattern as ggerganov's original answer, and also followed his suggestion of providing a way to automatically generate the Python bindings from whisper.h. I plan to provide an interface compatible with the official whisper clients, similar to what was done in https://github.com/abetlen/llama-cpp-python, for my own use, but if it proves useful for anyone else, feel free to give it a try.

@silvacarl2

thank you!! checking it out!

@hoonlight

Most Python bindings I found in the last week were outdated or broke with the current API, so I made a project (https://github.com/carloscdias/whisper-cpp-python) following the same pattern as ggerganov's original answer, and also followed his suggestion of providing a way to automatically generate the Python bindings from whisper.h. I plan to provide an interface compatible with the official whisper clients, similar to what was done in https://github.com/abetlen/llama-cpp-python, for my own use, but if it proves useful for anyone else, feel free to give it a try.

That's great! This should be added to https://github.com/ggerganov/whisper.cpp#bindings.

@benniekiss

Looking around at the available python bindings, none currently seem to support the latest branch of whisper.cpp with GPU acceleration for cuda or metal. Does anyone have a working version? A lot has changed in whisper.cpp, and it seems most of the python bindings are based on an older version that lacks a lot of the more recent functions.

jacobwu-b pushed a commit to jacobwu-b/Transcriptify-by-whisper.cpp that referenced this issue Oct 24, 2023
@albcunha

Looking around at the available python bindings, none currently seem to support the latest branch of whisper.cpp with GPU acceleration for cuda or metal. Does anyone have a working version? A lot has changed in whisper.cpp, and it seems most of the python bindings are based on an older version that lacks a lot of the more recent functions.

I'm in the same boat. I can run whisper.cpp with ROCm on the CLI, but I keep getting segmentation faults or other crashes with all the wrappers I've tried.
The only one that wasn't a complete failure was the code from synesthesiam above, but it just returned empty output for me. I checked and rechecked whisper_full_params, which seems different in the ROCm build, but it does not work.

It's not whisper.cpp's fault. Let's hope someone comes up with help.

@albcunha

Just to give some feedback: I wanted to try whisper.cpp because I'm using an AMD RX 5700 XT with 8 GB of VRAM and wanted to use the Whisper large model. I ended up using Hugging Face Transformers, and I could fit the model on the GPU.

@dnhkng
Contributor

dnhkng commented Nov 16, 2023

Disappointing there are so many unmaintained Python bindings.

Update: As I needed this on CUDA, I've tried to fix it myself. Seems to work OK on Ubuntu+Nvidia GPU, but not yet on Mac. Please test on Windows and report back!
Code is on my PR: #1524

landtanin pushed a commit to landtanin/whisper.cpp that referenced this issue Dec 16, 2023
@chrisspen

chrisspen commented Jan 8, 2024

Unmaintained is putting it mildly. Even the ones that work don't work well. Every single Python binding of the C++ implementation I've tested is significantly slower than the pure-Python version, which is mind-boggling.

Terrible implementations like whisper-cpp-python, which doesn't even publish its code anywhere, take 5 minutes to transcribe a 10-second file that the pure Python implementation can handle in a few seconds using the same large model...

@dnhkng
Contributor

dnhkng commented Jan 9, 2024

@chrisspen Did you try my PR? I'm using it for a real-time LLM chatbot. Using distil-whisper, I can get Voice->text and then text->voice in a few hundred ms.

@chrisspen

@dnhkng Yes. The problem seems to be the C++ code. It might work fine with a GPU, but on CPU it runs slower than pure Python on a 10-year-old machine. And if C++ needs an expensive GPU to be faster than Python on a CPU, it's not good code.

I'm finding faster_whisper much more usable from Python and far more cost-effective.

@egfthomas

@ArtyomZemlyak First you reinvent the pytorch functions in c, then you want python bindings around them. Isn't the end result the same as what we have in pytorch?

Is there a streaming function in the original python/pytorch implementation ?

@SeeknnDestroy

@dnhkng Yes. The problem seems to be the C++ code. It might work fine with a GPU, but on CPU it runs slower than pure Python on a 10-year-old machine. And if C++ needs an expensive GPU to be faster than Python on a CPU, it's not good code.

I'm finding faster_whisper much more usable from Python and far more cost-effective.

Can I use faster_whisper for real-time transcription tasks?

@chrisspen

@SeeknnDestroy

Can I use faster_whisper for real-time transcription tasks?

Probably not. faster_whisper is a lot faster than the pure Python implementation, but a lot slower than this C++ version.

I'd only recommend faster_whisper when you want good performance but don't have the GPU needed to run whisper.cpp.
