Using both the GPU and CPU simultaneously #1570

paralin · 2023-11-28T21:05:12Z

I'm using whisper.cpp with great results on an AMD GPU:

make WHISPER_CLBLAST=1 -j8
./models/download-ggml-model.sh large-v3
./main -pc -otxt -t 6 -p 1 -of transcript -m ./models/ggml-large-v3.bin ./audio.wav

My questions are:

What should I set threads to? Right now I have 6 since I have an 8-core CPU and this seems to be the sweet spot.
Is it possible to use both the CPU and GPU at the same time? The CPU seems to be saturated, is it using the GPU? Or both?

I'm a bit confused about which flags are optimal for this case: an AMD GPU and an Intel CPU.

Thanks!

bradmit · 2023-12-01T03:45:29Z

I have it set to 4 threads whether I'm using the CPU or the GPU, but I don't think it has much impact in the GPU case. You could try timing a few runs with different thread counts to see. With the release of 1.5.0 and the updated support for CUDA, the CPU usage is minimal. I don't think there is a way to use both the CPU and the GPU for operations. It's one or the other with the lib/program.
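One way to run the timings suggested above is sketched below. It is only a rough harness, not part of whisper.cpp: it reruns the exact command from the original post, varies only `-t`, and records wall-clock time. The helper names (`whisper_cmd`, `time_command`, `sweep_threads`, `best_threads`) are made up for this example, and the paths assume you run it from the whisper.cpp directory after building.

```python
import subprocess
import sys
import time


def whisper_cmd(threads):
    """Build the command from the original post, varying only -t."""
    return [
        "./main", "-pc", "-otxt",
        "-t", str(threads), "-p", "1",
        "-of", "transcript",
        "-m", "./models/ggml-large-v3.bin",
        "./audio.wav",
    ]


def time_command(cmd):
    """Run cmd once and return wall-clock seconds; raises if it exits non-zero."""
    start = time.perf_counter()
    subprocess.run(
        cmd, check=True,
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start


def sweep_threads(make_cmd, thread_counts, repeats=1):
    """Return {threads: best wall-clock seconds over `repeats` runs}."""
    results = {}
    for t in thread_counts:
        results[t] = min(time_command(make_cmd(t)) for _ in range(repeats))
        print(f"threads={t}: {results[t]:.2f}s", file=sys.stderr)
    return results


def best_threads(results):
    """Pick the thread count with the lowest measured time."""
    return min(results, key=results.get)


# Example (hypothetical thread counts for an 8-core CPU):
# results = sweep_threads(whisper_cmd, [1, 2, 4, 6, 8])
# print("fastest:", best_threads(results))
```

Taking the minimum over a few repeats reduces noise from other processes. Wall-clock time is the number that matters here, since high CPU utilization alone does not show whether the threads are doing useful work.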

bobqianic · 2023-12-07T00:09:54Z

Collaborator

Is it possible to use both the CPU and GPU at the same time?

It could become feasible in the future, once the scheduler is fully implemented in the ggml backend.

The CPU seems to be saturated, is it using the GPU? Or both?

Starting from version 1.5.0, the majority of the graph processing has been shifted to the GPU. As a result, the CPU threads spend most of their time idle, simply waiting for data from the GPU.

What should I set threads to?

In the latest version of whisper.cpp, the CPU mainly performs two functions. First, it computes the log-mel spectrogram. Second, during sampling, it selects the most suitable next token based on the model's output. For shorter audio files, setting the thread count to 1 is quite effective. For longer audio, on the other hand, increasing the number of threads can improve the end-to-end transcription speed.

2 replies

bobqianic Dec 7, 2023
Collaborator

Ah, it's an AMD GPU... I need to verify my statements.

paralin Dec 7, 2023
Author

Thanks for the answer, nevertheless.
