
It seems that there is no performance gain utilizing Core ML #2057

Open
MichelBahl opened this issue Apr 15, 2024 · 2 comments

@MichelBahl

I think Core ML is set up correctly:

Start whisper.cpp with:

./main --language de -t 10 -m models/ggml-medium.bin -f /Users/michaelbahl/Downloads/testcast.wav

whisper_init_state: loading Core ML model from 'models/ggml-medium-encoder.mlmodelc'
whisper_init_state: first run on a device may take a while ...
whisper_init_state: Core ML model loaded
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =     6.78 MiB, ( 1738.41 / 49152.00)
whisper_init_state: compute buffer (conv)   =    8.81 MB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =     5.86 MiB, ( 1744.27 / 49152.00)
whisper_init_state: compute buffer (cross)  =    7.85 MB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =   130.83 MiB, ( 1875.09 / 49152.00)
whisper_init_state: compute buffer (decode) =  138.87 MB

system_info: n_threads = 10 / 12 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | METAL = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | CUDA = 0 | COREML = 1 | OPENVINO = 0

main: processing '/Users/michaelbahl/Downloads/testcast.wav' (8126607 samples, 507.9 sec), 10 threads, 1 processors, 5 beams + best of 5, lang = de, task = transcribe, timestamps = 1 ...

Runtime (COREML):

whisper_print_timings:     load time =   442.59 ms
whisper_print_timings:     fallbacks =   1 p /   0 h
whisper_print_timings:      mel time =   140.54 ms
whisper_print_timings:   sample time = 13079.59 ms / 12370 runs (    1.06 ms per run)
whisper_print_timings:   encode time =  6931.83 ms /    21 runs (  330.09 ms per run)
whisper_print_timings:   decode time =   273.79 ms /    27 runs (   10.14 ms per run)
whisper_print_timings:   batchd time = 52941.25 ms / 12239 runs (    4.33 ms per run)
whisper_print_timings:   prompt time =  1136.64 ms /  4434 runs (    0.26 ms per run)
whisper_print_timings:    total time = 75668.75 ms
ggml_metal_free: deallocating
ggml_metal_free: deallocating

Runtime (normal, without Core ML):

whisper_print_timings:     load time =   548.92 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =   144.93 ms
whisper_print_timings:   sample time = 12857.83 ms / 12239 runs (    1.05 ms per run)
whisper_print_timings:   encode time =  5827.67 ms /    21 runs (  277.51 ms per run)
whisper_print_timings:   decode time =   572.82 ms /    58 runs (    9.88 ms per run)
whisper_print_timings:   batchd time = 52036.77 ms / 12079 runs (    4.31 ms per run)
whisper_print_timings:   prompt time =  1132.30 ms /  4434 runs (    0.26 ms per run)
whisper_print_timings:    total time = 73148.27 ms
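
Side by side (derived from the timings above): encode is 330.09 ms/run with Core ML vs 277.51 ms/run without, so the Core ML encoder is actually ~19% slower here, and roughly 65 s of the ~73 s total goes to the sample and batchd stages, which Core ML does not touch - per the load log, only the encoder ('ggml-medium-encoder.mlmodelc') runs through Core ML.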

Did I miss something for a faster transcription?

@ggerganov
Owner

Depending on your hardware (GPU cores / ANE cores), Core ML might or might not be faster:

#1722 (comment)
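
To isolate the encoder for comparison, one option (a sketch, not part of the linked comment) is the bundled bench tool, run once from a Core ML build and once from a plain Metal build:

./bench -m models/ggml-medium.bin -t 10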

@ggerganov
Owner

Also, try to generate ANE-optimized Core ML models - this can result in an extra improvement:

#1716
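
For reference, the baseline Core ML conversion flow from the whisper.cpp README looks roughly like this (the ANE-optimized path in #1716 may differ; exact package requirements are per the README):

pip install ane_transformers openai-whisper coremltools
./models/generate-coreml-model.sh medium
make clean
WHISPER_COREML=1 make -j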
