
Transcribed audio not showing #35

Open
iamlokeshvunnam opened this issue Jul 11, 2023 · 6 comments

Comments

@iamlokeshvunnam

Thanks a lot for this repo!

I cloned the Cheetah repo and the whisper.cpp repo (both at the same level). I followed the instructions to download the ggml model, installed sdl2 with brew, and also installed BlackHole, but when I run the Cheetah project I don't see any transcribed text in the window.

When running the project for the first time, it couldn't find the model at this location ('/Users//Library/Caches/org.phrack.Cheetah/ggml-medium.en.bin'), so I copied the downloaded ggml model from whisper.cpp/models/ into that location so Cheetah could find it.
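For anyone following along, the manual step above can be sketched roughly like this. The source path is an assumption based on a default whisper.cpp checkout; adjust it to your own layout (the cache path comes from the error message above):

```shell
# Sketch of the manual fix described above. MODEL_SRC is an assumption:
# point it at the model file inside your whisper.cpp checkout.
MODEL_SRC="whisper.cpp/models/ggml-medium.en.bin"
CACHE_DIR="$HOME/Library/Caches/org.phrack.Cheetah"

# Create Cheetah's cache directory if it doesn't exist yet.
mkdir -p "$CACHE_DIR"

if [ -f "$MODEL_SRC" ]; then
    cp "$MODEL_SRC" "$CACHE_DIR/ggml-medium.en.bin"
    echo "model installed to $CACHE_DIR"
else
    echo "model not found at $MODEL_SRC"
fi
```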

[Two screenshots attached: 2023-07-12 at 00:20:13 and 00:21:02]

What am I missing? Do I need to be in a meeting with someone to test this out? What should the source of the input be? Please clarify these! Thanks a lot!

@leetcode-mafia
Owner

leetcode-mafia commented Jul 12, 2023

It looks like you might be running in debug mode and/or with a debugger attached? If so, that won't work because Whisper runs too slowly without compiler optimizations.

If that is not the issue, then the BLANK_AUDIO tokens suggest the app is receiving an audio stream, but no input device is actually sending audio to BlackHole. Before trying to get BlackHole to work, you'll want to make sure Cheetah works with the built-in mic input.
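One way to sanity-check the model and the built-in mic independently of Cheetah is whisper.cpp's own real-time `stream` example (it needs SDL2, which is already installed here, and is built with `make stream` in the whisper.cpp checkout). The model path below is an assumption; substitute whichever ggml model you downloaded:

```shell
# Sanity check outside Cheetah: run whisper.cpp's `stream` example
# against the default capture device. Build it first with `make stream`.
if [ -x ./stream ]; then
    ./stream -m ./models/ggml-medium.en.bin --step 500 --length 5000
else
    echo "build the stream example first: make stream"
fi
```

If `stream` transcribes your voice but Cheetah doesn't, the problem is in the audio routing rather than the model.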

@iamlokeshvunnam
Author

I'm not really sure whether debug mode is on. As soon as the Xcode project opened, I clicked the play button, which I assumed would run the application in release mode. Is that not the case?

Please help, thanks!

@leetcode-mafia
Owner

No, that will run it in debug mode. You're looking for: Product > Build for > Profiling, then Product > Show Build Folder.
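The same optimized build can also be produced from the command line. The project name below is an assumption based on the repo layout; check the actual names with `xcodebuild -list` in your checkout:

```shell
# Alternative to Product > Build for > Profiling: produce an optimized
# (Release) build via xcodebuild. Project name is an assumption.
if command -v xcodebuild >/dev/null 2>&1; then
    xcodebuild -project Cheetah.xcodeproj -configuration Release build
else
    echo "xcodebuild not available (requires macOS with Xcode installed)"
fi
```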

@iamlokeshvunnam
Author

Oh, thanks a lot. I've now built the application and am running it in release mode. I still don't see any transcriptions (the input source is set to 'MacBook Air Microphone'). Any idea why?

@iamlokeshvunnam
Author

Yep, I found the reason. The model in the cache folder was somehow corrupted; replacing it with the ggml model from the whisper.cpp repo did the trick. It works now; however, I still have to get BlackHole working (I'd appreciate your help on that one).

Thanks a lot. Also, the transcriptions seem a bit slow. What would you advise for faster transcriptions? Have you tried the quantized models? Do they produce reasonably good transcriptions? Any other suggestions?

@leetcode-mafia
Owner

What hardware are you using? It can only run fast enough on an M1 or M2.

Even with beefy hardware, there's still a minor delay in generating transcriptions. The main reason is that the current algorithm for buffering/chunking the audio stream isn't optimal and needs further tuning.

M1/M2 is fast enough to run the medium model in real-time, so I don't think using quantized or smaller models would make a significant difference.
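For reference, if you still want to experiment: whisper.cpp ships a `quantize` tool that converts a ggml model into a smaller quantized one. This is a sketch run from the whisper.cpp checkout; the q5_0 type and output filename follow whisper.cpp's examples and are assumptions about what you'd want:

```shell
# Optional experiment: quantize the medium model with whisper.cpp's
# quantize tool (build it first with `make quantize` in the checkout).
if [ -x ./quantize ]; then
    ./quantize models/ggml-medium.en.bin models/ggml-medium.en-q5_0.bin q5_0
else
    echo "build whisper.cpp's quantize tool first: make quantize"
fi
```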
