Standard output while processing. #123

Open
quist00 opened this issue Apr 22, 2024 · 4 comments
Labels
enhancement (Improves existing code), good first issue (Good for newcomers)

Comments

@quist00

quist00 commented Apr 22, 2024

Both the original Whisper and whisper.cpp print the transcription to standard output as they process by default. WhisperKit seems silent until the end, and the verbose flag seems to output much lower-level information. Assuming I haven't simply misread the documentation, please consider printing to standard output by default. Where there are substantial performance differences, they are easy to see just by testing against the same file, which allows a simple gestalt comparison of different implementations and of different models within the same implementation.

@ZachNagengast
Contributor

We do have different log levels. It sounds like you're interested in logLevel: .info rather than debug? For the CLI this is hardcoded at the moment, so we can add it as a new CLI argument. Is there anything specific you'd especially like to see in the info logs?
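To make the info-versus-debug distinction concrete, here is a minimal, generic sketch of a CLI argument selecting a log level. It is written in Python with the standard argparse and logging modules and is not WhisperKit code: the program name transcribe-demo, the --log-level flag, and its choices are all assumptions made up for illustration.

```python
import argparse
import logging


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical flag and program name, chosen for illustration only.
    # This is not an existing WhisperKit CLI option.
    parser = argparse.ArgumentParser(prog="transcribe-demo")
    parser.add_argument(
        "--log-level",
        choices=["error", "info", "debug"],
        default="info",
        help="how much detail to print while processing",
    )
    return parser


def configure_logger(level_name: str) -> logging.Logger:
    # Map the flag value ("info") onto the stdlib constant (logging.INFO).
    logger = logging.getLogger("transcribe-demo")
    logger.setLevel(getattr(logging, level_name.upper()))
    return logger


def emitted_levels(level_name: str) -> list[str]:
    # List which message severities would be printed at the chosen level.
    logger = configure_logger(level_name)
    return [
        name
        for name in ("DEBUG", "INFO", "ERROR")
        if logger.isEnabledFor(getattr(logging, name))
    ]
```

With this arrangement, progress messages such as transcribed chunks would be logged at info, and lower-level timing or model details at debug, so the default run stays readable for non-programmers.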

ZachNagengast added the enhancement and good first issue labels on Apr 22, 2024
@atiorh
Contributor

atiorh commented Apr 22, 2024

@quist00 Adding to Zach's point: if you are interested in a streaming application (as opposed to offline processing of a file) and want to test or emulate streaming performance on a file, you can use --stream-simulated in the CLI.

@quist00
Author

quist00 commented Apr 22, 2024

It would be great if that could be added as a flag to the CLI. Streaming applications are not something we are looking at currently. I work at a library, and we want to use Whisper internally to drastically reduce the time and expense of transcribing and translating items for oral history projects. I and many of my colleagues have Apple Silicon, so I really appreciate you all working on options that run more efficiently for us. I want to share it with other researchers around campus who may also have dozens or hundreds of hours of audio to contend with. The command line will be the best option for most of them, rather than a programmatic API, since in most cases they are not programmers and have none on staff.

As for the output, I think timestamps along with chunks of text, printed as processing goes, would be best. That way, novice users can get a rough estimate of throughput: if I use this model with WhisperKit, I get x minutes of output per minute of processing. They can then grade the output and decide on the right tradeoff between model and processing time.
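To illustrate the kind of output described above, here is a rough Python sketch that formats one line per chunk and computes the audio-minutes-per-processing-minute figure. The Segment type and the function names are hypothetical, not part of WhisperKit or any other library, and the bracketed line layout is only loosely modeled on what whisper.cpp prints.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical container for one transcribed chunk.
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio
    text: str


def format_timestamp(seconds: float) -> str:
    # Convert seconds to HH:MM:SS.mmm using integer milliseconds
    # to avoid floating-point drift in the displayed value.
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segment(seg: Segment) -> str:
    # One line per chunk, meant to be printed as soon as the chunk is ready.
    start = format_timestamp(seg.start)
    end = format_timestamp(seg.end)
    return f"[{start} --> {end}] {seg.text.strip()}"


def speed_factor(audio_seconds: float, processing_seconds: float) -> float:
    # Seconds of audio transcribed per second of processing time.
    # A value of 10.0 means one minute of processing covers ten minutes of audio.
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds
```

Printing format_segment for each chunk as it completes gives the running view, and a final speed_factor line gives the single number a novice user can compare across models.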

Thanks for your consideration.

@ZachNagengast
Contributor

@quist00 Could you perhaps give an example of the input/output pairs you're looking for? That way we can build toward a CLI flag that would result in an acceptable output for you.
