The Whisper large-v3 model exported to ONNX does not return the end timestamp for the last chunk #1850
Labels: bug
System Info
Who can help?
@JingyaHuang @echarlaix @michaelbenayoun
Reproduction
Running the transformers pipeline for the original whisper-large-v3 model returns the correct timestamps of all chunks for files of any duration. For example, for a file with a duration of 33 seconds, the code below gives the following result:
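A minimal sketch of such a pipeline call, assuming the `openai/whisper-large-v3` checkpoint from the Hub. The function name `transcribe_with_timestamps` and the file name `audio.wav` are hypothetical, and this may differ from the exact snippet used for the report:

```python
def transcribe_with_timestamps(audio_path, model_id="openai/whisper-large-v3"):
    """Transcribe a file with the PyTorch model and return chunk timestamps."""
    # Imported lazily so the helper can be defined without transformers installed.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=model_id,          # assumed checkpoint name
        return_timestamps=True,  # request (start, end) timestamps per chunk
    )
    # The result is a dict with "text" and "chunks"; every chunk carries
    # a "timestamp" tuple (start, end) and its "text".
    return asr(audio_path)


# Hypothetical usage:
#   result = transcribe_with_timestamps("audio.wav")
#   for chunk in result["chunks"]:
#       print(chunk["timestamp"], chunk["text"])
```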
Output
After converting to ONNX using this command:
and running the equivalent code:
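A minimal sketch of the ONNX variant, assuming the model was exported with `optimum-cli export onnx` into a hypothetical directory `whisper_onnx/` and that the processor is loaded from the original checkpoint. The exact export flags and paths used for the report may differ:

```python
def transcribe_with_onnx(audio_path, onnx_dir="whisper_onnx",
                         model_id="openai/whisper-large-v3"):
    """Transcribe a file with the exported ONNX model and return chunk timestamps."""
    # Assumed export step (run once in a shell beforehand):
    #   optimum-cli export onnx --model openai/whisper-large-v3 whisper_onnx/
    # Imported lazily so the helper can be defined without optimum installed.
    from optimum.onnxruntime import ORTModelForSpeechSeq2Seq
    from transformers import AutoProcessor, pipeline

    model = ORTModelForSpeechSeq2Seq.from_pretrained(onnx_dir)
    # Tokenizer and feature extractor are taken from the original checkpoint.
    processor = AutoProcessor.from_pretrained(model_id)

    asr = pipeline(
        "automatic-speech-recognition",
        model=model,
        tokenizer=processor.tokenizer,
        feature_extractor=processor.feature_extractor,
        return_timestamps=True,  # same setting as for the PyTorch pipeline
    )
    return asr(audio_path)


# Hypothetical usage:
#   result = transcribe_with_onnx("audio.wav")
#   print(result["chunks"][-1]["timestamp"])  # the report sees (start, None) here
```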
The transcript contains None instead of the end timestamp for the last chunk, although there is no word cut off in the middle:
Output
In the case of files with a duration of less than 30 seconds, the ONNX model does not return timestamps at all:
Expected:
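So now there are 2 problems:
1. For files longer than 30 seconds, the end timestamp of the last chunk is None.
2. For files shorter than 30 seconds, no timestamps are returned at all.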
Audio files:
audio_files.zip
Expected behavior
The code that uses the ONNX model should behave the same as the version using the PyTorch model.
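One way to check this expectation is to compare the two pipeline results chunk by chunk. The sketch below assumes both results have the usual pipeline shape (a dict with a `chunks` list whose items hold a `timestamp` tuple). The helper names and the 0.05 s tolerance are hypothetical choices, not part of either library:

```python
def find_timestamp_problems(result):
    """Return a list of problems found in an ASR pipeline result dict."""
    chunks = result.get("chunks")
    if not chunks:
        # Covers the short-file case: no timestamps returned at all.
        return ["no timestamped chunks returned"]
    problems = []
    for index, chunk in enumerate(chunks):
        start, end = chunk["timestamp"]
        if start is None:
            problems.append(f"chunk {index} has no start timestamp")
        if end is None:
            # Covers the long-file case: the last chunk ends with None.
            problems.append(f"chunk {index} has no end timestamp")
    return problems


def timestamps_match(reference, candidate, tolerance=0.05):
    """Check that two results have the same chunk timestamps within a tolerance."""
    ref_chunks = reference.get("chunks") or []
    cand_chunks = candidate.get("chunks") or []
    if len(ref_chunks) != len(cand_chunks):
        return False
    for ref, cand in zip(ref_chunks, cand_chunks):
        for ref_t, cand_t in zip(ref["timestamp"], cand["timestamp"]):
            if ref_t is None or cand_t is None:
                # A missing value only matches another missing value.
                if ref_t is not cand_t:
                    return False
            elif abs(ref_t - cand_t) > tolerance:
                return False
    return True
```

Calling `find_timestamp_problems` on the ONNX result and `timestamps_match(pytorch_result, onnx_result)` on both results would flag the missing end timestamp and the missing chunks described above.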