According to Figure 1 in the Whisper paper, during training the previous-text tokens do not contain timestamp tokens. However, when calling `transcribe` with `without_timestamps=False` and `condition_on_previous_text=True`, the prompt tokens (which contain the previous text) get passed into the model together with timestamp tokens. Printing out `prompt` right before faster-whisper/faster_whisper/transcribe.py line 850 (at 91c8307) confirms this: there were several timestamp tokens in there.
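For reference, here is a minimal way to reproduce the setup (a sketch: the model size and `audio.wav` are placeholders, and the audio should be longer than 30 s so that a later window is actually conditioned on earlier text):

```python
from faster_whisper import WhisperModel

model = WhisperModel("base")  # any model size works for this check

# Audio longer than 30 s, so the second window is conditioned on the first.
segments, info = model.transcribe(
    "audio.wav",
    without_timestamps=False,
    condition_on_previous_text=True,
)

print("detected language:", info.language)
for segment in segments:  # iterating drives the actual transcription
    print(segment.start, segment.end, segment.text)
```

With a temporary `print(prompt)` inserted just before the line referenced above, the timestamp token ids show up in the prompt.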
If this is indeed true, wouldn't it cause a train-test mismatch?
Timestamp tokens never appeared before <|startoftranscript|> during training, but now they do at inference time.
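To make the mismatch concrete, here is a sketch contrasting the training-time prompt layout from Figure 1 with what gets fed back at inference. It uses the openai-whisper tokenizer for illustration (the text and timestamps are made up; faster-whisper exposes the same special tokens):

```python
from whisper.tokenizer import get_tokenizer

tokenizer = get_tokenizer(multilingual=True, language="en", task="transcribe")
prev_text = tokenizer.encode(" This is sentence 1.")

# Training-time layout per Figure 1: previous text only, no timestamps.
train_prompt = [tokenizer.sot_prev] + prev_text

# Inference with without_timestamps=False: the previous segment's tokens
# still carry their timestamp tokens (one id per 0.02 s step).
ts_0_00 = tokenizer.timestamp_begin        # <|0.00|>
ts_1_02 = tokenizer.timestamp_begin + 51   # <|1.02|>
infer_prompt = [tokenizer.sot_prev, ts_0_00] + prev_text + [ts_1_02]

print(tokenizer.decode_with_timestamps(train_prompt))
# <|startofprev|> This is sentence 1.
print(tokenizer.decode_with_timestamps(infer_prompt))
# <|startofprev|><|0.00|> This is sentence 1.<|1.02|>
```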
Also, timestamp tokens only cover the range [0, 30] seconds, so it does not make sense for both the previous and the current segment to have timestamps within [0, 30]. For example:

Previous segment:
<|startofprev|><|0.00|>This is sentence 1.<|1.02|>...<|29.02|>This is another.<|29.54|>
Current transcript:
<|startoftranscript|><|0.02|>This is another.<|1.04|>...
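If Figure 1 reflects how the model was trained, one mitigation would be to strip timestamp ids from the previous tokens before they are prepended. A minimal sketch (`strip_timestamp_tokens` is a hypothetical helper, not part of either library; in the openai tokenizer all ids at or above `timestamp_begin` are timestamp tokens):

```python
from whisper.tokenizer import get_tokenizer

tokenizer = get_tokenizer(multilingual=True)

def strip_timestamp_tokens(tokens):  # hypothetical helper, not library API
    # Every timestamp token (<|0.00|> ... <|30.00|>) has id >= timestamp_begin.
    return [t for t in tokens if t < tokenizer.timestamp_begin]

previous = (
    [tokenizer.timestamp_begin]           # <|0.00|>
    + tokenizer.encode(" This is another.")
    + [tokenizer.timestamp_begin + 1477]  # <|29.54|>
)
print(tokenizer.decode_with_timestamps(strip_timestamp_tokens(previous)))
# output: " This is another." (timestamps removed)
```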
As I understand it, the OpenAI implementation does this as well, so I opened a discussion there too: openai/whisper#2140