Whisper: fix asr pipeline with seq2seq assistant model #30726

gante · 2024-05-09T12:10:21Z

What does this PR do?

Fixes #29869
Fixes #30407
Fixes #30611 (related PR: #30637)

The ASR pipeline was preparing the encoder outputs before calling generate (and not passing input_features), but that's not needed: the exact same preparation is done inside generate, as it is a hard requirement for generating with encoder-decoder models.

However, by not passing input_features, it was blocking the proper use of encoder-decoder assistants that used a different encoder output shape (= decoder input shape), and thus needed the inputs to run their own encoding step.

Unlike #30637, this fix relies on lowering the complexity of our codebase 👼
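To illustrate the description above, here is a minimal toy sketch of the two call patterns. It is not the transformers implementation: `ToySeq2Seq`, its `encode`/`generate` methods, and the hidden sizes are hypothetical stand-ins, chosen only to show why precomputed encoder outputs cannot be shared with an assistant whose encoder output shape differs, while raw input features can.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToySeq2Seq:
    """Hypothetical stand-in for an encoder-decoder model (not a transformers class)."""

    name: str
    hidden_size: int

    def encode(self, input_features: List[float]) -> List[List[float]]:
        # Produce one vector of length `hidden_size` per input frame.
        return [[x] * self.hidden_size for x in input_features]

    def generate(
        self,
        input_features: Optional[List[float]] = None,
        encoder_outputs: Optional[List[List[float]]] = None,
        assistant: Optional["ToySeq2Seq"] = None,
    ) -> str:
        # Run the encoder here if the caller did not precompute its outputs.
        if encoder_outputs is None:
            if input_features is None:
                raise ValueError("need input_features or encoder_outputs")
            encoder_outputs = self.encode(input_features)

        # The decoder only accepts encoder outputs of its own hidden size.
        width = len(encoder_outputs[0])
        if width != self.hidden_size:
            raise ValueError(
                f"{self.name}: expected width {self.hidden_size}, got {width}"
            )

        if assistant is not None:
            if input_features is not None:
                # With the raw inputs available, the assistant runs its own encoder.
                assistant.generate(input_features=input_features)
            else:
                # Without them, the assistant can only reuse the main encoder outputs.
                assistant.generate(encoder_outputs=encoder_outputs)

        return f"{self.name}: decoded {len(encoder_outputs)} frames"


main = ToySeq2Seq("main", hidden_size=8)
assistant = ToySeq2Seq("assistant", hidden_size=4)
features = [0.1, 0.2, 0.3]

# Old pattern: encode up front and pass only encoder_outputs.
try:
    main.generate(encoder_outputs=main.encode(features), assistant=assistant)
    old_ok = True
except ValueError:
    old_ok = False

# New pattern: pass input_features and let each model encode for itself.
new_result = main.generate(input_features=features, assistant=assistant)
```

In this toy setup the old pattern fails on the assistant's shape check, while the new pattern succeeds because the assistant encodes the inputs itself.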

HuggingFaceDocBuilderDev · 2024-05-09T12:53:57Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

kamilakesbi

Hi @gante,

We pass the encoder outputs directly to generate for short-form generation with Whisper, but indeed that doesn't seem necessary! @sanchit-gandhi WDYT?

I can add the code modifications to #30637.

gante · 2024-05-20T11:04:56Z

(diff included in #30637)

fix asr pipeline with seq2seq assistant model

0f38577

gante mentioned this pull request May 9, 2024

Using assistant in AutomaticSpeechRecognitionPipeline with different encoder size #30637

Open

gante requested review from kamilakesbi and sanchit-gandhi May 9, 2024 12:11

gante added the run-slow label May 9, 2024

gante added 3 commits May 9, 2024 12:12

[run slow] whisper

705b2e8

fix wav2vec pipeline test

0c90dc5

[run slow] whisper

c1c2f43

kamilakesbi approved these changes May 10, 2024


gante closed this May 20, 2024
