SpeechT5 integration #2441

Open
wants to merge 22 commits into
base: develop

@helleuch helleuch commented Feb 29, 2024

What does this PR do?

The goal of this PR is to integrate the SpeechT5 model for speech-to-text into SpeechBrain.
It also comes with a recipe for Tamasheq-to-French automatic speech translation under the IWSLT22 directory.

Before submitting
  • Did you read the contributor guideline?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Does your code adhere to project-specific code style and conventions?

PR review

Reviewer checklist
  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified
  • Confirm that the changes adhere to compatibility requirements (e.g., Python version, platform)
  • Review the self-review checklist to ensure the code is ready for review

@helleuch helleuch changed the title from "Speech t5" to "SpeechT5 integration" on Mar 1, 2024
@helleuch
Author

helleuch commented Mar 6, 2024

Hello,
I have a question about submitting the results obtained by running the submitted recipe. Do I have to upload the results to Dropbox myself? Or are the recipes run by the reviewers, who then upload the results?

@Adel-Moumen
Collaborator

Hello, I have a question about submitting the results obtained by running the submitted recipe. Do I have to upload the results to Dropbox myself? Or are the recipes run by the reviewers, who then upload the results?

Hello @helleuch, thanks for your PR! You'll need to upload them to a cloud storage service so that we can download the checkpoints and upload them to our official Dropbox.

@helleuch
Author

helleuch commented Mar 8, 2024

Thank you @Adel-Moumen. I will upload them soon and send you the link :)
I will update the results in the readme file.

Collaborator

@Adel-Moumen Adel-Moumen left a comment

Hello,

Thanks a lot for this PR. Generally speaking, this is a great addition. I left some comments that I think should be addressed. I also put the link to the Dropbox in a comment. Could you please update the README so that it reflects the test results obtained?

Could you please confirm whether you ran the recipe tests on this PR?

Thanks a lot :)

P.S. Make sure to fix the failing pre-commit checks... :)

Comment on lines 222 to 234
    def forward_decoder(self, audio_features, decoder_input_ids):
        """Perform one step of the SpeechT5 decoder.

        Arguments
        ---------
        audio_features : torch.Tensor
            A batch of audio features (SpeechT5 encoding).
        decoder_input_ids : torch.Tensor
            A batch of decoder input tokens.

        For more details, go to the seq2seq2.py file in SpeechBrain to see how to generate
        the tokens with Greedy Search and/or Beam Search.
        """
Collaborator

I'm wondering: is there KV cache support for this model in HF?

Author

I do not know, sorry
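
For context on the question above: a key/value (KV) cache lets each decoding step compute keys and values only for the newest token and reuse those of earlier tokens, instead of recomputing them for the whole prefix at every step of greedy or beam search. The sketch below is a toy, pure-Python illustration of that idea. It is not the SpeechT5 or Hugging Face implementation, and whether the HF model exposes such a cache is left open in this thread. The names `project`, `attend`, `decode_without_cache`, and `decode_with_cache` are hypothetical, and `project` is only a stand-in for learned key/value projections.

```python
import math
from typing import List

Vector = List[float]


def project(x: Vector, scale: float) -> Vector:
    """Hypothetical stand-in for a learned key or value projection."""
    return [scale * xi for xi in x]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention of one query over all keys/values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]


def decode_without_cache(inputs: List[Vector]) -> List[Vector]:
    """Recompute keys and values for the whole prefix at every step."""
    outputs = []
    for t in range(len(inputs)):
        prefix = inputs[: t + 1]
        keys = [project(x, 0.5) for x in prefix]
        values = [project(x, 2.0) for x in prefix]
        outputs.append(attend(inputs[t], keys, values))
    return outputs


def decode_with_cache(inputs: List[Vector]) -> List[Vector]:
    """Project only the newest token and append it to a growing cache."""
    key_cache: List[Vector] = []
    value_cache: List[Vector] = []
    outputs = []
    for x in inputs:
        key_cache.append(project(x, 0.5))
        value_cache.append(project(x, 2.0))
        outputs.append(attend(x, key_cache, value_cache))
    return outputs


inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
full = decode_without_cache(inputs)
cached = decode_with_cache(inputs)
print(full == cached)
```

Both functions produce the same outputs; the cached variant does one projection per step rather than one per prefix token, which is where the savings during autoregressive decoding come from.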
