Feature/ksponspeech #2510

Open
wants to merge 3 commits into
base: develop

ddwkim
Contributor

@ddwkim ddwkim commented Apr 16, 2024

What does this PR do?

This PR updates the KsponSpeech recipe to match recent changes in SpeechBrain.

In addition to the updated conformer-medium model, this PR adds new models, conformer-small and branchformer-medium, which can serve as new benchmarks for the dataset.

The tokenizers and the transformer language model are the previously trained ones.

Other updates include a metadata build script that uses multiprocessing and an unzip script that follows the recent changes to the KsponSpeech directory structure.
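
For readers unfamiliar with the pattern, here is a minimal sketch of a multiprocessing metadata build. It is not the script from this PR: the function names, the CSV columns (`ID`, `wav`, `size_bytes`), and the per-file work are hypothetical stand-ins chosen only to illustrate the idea.

```python
import csv
import os
from multiprocessing import Pool, cpu_count


def build_row(wav_path):
    """Build one metadata row for a file (hypothetical columns)."""
    utt_id = os.path.splitext(os.path.basename(wav_path))[0]
    return {"ID": utt_id, "wav": wav_path, "size_bytes": os.path.getsize(wav_path)}


def build_metadata(wav_paths, csv_path, num_workers=None):
    """Compute one row per file, in parallel when num_workers > 1, and write a CSV."""
    num_workers = num_workers or cpu_count()
    if num_workers == 1:
        # Serial fallback, handy for debugging.
        rows = [build_row(p) for p in wav_paths]
    else:
        # Pool.map returns results in the same order as the input list.
        with Pool(processes=num_workers) as pool:
            rows = pool.map(build_row, wav_paths, chunksize=64)
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["ID", "wav", "size_bytes"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

When calling `build_metadata` with more than one worker from a script, the call should sit under an `if __name__ == "__main__":` guard so that worker processes can import the module safely on platforms that spawn rather than fork.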

These are all CTC-Attention models. Training of a conformer-transducer model for streaming use cases is underway.

We consistently see demand for KsponSpeech models from the community, as shown by the issues and the model downloads from the Hugging Face Hub.

I believe continued updates to the KsponSpeech models will benefit the research community and the industry as a whole.

Several other things should be done before this PR is completed.

  • Update the SpeechBrain models on Hugging Face. The models in my repository are already updated, so copying them to the SpeechBrain hub will be enough
  • Upload the training logs to the SpeechBrain Dropbox
  • Update the Colab and the documentation to reflect the current changes
Before submitting
  • Did you read the contributor guideline?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Does your code adhere to project-specific code style and conventions?

PR review

Reviewer checklist
  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified
  • Confirm that the changes adhere to compatibility requirements (e.g., Python version, platform)
  • Review the self-review checklist to ensure the code is ready for review

@Adel-Moumen Adel-Moumen self-requested a review April 17, 2024 07:56
@ddwkim ddwkim force-pushed the feature/ksponspeech branch 2 times, most recently from 5ea0b88 to 74eba89 Compare April 18, 2024 00:44
@ddwkim
Contributor Author

ddwkim commented Apr 18, 2024

I will fix the CI error as soon as possible.

@ddwkim ddwkim force-pushed the feature/ksponspeech branch 2 times, most recently from bf01589 to 0ef03ee Compare April 18, 2024 01:17
@Adel-Moumen Adel-Moumen added the enhancement (New feature or request) and recipes (Changes to recipes only (add/edit)) labels Apr 18, 2024
"""

import csv
import logging
import os
import re
from multiprocessing import Pool, cpu_count
Collaborator

Contributor Author

I have just updated the script accordingly.

| 01-23-23 | conformer_medium.yaml | 20.47% | 25.18% | 7.33% | 7.99% | [HuggingFace](https://huggingface.co/speechbrain/asr-conformer-transformerlm-ksponspeech) | [DropBox](https://www.dropbox.com/sh/uibokbz83o8ybv3/AACtO5U7mUbu_XhtcoOphAjza?dl=0) | 6xA100 80GB | 2 days 13 hours |
| 04-16-24 | conformer_medium.yaml | 20.15% | 24.75% | 7.40% | 7.96% | [HuggingFace](https://huggingface.co/ddwkim/asr-conformer-transformerlm-ksponspeech) | [DropBox](https://www.dropbox.com/sh/uibokbz83o8ybv3/AACtO5U7mUbu_XhtcoOphAjza?dl=0) | 2xA100 40GB | 14 hours 7 mins |
Collaborator

I am a bit curious: do you know why training went from 2 days and 13 hours on 6x A100 80GB to "only" 14 hours and 7 minutes on 2x A100 40GB?

Contributor Author

I assume this comes from optimizations in the SpeechBrain training pipeline and from newer torch versions, neither of which was available in mid 2022, when I last trained the model. I will share the training logs soon, along with anything I find while inspecting the code changes between then and now.
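
To put the gap in perspective, the two table rows can be compared in GPU-hours. This is only a rough sketch: it assumes the A100 80GB and A100 40GB are interchangeable and ignores any differences in batch size, precision, or other settings. The helper name `gpu_hours` is made up for this illustration.

```python
def gpu_hours(num_gpus, days=0, hours=0, minutes=0):
    """Total GPU-hours: number of GPUs times wall-clock hours."""
    return num_gpus * (days * 24 + hours + minutes / 60)


old = gpu_hours(6, days=2, hours=13)     # 01-23-23 row: 6 GPUs x 61 h
new = gpu_hours(2, hours=14, minutes=7)  # 04-16-24 row: 2 GPUs x ~14.12 h
print(f"{old:.1f} vs {new:.1f} GPU-hours, ratio {old / new:.1f}x")
# prints "366.0 vs 28.2 GPU-hours, ratio 13.0x"
```

Under these assumptions the new run uses roughly 13 times fewer GPU-hours, which is the difference the training logs would need to explain.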

@Adel-Moumen Adel-Moumen added this to the v1.0.2 milestone May 2, 2024
Labels
enhancement (New feature or request), recipes (Changes to recipes only (add/edit))
2 participants