TTS Eval: Add TTS evaluation (MOS estimation) #2392

Open · wants to merge 38 commits into develop
Conversation

flexthink
Collaborator

What does this PR do?

Add TTS evaluation models trained on the SOMOS dataset
There should be no breaking changes

  • Did you read the contributor guideline?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Does your code adhere to project-specific code style and conventions?

PR review

Reviewer checklist
  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified
  • Confirm that the changes adhere to compatibility requirements (e.g., Python version, platform)
  • Review the self-review checklist to ensure the code is ready for review

@mravanelli mravanelli marked this pull request as ready for review February 9, 2024 20:19
@mravanelli mravanelli self-requested a review February 9, 2024 20:19
@mravanelli mravanelli added the enhancement New feature or request label Feb 9, 2024
@mravanelli
Collaborator

Thank you @flexthink for your contribution! Having a model for MOS estimation is valuable for SpeechBrain. Here are some comments following an initial code inspection:

  1. The README.md file is missing.
  2. Recipe tests are not currently implemented.
  3. There seems to be an implementation of multiple systems; could we focus on integrating only the best one to help maintenance?
  4. We need to upload the logs to the official SpeechBrain Dropbox.
  5. The best model needs to be uploaded to HF (Hugging Face).
  6. In recipes/SOMOS/ttseval/contrastive_sampling.py, I suggest writing docstring examples for RegressionContrastiveEnhancement and ContrastivePairingPipeline (a sketch of the expected format follows this list).
  7. For ssl.py, docstring examples should be added for BaselineSSLFinetune and TransformerRegression.
  8. In metric_stats.py, docstring examples should be provided for LinearRegressionStats under MetricStats.
  9. somos_prepare.py appears twice. Could you confirm whether the second instance is just a soft link?
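
For items 6–8, SpeechBrain docstring examples are conventionally runnable doctests. Here is a minimal sketch of the expected format, using a toy class rather than any of the PR's actual classes (whose signatures I have not assumed):

```python
class ExampleScorer:
    """Toy class illustrating the docstring-example (doctest) style.

    Arguments
    ---------
    scale : float
        Multiplier applied to each raw score.

    Example
    -------
    >>> scorer = ExampleScorer(scale=2.0)
    >>> scorer(1.5)
    3.0
    """

    def __init__(self, scale):
        self.scale = scale

    def __call__(self, score):
        return self.scale * score
```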

I recommend that @BenoitWang review this PR as well; his insights would be valuable.

Thank you once again for your contribution.

@BenoitWang
Collaborator

Hi @mravanelli @flexthink,

I suspect this may not be the final version of the code, since we have another branch with more recent experiments. I agree that we should keep only the best recipe. Some observations from my benchmark:

  1. WavLM large gives the best correlation so far.
  2. L2 loss works better than L1 loss, but training sometimes fails to converge when starting directly with L2, so I start with L1 for one epoch and then switch to L2 (see the sketch after this list).
  3. I added LJSpeech to the training set, first trained a ground-truth vs. synthesized classifier, and then fine-tuned that model on the regression task; this also helped a little. To simplify the recipe, we can provide a link to this classification model instead of adding the classification scripts.
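
A minimal sketch of the warm-up schedule from item 2, assuming a standard SpeechBrain Brain subclass; `number_of_l1_epochs` and the `batch.score` field are hypothetical names, not taken from this PR:

```python
import torch
import speechbrain as sb


class MOSRegressionBrain(sb.Brain):
    def compute_objectives(self, predictions, batch, stage):
        """Uses L1 loss for the first epoch(s), then switches to L2 (MSE)."""
        targets = batch.score.data  # hypothetical batch field with MOS labels
        current_epoch = self.hparams.epoch_counter.current
        if current_epoch <= self.hparams.number_of_l1_epochs:  # e.g., 1
            return torch.nn.functional.l1_loss(predictions, targets)
        return torch.nn.functional.mse_loss(predictions, targets)
```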

As for the code, it looks good to me. The only thing I'd note is that I don't find ssl.py very necessary: it wraps some basic SpeechBrain modules that we could declare in the YAML file instead. What do you think @flexthink?

self.pool = StatisticsPooling(return_std=False)
self.out_proj = Linear(n_neurons=1, input_size=d_model)

def forward(self, wav, length):
Collaborator
I think we can just declare these modules in the YAML and call them in compute_forward; that way it may be clearer to users (see the sketch below).
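
A minimal sketch of that suggestion, assuming HyperPyYAML as in other SpeechBrain recipes; the `d_model` reference mirrors the snippet above, but the key names and the `batch.sig` / `ssl_model` names are illustrative assumptions:

```yaml
# hyperparams.yaml (sketch)
pooling: !new:speechbrain.nnet.pooling.StatisticsPooling
    return_std: False
out_proj: !new:speechbrain.nnet.linear.Linear
    input_size: !ref <d_model>
    n_neurons: 1
```

```python
def compute_forward(self, batch, stage):
    """Sketch: pool SSL features and project to a single MOS estimate."""
    wavs, lens = batch.sig  # hypothetical batch field
    feats = self.modules.ssl_model(wavs)  # hypothetical SSL feature extractor
    pooled = self.hparams.pooling(feats)
    return self.hparams.out_proj(pooled).squeeze(-1)
```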

return x


def compute_feats_dim(model):
Collaborator
For the feature dimension, I think we can declare it in the YAML file, for example 768 for base models and 1024 for large ones, as we did for other SSL recipes (see the sketch below).
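
A one-line sketch of that convention; the key name `encoder_dim` is an assumption borrowed from other SSL recipes:

```yaml
# 768 for base SSL models, 1024 for large ones
encoder_dim: 1024
```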


d_model: 512
d_ffn: 2048
num_layers: 4
Collaborator

We may need to update the best configuration later, for example WavLM large with a 3-layer encoder, as well as the learning rate, dropout, etc. (an illustrative sketch follows).
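
Purely illustrative values for such an update; none of these numbers are final or taken from the benchmark:

```yaml
d_model: 1024   # WavLM large feature dimension
d_ffn: 2048
num_layers: 3   # 3-layer encoder as suggested
lr: 0.0001      # placeholder, to be tuned
dropout: 0.1    # placeholder, to be tuned
```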
