
RuntimeError: shape '[540672, 1]' is invalid for input of size 655360 #141

devidw opened this issue Dec 8, 2023 · 7 comments
Labels: help wanted (Extra attention is needed)


devidw commented Dec 8, 2023

Trying to do a training run from scratch, experimenting with a small dataset to understand the training flow.

  • First-stage training finishes successfully.
  • Second-stage training dies with a ZeroDivisionError.

exception

Traceback (most recent call last):
  File "/workspace/tts/train_second.py", line 789, in <module>
    main()
  File "/workspace/tts/abc/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/workspace/tts/abc/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/workspace/tts/abc/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/workspace/tts/abc/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/workspace/tts/train_second.py", line 676, in main
    logger.info('Validation loss: %.3f, Dur loss: %.3f, F0 loss: %.3f' % (loss_test / iters_test, loss_align / iters_test, loss_f / iters_test) + '\n\n\n')
ZeroDivisionError: division by zero
make: *** [Makefile:8: train_2] Error 1

full output

(abc) root@0e786d10e0c3:/workspace/tts# make train_2
python train_second.py --config_path ./Configs/config.yml
Loading the first stage model at /workspace/small_1208/first_stage.pth ...
decoder loaded
text_encoder loaded
style_encoder loaded
text_aligner loaded
pitch_extractor loaded
Some weights of the model checkpoint at microsoft/wavlm-base-plus were not used when initializing WavLMModel: ['encoder.pos_conv_embed.conv.weight_g', 'encoder.pos_conv_embed.conv.weight_v']
- This IS expected if you are initializing WavLMModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing WavLMModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of WavLMModel were not initialized from the model checkpoint at microsoft/wavlm-base-plus and are newly initialized: ['encoder.pos_conv_embed.conv.parametrizations.weight.original1', 'encoder.pos_conv_embed.conv.parametrizations.weight.original0']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
BERT AdamW (
Parameter Group 0
    amsgrad: False
    base_momentum: 0.85
    betas: (0.9, 0.99)
    capturable: False
    differentiable: False
    eps: 1e-09
    foreach: None
    fused: None
    initial_lr: 1e-05
    lr: 1e-05
    max_lr: 2e-05
    max_momentum: 0.95
    maximize: False
    min_lr: 0
    weight_decay: 0.01
)
decoder AdamW (
Parameter Group 0
    amsgrad: False
    base_momentum: 0.85
    betas: (0.0, 0.99)
    capturable: False
    differentiable: False
    eps: 1e-09
    foreach: None
    fused: None
    initial_lr: 1e-05
    lr: 1e-05
    max_lr: 2e-05
    max_momentum: 0.95
    maximize: False
    min_lr: 0
    weight_decay: 0.0001
)
`` ˌʌndɚstˈʊd , '' nˈɪkələs nˈɑːdz slˈaɪtli . …
`` ˌʌndɚstˈʊd , '' nˈɪkələs nˈɑːdz slˈaɪtli . …
`` ˌʌndɚstˈʊd , '' nˈɪkələs nˈɑːdz slˈaɪtli . …
`` ˌʌndɚstˈʊd , '' nˈɪkələs nˈɑːdz slˈaɪtli . …
`` juː wɜːɹ ˈɔːlweɪz bɹˈaɪt , kwˈɪn . stˈʌbɚn , ʃˈʊɹ , bˌʌt bɹˈaɪt æz hˈɛl . aɪ ɹɪmˈɛmbɚ juː ˈɑːɹɡjuːɪŋ wɪð mˌiː ɐbˌaʊt ˈɛvɹi lˈɪɾəl θˈɪŋ ɪn hˈɪstɚɹi klˈæs . dɹˈoʊv mˌiː nˈʌts sˈʌmtaɪmz , '' zˈiːnə lˈæfd , ðə sˈaʊnd ɹˈɪtʃ wɪð fˈɑːndnəs . …
`` juː wɜːɹ ˈɔːlweɪz bɹˈaɪt , kwˈɪn . stˈʌbɚn , ʃˈʊɹ , bˌʌt bɹˈaɪt æz hˈɛl . aɪ ɹɪmˈɛmbɚ juː ˈɑːɹɡjuːɪŋ wɪð mˌiː ɐbˌaʊt ˈɛvɹi lˈɪɾəl θˈɪŋ ɪn hˈɪstɚɹi klˈæs . dɹˈoʊv mˌiː nˈʌts sˈʌmtaɪmz , '' zˈiːnə lˈæfd , ðə sˈaʊnd ɹˈɪtʃ wɪð fˈɑːndnəs . …
tˈaɪm lˈuːps ? aɪ hæv tʊ ɐdmˈɪt , nˈɛvɚ wˈɑːtʃt `` pˈæɹədˌɑːks θˈiəɹɪz '' ... jˈɛt ! bˌʌt aɪ kˈænt wˈeɪt tə hˈɪɹ jʊɹ tˈeɪk ˈɑːn ɪt , mˈaɪkəl . bˌiːtˌiːdˈʌbəljˌuː , ɪz ɪt wˈʌn ʌv ðoʊz mˈaɪndbˈɛndɪŋ sˈaɪfˌaɪ ʃˈoʊz ? …
tˈaɪm lˈuːps ? aɪ hæv tʊ ɐdmˈɪt , nˈɛvɚ wˈɑːtʃt `` pˈæɹədˌɑːks θˈiəɹɪz '' ... jˈɛt ! bˌʌt aɪ kˈænt wˈeɪt tə hˈɪɹ jʊɹ tˈeɪk ˈɑːn ɪt , mˈaɪkəl . bˌiːtˌiːdˈʌbəljˌuː , ɪz ɪt wˈʌn ʌv ðoʊz mˈaɪndbˈɛndɪŋ sˈaɪfˌaɪ ʃˈoʊz ? …
`` wiː wˈɔːk ɐ nˈaɪfz ˈɛdʒ , maɪ bɪlˈʌvd . tə fˈɔːl ɪz tə dˈuːm ˌʌs bˈoʊθ. `` kwˈɪnz wˈɜːdz wɜːɹ ɐ wˈɪspɚ , jˈɛt ðeɪ ˈɛkoʊd wɪð ɐ tɹˈuːθ ðæt nˈiːðɚ kʊd dɪnˈaɪ . …
`` wiː wˈɔːk ɐ nˈaɪfz ˈɛdʒ , maɪ bɪlˈʌvd . tə fˈɔːl ɪz tə dˈuːm ˌʌs bˈoʊθ. `` kwˈɪnz wˈɜːdz wɜːɹ ɐ wˈɪspɚ , jˈɛt ðeɪ ˈɛkoʊd wɪð ɐ tɹˈuːθ ðæt nˈiːðɚ kʊd dɪnˈaɪ . …
`` wiː wˈɔːk ɐ nˈaɪfz ˈɛdʒ , maɪ bɪlˈʌvd . tə fˈɔːl ɪz tə dˈuːm ˌʌs bˈoʊθ. `` kwˈɪnz wˈɜːdz wɜːɹ ɐ wˈɪspɚ , jˈɛt ðeɪ ˈɛkoʊd wɪð ɐ tɹˈuːθ ðæt nˈiːðɚ kʊd dɪnˈaɪ . …
`` wiː wˈɔːk ɐ nˈaɪfz ˈɛdʒ , maɪ bɪlˈʌvd . tə fˈɔːl ɪz tə dˈuːm ˌʌs bˈoʊθ. `` kwˈɪnz wˈɜːdz wɜːɹ ɐ wˈɪspɚ , jˈɛt ðeɪ ˈɛkoʊd wɪð ɐ tɹˈuːθ ðæt nˈiːðɚ kʊd dɪnˈaɪ . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` jˈɛh , ænd skɹˈiːmɪŋ æt mˌiː ɪnðə mˈɪdəl əvən ɐsˈɛmbli ɪz ɡˌənə fˈɪks ɪt ? fˈeɪs ɪt , ðɪs ˈɪʃuː ɪz bˈɪɡɚ ðɐn bˈoʊθ ʌv ˌʌs , ænd ɡˌɛɾɪŋ pˈɪst æt ˈiːtʃ ˈʌðɚ sˈɑːlvz nˈʌθɪŋ , '' juːlˈeɪliə ɹɪplˈaɪd , ɛɡzˈæspɚɹˌeɪɾᵻd . …
`` jˈɛh , ænd skɹˈiːmɪŋ æt mˌiː ɪnðə mˈɪdəl əvən ɐsˈɛmbli ɪz ɡˌənə fˈɪks ɪt ? fˈeɪs ɪt , ðɪs ˈɪʃuː ɪz bˈɪɡɚ ðɐn bˈoʊθ ʌv ˌʌs , ænd ɡˌɛɾɪŋ pˈɪst æt ˈiːtʃ ˈʌðɚ sˈɑːlvz nˈʌθɪŋ , '' juːlˈeɪliə ɹɪplˈaɪd , ɛɡzˈæspɚɹˌeɪɾᵻd . …
`` wiː wˈɔːk ɐ nˈaɪfz ˈɛdʒ , maɪ bɪlˈʌvd . tə fˈɔːl ɪz tə dˈuːm ˌʌs bˈoʊθ. `` kwˈɪnz wˈɜːdz wɜːɹ ɐ wˈɪspɚ , jˈɛt ðeɪ ˈɛkoʊd wɪð ɐ tɹˈuːθ ðæt nˈiːðɚ kʊd dɪnˈaɪ . …
`` wiː wˈɔːk ɐ nˈaɪfz ˈɛdʒ , maɪ bɪlˈʌvd . tə fˈɔːl ɪz tə dˈuːm ˌʌs bˈoʊθ. `` kwˈɪnz wˈɜːdz wɜːɹ ɐ wˈɪspɚ , jˈɛt ðeɪ ˈɛkoʊd wɪð ɐ tɹˈuːθ ðæt nˈiːðɚ kʊd dɪnˈaɪ . …
`` wiː wˈɔːk ɐ nˈaɪfz ˈɛdʒ , maɪ bɪlˈʌvd . tə fˈɔːl ɪz tə dˈuːm ˌʌs bˈoʊθ. `` kwˈɪnz wˈɜːdz wɜːɹ ɐ wˈɪspɚ , jˈɛt ðeɪ ˈɛkoʊd wɪð ɐ tɹˈuːθ ðæt nˈiːðɚ kʊd dɪnˈaɪ . …
`` wiː wˈɔːk ɐ nˈaɪfz ˈɛdʒ , maɪ bɪlˈʌvd . tə fˈɔːl ɪz tə dˈuːm ˌʌs bˈoʊθ. `` kwˈɪnz wˈɜːdz wɜːɹ ɐ wˈɪspɚ , jˈɛt ðeɪ ˈɛkoʊd wɪð ɐ tɹˈuːθ ðæt nˈiːðɚ kʊd dɪnˈaɪ . …
tˈaɪm lˈuːps ? aɪ hæv tʊ ɐdmˈɪt , nˈɛvɚ wˈɑːtʃt `` pˈæɹədˌɑːks θˈiəɹɪz '' ... jˈɛt ! bˌʌt aɪ kˈænt wˈeɪt tə hˈɪɹ jʊɹ tˈeɪk ˈɑːn ɪt , mˈaɪkəl . bˌiːtˌiːdˈʌbəljˌuː , ɪz ɪt wˈʌn ʌv ðoʊz mˈaɪndbˈɛndɪŋ sˈaɪfˌaɪ ʃˈoʊz ? …
tˈaɪm lˈuːps ? aɪ hæv tʊ ɐdmˈɪt , nˈɛvɚ wˈɑːtʃt `` pˈæɹədˌɑːks θˈiəɹɪz '' ... jˈɛt ! bˌʌt aɪ kˈænt wˈeɪt tə hˈɪɹ jʊɹ tˈeɪk ˈɑːn ɪt , mˈaɪkəl . bˌiːtˌiːdˈʌbəljˌuː , ɪz ɪt wˈʌn ʌv ðoʊz mˈaɪndbˈɛndɪŋ sˈaɪfˌaɪ ʃˈoʊz ? …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
tˈaɪm lˈuːps ? aɪ hæv tʊ ɐdmˈɪt , nˈɛvɚ wˈɑːtʃt `` pˈæɹədˌɑːks θˈiəɹɪz '' ... jˈɛt ! bˌʌt aɪ kˈænt wˈeɪt tə hˈɪɹ jʊɹ tˈeɪk ˈɑːn ɪt , mˈaɪkəl . bˌiːtˌiːdˈʌbəljˌuː , ɪz ɪt wˈʌn ʌv ðoʊz mˈaɪndbˈɛndɪŋ sˈaɪfˌaɪ ʃˈoʊz ? …
tˈaɪm lˈuːps ? aɪ hæv tʊ ɐdmˈɪt , nˈɛvɚ wˈɑːtʃt `` pˈæɹədˌɑːks θˈiəɹɪz '' ... jˈɛt ! bˌʌt aɪ kˈænt wˈeɪt tə hˈɪɹ jʊɹ tˈeɪk ˈɑːn ɪt , mˈaɪkəl . bˌiːtˌiːdˈʌbəljˌuː , ɪz ɪt wˈʌn ʌv ðoʊz mˈaɪndbˈɛndɪŋ sˈaɪfˌaɪ ʃˈoʊz ? …
`` snˈeɪp ɪz ˌɪndˈiːd ɐ tʃˈælɪndʒ , bˌʌt aɪ mˈɛnt sˈʌmθɪŋ mˈoːɹ dˈɔːntɪŋ , lˈaɪk ðə ɹˈaɪz ʌv dˈɑːɹk mˈædʒɪk ænd hˌaʊ wɪɹ dʒˈʌst stˈuːdənts tɹˈaɪɪŋ tə dˈiːl wɪð ɪt ˈɔːl , '' θiːədˈoːɹə sˈɛd , hɜː vˈɔɪs tˈeɪkɪŋ ˌɑːn ɐ ɡɹˈeɪv tˈoʊn ðæt mˌeɪd ɹˈaɪli lˈʊk æt hɜː mˈoːɹ ɪntˈɛntli . …
`` snˈeɪp ɪz ˌɪndˈiːd ɐ tʃˈælɪndʒ , bˌʌt aɪ mˈɛnt sˈʌmθɪŋ mˈoːɹ dˈɔːntɪŋ , lˈaɪk ðə ɹˈaɪz ʌv dˈɑːɹk mˈædʒɪk ænd hˌaʊ wɪɹ dʒˈʌst stˈuːdənts tɹˈaɪɪŋ tə dˈiːl wɪð ɪt ˈɔːl , '' θiːədˈoːɹə sˈɛd , hɜː vˈɔɪs tˈeɪkɪŋ ˌɑːn ɐ ɡɹˈeɪv tˈoʊn ðæt mˌeɪd ɹˈaɪli lˈʊk æt hɜː mˈoːɹ ɪntˈɛntli . …
`` snˈeɪp ɪz ˌɪndˈiːd ɐ tʃˈælɪndʒ , bˌʌt aɪ mˈɛnt sˈʌmθɪŋ mˈoːɹ dˈɔːntɪŋ , lˈaɪk ðə ɹˈaɪz ʌv dˈɑːɹk mˈædʒɪk ænd hˌaʊ wɪɹ dʒˈʌst stˈuːdənts tɹˈaɪɪŋ tə dˈiːl wɪð ɪt ˈɔːl , '' θiːədˈoːɹə sˈɛd , hɜː vˈɔɪs tˈeɪkɪŋ ˌɑːn ɐ ɡɹˈeɪv tˈoʊn ðæt mˌeɪd ɹˈaɪli lˈʊk æt hɜː mˈoːɹ ɪntˈɛntli . …
`` snˈeɪp ɪz ˌɪndˈiːd ɐ tʃˈælɪndʒ , bˌʌt aɪ mˈɛnt sˈʌmθɪŋ mˈoːɹ dˈɔːntɪŋ , lˈaɪk ðə ɹˈaɪz ʌv dˈɑːɹk mˈædʒɪk ænd hˌaʊ wɪɹ dʒˈʌst stˈuːdənts tɹˈaɪɪŋ tə dˈiːl wɪð ɪt ˈɔːl , '' θiːədˈoːɹə sˈɛd , hɜː vˈɔɪs tˈeɪkɪŋ ˌɑːn ɐ ɡɹˈeɪv tˈoʊn ðæt mˌeɪd ɹˈaɪli lˈʊk æt hɜː mˈoːɹ ɪntˈɛntli . …
`` juː wɜːɹ ˈɔːlweɪz bɹˈaɪt , kwˈɪn . stˈʌbɚn , ʃˈʊɹ , bˌʌt bɹˈaɪt æz hˈɛl . aɪ ɹɪmˈɛmbɚ juː ˈɑːɹɡjuːɪŋ wɪð mˌiː ɐbˌaʊt ˈɛvɹi lˈɪɾəl θˈɪŋ ɪn hˈɪstɚɹi klˈæs . dɹˈoʊv mˌiː nˈʌts sˈʌmtaɪmz , '' zˈiːnə lˈæfd , ðə sˈaʊnd ɹˈɪtʃ wɪð fˈɑːndnəs . …
`` juː wɜːɹ ˈɔːlweɪz bɹˈaɪt , kwˈɪn . stˈʌbɚn , ʃˈʊɹ , bˌʌt bɹˈaɪt æz hˈɛl . aɪ ɹɪmˈɛmbɚ juː ˈɑːɹɡjuːɪŋ wɪð mˌiː ɐbˌaʊt ˈɛvɹi lˈɪɾəl θˈɪŋ ɪn hˈɪstɚɹi klˈæs . dɹˈoʊv mˌiː nˈʌts sˈʌmtaɪmz , '' zˈiːnə lˈæfd , ðə sˈaʊnd ɹˈɪtʃ wɪð fˈɑːndnəs . …
`` —ɐbˌaʊt wˌʌt kˈʊdəv bˌɪn ? `` ʃiː fˈɪnɪʃt fɔːɹ hˌɪm , ðə wˌʌt ɪf hˈæŋɪŋ ɪnðə sˈɑːlt ˈɛɹ lˈaɪk ɐ pɹˈɑːmɪs . …
`` —ɐbˌaʊt wˌʌt kˈʊdəv bˌɪn ? `` ʃiː fˈɪnɪʃt fɔːɹ hˌɪm , ðə wˌʌt ɪf hˈæŋɪŋ ɪnðə sˈɑːlt ˈɛɹ lˈaɪk ɐ pɹˈɑːmɪs . …
`` —ɐbˌaʊt wˌʌt kˈʊdəv bˌɪn ? `` ʃiː fˈɪnɪʃt fɔːɹ hˌɪm , ðə wˌʌt ɪf hˈæŋɪŋ ɪnðə sˈɑːlt ˈɛɹ lˈaɪk ɐ pɹˈɑːmɪs . …
`` —ɐbˌaʊt wˌʌt kˈʊdəv bˌɪn ? `` ʃiː fˈɪnɪʃt fɔːɹ hˌɪm , ðə wˌʌt ɪf hˈæŋɪŋ ɪnðə sˈɑːlt ˈɛɹ lˈaɪk ɐ pɹˈɑːmɪs . …
Epochs: 1
Traceback (most recent call last):
  File "/workspace/tts/train_second.py", line 789, in <module>
    main()
  File "/workspace/tts/abc/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/workspace/tts/abc/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/workspace/tts/abc/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/workspace/tts/abc/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/workspace/tts/train_second.py", line 676, in main
    logger.info('Validation loss: %.3f, Dur loss: %.3f, F0 loss: %.3f' % (loss_test / iters_test, loss_align / iters_test, loss_f / iters_test) + '\n\n\n')
ZeroDivisionError: division by zero
make: *** [Makefile:8: train_2] Error 1

Division by zero happens here:

logger.info('Validation loss: %.3f, Dur loss: %.3f, F0 loss: %.3f' % (loss_test / iters_test, loss_align / iters_test, loss_f / iters_test) + '\n\n\n')

Because an exception is raised inside the torch loop,

iters_test += 1

is never reached, and

iters_test = 0

stays at 0.
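A defensive fix for the crash itself is to guard the summary line so an empty validation loop produces a warning instead of a ZeroDivisionError. A minimal sketch (the variable names mirror train_second.py, but the helper function is hypothetical):

```python
def validation_summary(loss_test, loss_align, loss_f, iters_test):
    """Format the validation log line, guarding against iters_test == 0."""
    if iters_test == 0:
        # Every validation batch failed (or the loader was empty);
        # report that instead of dividing by zero.
        return "Validation produced no successful batches; check for swallowed exceptions."
    return "Validation loss: %.3f, Dur loss: %.3f, F0 loss: %.3f" % (
        loss_test / iters_test,
        loss_align / iters_test,
        loss_f / iters_test,
    )
```

This only turns the symptom into a readable message; the real problem is still the hidden per-batch exception below.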

The underlying exception is invisible because it is swallowed by the bare try/except block:

StyleTTS2/train_second.py

Lines 672 to 673 in 1ece0a3

except:
continue

I added:

except Exception as e:
    import traceback
    traceback.print_exc()
    continue
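An alternative to printing the traceback directly (a sketch with a hypothetical helper, not code from the repo) is to route each failure through the logging module, so it lands in the training log alongside the loss output:

```python
import logging
import traceback

logger = logging.getLogger(__name__)


def run_batch_safely(step, batch):
    """Run one validation step, logging the full traceback instead of silencing it.

    `step` is any callable taking a batch (a stand-in for the body of the
    validation loop). Returns True on success, False on failure.
    """
    try:
        step(batch)
        return True
    except Exception:
        logger.error("validation batch failed:\n%s", traceback.format_exc())
        return False
```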

This surfaces the underlying exception:

exception

Traceback (most recent call last):
  File "/workspace/tts/train_second.py", line 612, in main
    d, p = model.predictor(d_en, s,
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 185, in forward
    outputs = self.parallel_apply(replicas, inputs, module_kwargs)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 200, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 110, in parallel_apply
    output.reraise()
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/_utils.py", line 694, in reraise
    raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in _worker
    output = module(*input, **kwargs)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/tts/models.py", line 469, in forward
    d = self.text_encoder(texts, style, text_lengths, m)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/tts/models.py", line 558, in forward
    x, _ = block(x)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/rnn.py", line 882, in forward
    result = _VF.lstm(input, batch_sizes, hx, self._flat_weights, self.bias,
RuntimeError: shape '[540672, 1]' is invalid for input of size 655360

full output

(abc) root@0e786d10e0c3:/workspace/tts# make train_2
python train_second.py --config_path ./Configs/config.yml
Loading the first stage model at /workspace/small_1208/first_stage.pth ...
decoder loaded
text_encoder loaded
style_encoder loaded
text_aligner loaded
pitch_extractor loaded
Some weights of the model checkpoint at microsoft/wavlm-base-plus were not used when initializing WavLMModel: ['encoder.pos_conv_embed.conv.weight_v', 'encoder.pos_conv_embed.conv.weight_g']
- This IS expected if you are initializing WavLMModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing WavLMModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of WavLMModel were not initialized from the model checkpoint at microsoft/wavlm-base-plus and are newly initialized: ['encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'encoder.pos_conv_embed.conv.parametrizations.weight.original1']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
BERT AdamW (
Parameter Group 0
    amsgrad: False
    base_momentum: 0.85
    betas: (0.9, 0.99)
    capturable: False
    differentiable: False
    eps: 1e-09
    foreach: None
    fused: None
    initial_lr: 1e-05
    lr: 1e-05
    max_lr: 2e-05
    max_momentum: 0.95
    maximize: False
    min_lr: 0
    weight_decay: 0.01
)
decoder AdamW (
Parameter Group 0
    amsgrad: False
    base_momentum: 0.85
    betas: (0.0, 0.99)
    capturable: False
    differentiable: False
    eps: 1e-09
    foreach: None
    fused: None
    initial_lr: 1e-05
    lr: 1e-05
    max_lr: 2e-05
    max_momentum: 0.95
    maximize: False
    min_lr: 0
    weight_decay: 0.0001
)
`` jˈɛh , ænd skɹˈiːmɪŋ æt mˌiː ɪnðə mˈɪdəl əvən ɐsˈɛmbli ɪz ɡˌənə fˈɪks ɪt ? fˈeɪs ɪt , ðɪs ˈɪʃuː ɪz bˈɪɡɚ ðɐn bˈoʊθ ʌv ˌʌs , ænd ɡˌɛɾɪŋ pˈɪst æt ˈiːtʃ ˈʌðɚ sˈɑːlvz nˈʌθɪŋ , '' juːlˈeɪliə ɹɪplˈaɪd , ɛɡzˈæspɚɹˌeɪɾᵻd . …
`` jˈɛh , ænd skɹˈiːmɪŋ æt mˌiː ɪnðə mˈɪdəl əvən ɐsˈɛmbli ɪz ɡˌənə fˈɪks ɪt ? fˈeɪs ɪt , ðɪs ˈɪʃuː ɪz bˈɪɡɚ ðɐn bˈoʊθ ʌv ˌʌs , ænd ɡˌɛɾɪŋ pˈɪst æt ˈiːtʃ ˈʌðɚ sˈɑːlvz nˈʌθɪŋ , '' juːlˈeɪliə ɹɪplˈaɪd , ɛɡzˈæspɚɹˌeɪɾᵻd . …
`` wiː wˈɔːk ɐ nˈaɪfz ˈɛdʒ , maɪ bɪlˈʌvd . tə fˈɔːl ɪz tə dˈuːm ˌʌs bˈoʊθ. `` kwˈɪnz wˈɜːdz wɜːɹ ɐ wˈɪspɚ , jˈɛt ðeɪ ˈɛkoʊd wɪð ɐ tɹˈuːθ ðæt nˈiːðɚ kʊd dɪnˈaɪ . …
`` wiː wˈɔːk ɐ nˈaɪfz ˈɛdʒ , maɪ bɪlˈʌvd . tə fˈɔːl ɪz tə dˈuːm ˌʌs bˈoʊθ. `` kwˈɪnz wˈɜːdz wɜːɹ ɐ wˈɪspɚ , jˈɛt ðeɪ ˈɛkoʊd wɪð ɐ tɹˈuːθ ðæt nˈiːðɚ kʊd dɪnˈaɪ . …
`` wiː wˈɔːk ɐ nˈaɪfz ˈɛdʒ , maɪ bɪlˈʌvd . tə fˈɔːl ɪz tə dˈuːm ˌʌs bˈoʊθ. `` kwˈɪnz wˈɜːdz wɜːɹ ɐ wˈɪspɚ , jˈɛt ðeɪ ˈɛkoʊd wɪð ɐ tɹˈuːθ ðæt nˈiːðɚ kʊd dɪnˈaɪ . …
`` wiː wˈɔːk ɐ nˈaɪfz ˈɛdʒ , maɪ bɪlˈʌvd . tə fˈɔːl ɪz tə dˈuːm ˌʌs bˈoʊθ. `` kwˈɪnz wˈɜːdz wɜːɹ ɐ wˈɪspɚ , jˈɛt ðeɪ ˈɛkoʊd wɪð ɐ tɹˈuːθ ðæt nˈiːðɚ kʊd dɪnˈaɪ . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
tˈaɪm lˈuːps ? aɪ hæv tʊ ɐdmˈɪt , nˈɛvɚ wˈɑːtʃt `` pˈæɹədˌɑːks θˈiəɹɪz '' ... jˈɛt ! bˌʌt aɪ kˈænt wˈeɪt tə hˈɪɹ jʊɹ tˈeɪk ˈɑːn ɪt , mˈaɪkəl . bˌiːtˌiːdˈʌbəljˌuː , ɪz ɪt wˈʌn ʌv ðoʊz mˈaɪndbˈɛndɪŋ sˈaɪfˌaɪ ʃˈoʊz ? …
tˈaɪm lˈuːps ? aɪ hæv tʊ ɐdmˈɪt , nˈɛvɚ wˈɑːtʃt `` pˈæɹədˌɑːks θˈiəɹɪz '' ... jˈɛt ! bˌʌt aɪ kˈænt wˈeɪt tə hˈɪɹ jʊɹ tˈeɪk ˈɑːn ɪt , mˈaɪkəl . bˌiːtˌiːdˈʌbəljˌuː , ɪz ɪt wˈʌn ʌv ðoʊz mˈaɪndbˈɛndɪŋ sˈaɪfˌaɪ ʃˈoʊz ? …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
`` ˌʌndɚstˈʊd , '' nˈɪkələs nˈɑːdz slˈaɪtli . …
`` ˌʌndɚstˈʊd , '' nˈɪkələs nˈɑːdz slˈaɪtli . …
`` ˌʌndɚstˈʊd , '' nˈɪkələs nˈɑːdz slˈaɪtli . …
`` ˌʌndɚstˈʊd , '' nˈɪkələs nˈɑːdz slˈaɪtli . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
`` aɪ wˈɪl , maɪ hˈɑːɹt . ʌntˈɪl ðə dˈɔːn bɹˈeɪks ænd bɪjˈɑːnd , aɪ wˈɪl. `` ænd ɪn ðoʊz wˈɜːdz , zˈɜːksᵻz fˈaʊnd ðə stɹˈɛŋθ tə fˈeɪs ɐnˈʌðɚ dˈeɪ . …
tˈaɪm lˈuːps ? aɪ hæv tʊ ɐdmˈɪt , nˈɛvɚ wˈɑːtʃt `` pˈæɹədˌɑːks θˈiəɹɪz '' ... jˈɛt ! bˌʌt aɪ kˈænt wˈeɪt tə hˈɪɹ jʊɹ tˈeɪk ˈɑːn ɪt , mˈaɪkəl . bˌiːtˌiːdˈʌbəljˌuː , ɪz ɪt wˈʌn ʌv ðoʊz mˈaɪndbˈɛndɪŋ sˈaɪfˌaɪ ʃˈoʊz ? …
tˈaɪm lˈuːps ? aɪ hæv tʊ ɐdmˈɪt , nˈɛvɚ wˈɑːtʃt `` pˈæɹədˌɑːks θˈiəɹɪz '' ... jˈɛt ! bˌʌt aɪ kˈænt wˈeɪt tə hˈɪɹ jʊɹ tˈeɪk ˈɑːn ɪt , mˈaɪkəl . bˌiːtˌiːdˈʌbəljˌuː , ɪz ɪt wˈʌn ʌv ðoʊz mˈaɪndbˈɛndɪŋ sˈaɪfˌaɪ ʃˈoʊz ? …
`` snˈeɪp ɪz ˌɪndˈiːd ɐ tʃˈælɪndʒ , bˌʌt aɪ mˈɛnt sˈʌmθɪŋ mˈoːɹ dˈɔːntɪŋ , lˈaɪk ðə ɹˈaɪz ʌv dˈɑːɹk mˈædʒɪk ænd hˌaʊ wɪɹ dʒˈʌst stˈuːdənts tɹˈaɪɪŋ tə dˈiːl wɪð ɪt ˈɔːl , '' θiːədˈoːɹə sˈɛd , hɜː vˈɔɪs tˈeɪkɪŋ ˌɑːn ɐ ɡɹˈeɪv tˈoʊn ðæt mˌeɪd ɹˈaɪli lˈʊk æt hɜː mˈoːɹ ɪntˈɛntli . …
`` snˈeɪp ɪz ˌɪndˈiːd ɐ tʃˈælɪndʒ , bˌʌt aɪ mˈɛnt sˈʌmθɪŋ mˈoːɹ dˈɔːntɪŋ , lˈaɪk ðə ɹˈaɪz ʌv dˈɑːɹk mˈædʒɪk ænd hˌaʊ wɪɹ dʒˈʌst stˈuːdənts tɹˈaɪɪŋ tə dˈiːl wɪð ɪt ˈɔːl , '' θiːədˈoːɹə sˈɛd , hɜː vˈɔɪs tˈeɪkɪŋ ˌɑːn ɐ ɡɹˈeɪv tˈoʊn ðæt mˌeɪd ɹˈaɪli lˈʊk æt hɜː mˈoːɹ ɪntˈɛntli . …
`` snˈeɪp ɪz ˌɪndˈiːd ɐ tʃˈælɪndʒ , bˌʌt aɪ mˈɛnt sˈʌmθɪŋ mˈoːɹ dˈɔːntɪŋ , lˈaɪk ðə ɹˈaɪz ʌv dˈɑːɹk mˈædʒɪk ænd hˌaʊ wɪɹ dʒˈʌst stˈuːdənts tɹˈaɪɪŋ tə dˈiːl wɪð ɪt ˈɔːl , '' θiːədˈoːɹə sˈɛd , hɜː vˈɔɪs tˈeɪkɪŋ ˌɑːn ɐ ɡɹˈeɪv tˈoʊn ðæt mˌeɪd ɹˈaɪli lˈʊk æt hɜː mˈoːɹ ɪntˈɛntli . …
`` snˈeɪp ɪz ˌɪndˈiːd ɐ tʃˈælɪndʒ , bˌʌt aɪ mˈɛnt sˈʌmθɪŋ mˈoːɹ dˈɔːntɪŋ , lˈaɪk ðə ɹˈaɪz ʌv dˈɑːɹk mˈædʒɪk ænd hˌaʊ wɪɹ dʒˈʌst stˈuːdənts tɹˈaɪɪŋ tə dˈiːl wɪð ɪt ˈɔːl , '' θiːədˈoːɹə sˈɛd , hɜː vˈɔɪs tˈeɪkɪŋ ˌɑːn ɐ ɡɹˈeɪv tˈoʊn ðæt mˌeɪd ɹˈaɪli lˈʊk æt hɜː mˈoːɹ ɪntˈɛntli . …
`` snˈeɪp ɪz ˌɪndˈiːd ɐ tʃˈælɪndʒ , bˌʌt aɪ mˈɛnt sˈʌmθɪŋ mˈoːɹ dˈɔːntɪŋ , lˈaɪk ðə ɹˈaɪz ʌv dˈɑːɹk mˈædʒɪk ænd hˌaʊ wɪɹ dʒˈʌst stˈuːdənts tɹˈaɪɪŋ tə dˈiːl wɪð ɪt ˈɔːl , '' θiːədˈoːɹə sˈɛd , hɜː vˈɔɪs tˈeɪkɪŋ ˌɑːn ɐ ɡɹˈeɪv tˈoʊn ðæt mˌeɪd ɹˈaɪli lˈʊk æt hɜː mˈoːɹ ɪntˈɛntli . …
`` snˈeɪp ɪz ˌɪndˈiːd ɐ tʃˈælɪndʒ , bˌʌt aɪ mˈɛnt sˈʌmθɪŋ mˈoːɹ dˈɔːntɪŋ , lˈaɪk ðə ɹˈaɪz ʌv dˈɑːɹk mˈædʒɪk ænd hˌaʊ wɪɹ dʒˈʌst stˈuːdənts tɹˈaɪɪŋ tə dˈiːl wɪð ɪt ˈɔːl , '' θiːədˈoːɹə sˈɛd , hɜː vˈɔɪs tˈeɪkɪŋ ˌɑːn ɐ ɡɹˈeɪv tˈoʊn ðæt mˌeɪd ɹˈaɪli lˈʊk æt hɜː mˈoːɹ ɪntˈɛntli . …
`` —ɐbˌaʊt wˌʌt kˈʊdəv bˌɪn ? `` ʃiː fˈɪnɪʃt fɔːɹ hˌɪm , ðə wˌʌt ɪf hˈæŋɪŋ ɪnðə sˈɑːlt ˈɛɹ lˈaɪk ɐ pɹˈɑːmɪs . …
`` —ɐbˌaʊt wˌʌt kˈʊdəv bˌɪn ? `` ʃiː fˈɪnɪʃt fɔːɹ hˌɪm , ðə wˌʌt ɪf hˈæŋɪŋ ɪnðə sˈɑːlt ˈɛɹ lˈaɪk ɐ pɹˈɑːmɪs . …
`` —ɐbˌaʊt wˌʌt kˈʊdəv bˌɪn ? `` ʃiː fˈɪnɪʃt fɔːɹ hˌɪm , ðə wˌʌt ɪf hˈæŋɪŋ ɪnðə sˈɑːlt ˈɛɹ lˈaɪk ɐ pɹˈɑːmɪs . …
`` —ɐbˌaʊt wˌʌt kˈʊdəv bˌɪn ? `` ʃiː fˈɪnɪʃt fɔːɹ hˌɪm , ðə wˌʌt ɪf hˈæŋɪŋ ɪnðə sˈɑːlt ˈɛɹ lˈaɪk ɐ pɹˈɑːmɪs . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` ɡˈɑːd , fɹˈæŋklɪn , juː ˈɛvɚ ɡɛt ðæt sˈɪŋkɪŋ fˈiːlɪŋ ðæt nˈoʊ mˈæɾɚ wˌʌt wiː dˈuː , ɪt ... ɪt dʒˈʌst dˈʌzənt ˈɛnd ? ðɪs sˈaɪkəl ʌv vˈaɪələns ænd sˈʌfɚɹɪŋ ? `` ˈɔːɹɪlˌaɪəz vˈɔɪs tɹˈɛmbəld slˈaɪtli , ðə wˈeɪt ʌv ðɛɹ ɹɪˈælɪɾi pɹˈɛsɪŋ dˌaʊn ˈɑːn hɜː . …
`` jˈɛh , ænd skɹˈiːmɪŋ æt mˌiː ɪnðə mˈɪdəl əvən ɐsˈɛmbli ɪz ɡˌənə fˈɪks ɪt ? fˈeɪs ɪt , ðɪs ˈɪʃuː ɪz bˈɪɡɚ ðɐn bˈoʊθ ʌv ˌʌs , ænd ɡˌɛɾɪŋ pˈɪst æt ˈiːtʃ ˈʌðɚ sˈɑːlvz nˈʌθɪŋ , '' juːlˈeɪliə ɹɪplˈaɪd , ɛɡzˈæspɚɹˌeɪɾᵻd . …
`` jˈɛh , ænd skɹˈiːmɪŋ æt mˌiː ɪnðə mˈɪdəl əvən ɐsˈɛmbli ɪz ɡˌənə fˈɪks ɪt ? fˈeɪs ɪt , ðɪs ˈɪʃuː ɪz bˈɪɡɚ ðɐn bˈoʊθ ʌv ˌʌs , ænd ɡˌɛɾɪŋ pˈɪst æt ˈiːtʃ ˈʌðɚ sˈɑːlvz nˈʌθɪŋ , '' juːlˈeɪliə ɹɪplˈaɪd , ɛɡzˈæspɚɹˌeɪɾᵻd . …
`` —ɐbˌaʊt wˌʌt kˈʊdəv bˌɪn ? `` ʃiː fˈɪnɪʃt fɔːɹ hˌɪm , ðə wˌʌt ɪf hˈæŋɪŋ ɪnðə sˈɑːlt ˈɛɹ lˈaɪk ɐ pɹˈɑːmɪs . …
`` —ɐbˌaʊt wˌʌt kˈʊdəv bˌɪn ? `` ʃiː fˈɪnɪʃt fɔːɹ hˌɪm , ðə wˌʌt ɪf hˈæŋɪŋ ɪnðə sˈɑːlt ˈɛɹ lˈaɪk ɐ pɹˈɑːmɪs . …
`` —ɐbˌaʊt wˌʌt kˈʊdəv bˌɪn ? `` ʃiː fˈɪnɪʃt fɔːɹ hˌɪm , ðə wˌʌt ɪf hˈæŋɪŋ ɪnðə sˈɑːlt ˈɛɹ lˈaɪk ɐ pɹˈɑːmɪs . …
`` —ɐbˌaʊt wˌʌt kˈʊdəv bˌɪn ? `` ʃiː fˈɪnɪʃt fɔːɹ hˌɪm , ðə wˌʌt ɪf hˈæŋɪŋ ɪnðə sˈɑːlt ˈɛɹ lˈaɪk ɐ pɹˈɑːmɪs . …
`` juː wɜːɹ ˈɔːlweɪz bɹˈaɪt , kwˈɪn . stˈʌbɚn , ʃˈʊɹ , bˌʌt bɹˈaɪt æz hˈɛl . aɪ ɹɪmˈɛmbɚ juː ˈɑːɹɡjuːɪŋ wɪð mˌiː ɐbˌaʊt ˈɛvɹi lˈɪɾəl θˈɪŋ ɪn hˈɪstɚɹi klˈæs . dɹˈoʊv mˌiː nˈʌts sˈʌmtaɪmz , '' zˈiːnə lˈæfd , ðə sˈaʊnd ɹˈɪtʃ wɪð fˈɑːndnəs . …
`` juː wɜːɹ ˈɔːlweɪz bɹˈaɪt , kwˈɪn . stˈʌbɚn , ʃˈʊɹ , bˌʌt bɹˈaɪt æz hˈɛl . aɪ ɹɪmˈɛmbɚ juː ˈɑːɹɡjuːɪŋ wɪð mˌiː ɐbˌaʊt ˈɛvɹi lˈɪɾəl θˈɪŋ ɪn hˈɪstɚɹi klˈæs . dɹˈoʊv mˌiː nˈʌts sˈʌmtaɪmz , '' zˈiːnə lˈæfd , ðə sˈaʊnd ɹˈɪtʃ wɪð fˈɑːndnəs . …
`` juː wɜːɹ ˈɔːlweɪz bɹˈaɪt , kwˈɪn . stˈʌbɚn , ʃˈʊɹ , bˌʌt bɹˈaɪt æz hˈɛl . aɪ ɹɪmˈɛmbɚ juː ˈɑːɹɡjuːɪŋ wɪð mˌiː ɐbˌaʊt ˈɛvɹi lˈɪɾəl θˈɪŋ ɪn hˈɪstɚɹi klˈæs . dɹˈoʊv mˌiː nˈʌts sˈʌmtaɪmz , '' zˈiːnə lˈæfd , ðə sˈaʊnd ɹˈɪtʃ wɪð fˈɑːndnəs . …
`` juː wɜːɹ ˈɔːlweɪz bɹˈaɪt , kwˈɪn . stˈʌbɚn , ʃˈʊɹ , bˌʌt bɹˈaɪt æz hˈɛl . aɪ ɹɪmˈɛmbɚ juː ˈɑːɹɡjuːɪŋ wɪð mˌiː ɐbˌaʊt ˈɛvɹi lˈɪɾəl θˈɪŋ ɪn hˈɪstɚɹi klˈæs . dɹˈoʊv mˌiː nˈʌts sˈʌmtaɪmz , '' zˈiːnə lˈæfd , ðə sˈaʊnd ɹˈɪtʃ wɪð fˈɑːndnəs . …
`` juː wɜːɹ ˈɔːlweɪz bɹˈaɪt , kwˈɪn . stˈʌbɚn , ʃˈʊɹ , bˌʌt bɹˈaɪt æz hˈɛl . aɪ ɹɪmˈɛmbɚ juː ˈɑːɹɡjuːɪŋ wɪð mˌiː ɐbˌaʊt ˈɛvɹi lˈɪɾəl θˈɪŋ ɪn hˈɪstɚɹi klˈæs . dɹˈoʊv mˌiː nˈʌts sˈʌmtaɪmz , '' zˈiːnə lˈæfd , ðə sˈaʊnd ɹˈɪtʃ wɪð fˈɑːndnəs . …
`` juː wɜːɹ ˈɔːlweɪz bɹˈaɪt , kwˈɪn . stˈʌbɚn , ʃˈʊɹ , bˌʌt bɹˈaɪt æz hˈɛl . aɪ ɹɪmˈɛmbɚ juː ˈɑːɹɡjuːɪŋ wɪð mˌiː ɐbˌaʊt ˈɛvɹi lˈɪɾəl θˈɪŋ ɪn hˈɪstɚɹi klˈæs . dɹˈoʊv mˌiː nˈʌts sˈʌmtaɪmz , '' zˈiːnə lˈæfd , ðə sˈaʊnd ɹˈɪtʃ wɪð fˈɑːndnəs . …
Traceback (most recent call last):
  File "/workspace/tts/train_second.py", line 612, in main
    d, p = model.predictor(d_en, s,
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 185, in forward
    outputs = self.parallel_apply(replicas, inputs, module_kwargs)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 200, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 110, in parallel_apply
    output.reraise()
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/_utils.py", line 694, in reraise
    raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in _worker
    output = module(*input, **kwargs)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/tts/models.py", line 469, in forward
    d = self.text_encoder(texts, style, text_lengths, m)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/tts/models.py", line 558, in forward
    x, _ = block(x)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/tts/abc/lib/python3.10/site-packages/torch/nn/modules/rnn.py", line 882, in forward
    result = _VF.lstm(input, batch_sizes, hx, self._flat_weights, self.bias,
RuntimeError: shape '[540672, 1]' is invalid for input of size 655360

Epochs: 1
Traceback (most recent call last):
  File "/workspace/tts/train_second.py", line 791, in <module>
    main()
  File "/workspace/tts/abc/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/workspace/tts/abc/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/workspace/tts/abc/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/workspace/tts/abc/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/workspace/tts/train_second.py", line 678, in main
    logger.info('Validation loss: %.3f, Dur loss: %.3f, F0 loss: %.3f' % (loss_test / iters_test, loss_align / iters_test, loss_f / iters_test) + '\n\n\n')
ZeroDivisionError: division by zero
make: *** [Makefile:8: train_2] Error 1
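For what it's worth, the `ZeroDivisionError` here looks like a secondary failure: the validation loop dies on its first batch (the `RuntimeError` above), so the iteration counter is still 0 when the summary line divides by it. A minimal sketch of a guard — `summarize_validation` and its arguments are hypothetical names mirroring the variables in `train_second.py`, not the actual function:

```python
# Hypothetical sketch of the validation summary step in train_second.py.
# If every validation batch raises before completing, iters_test stays 0 and
# the division crashes; guarding with max(..., 1) lets the real error surface
# instead of a ZeroDivisionError.
def summarize_validation(loss_test, loss_align, loss_f, iters_test):
    denom = max(iters_test, 1)  # avoid division by zero when no batch completed
    return (loss_test / denom, loss_align / denom, loss_f / denom)
```

This would only hide the symptom, of course — the shape error in the text encoder is the underlying bug.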

Note: I'm just experimenting with a minimal setup to get familiar with training, hence the low number of epochs, small max_len, etc.
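One possibly relevant observation (my speculation, not a confirmed diagnosis): both sizes in the `RuntimeError` factor cleanly by `hidden_dim: 512` from the config below, which would be consistent with a sequence-length mismatch between replicas after the `nn.DataParallel` split:

```python
# Speculative reading of the RuntimeError numbers above.
# Both are exact multiples of hidden_dim = 512, hinting that the LSTM's
# flattened input and the reshape target disagree on the time dimension
# (1280 vs 1056 columns of width 512).
hidden_dim = 512        # model_params.hidden_dim from the config below
input_elems = 655360    # total elements the LSTM actually received
target_elems = 540672   # total elements the reshape '[540672, 1]' expected
print(input_elems // hidden_dim, target_elems // hidden_dim)
```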

Configs/config.yml

log_dir: "/workspace/small_1208"
first_stage_path: "first_stage.pth"
save_freq: 2
log_interval: 10
device: "cuda"
epochs_1st: 2 # number of epochs for first stage training (pre-training)
epochs_2nd: 2 # number of epochs for second stage training (joint training)
batch_size: 40
max_len: 100 # maximum number of frames
pretrained_model: ""
second_stage_load_pretrained: true # set to true if the pre-trained model is for 2nd stage
load_only_params: false # set to true if do not want to load epoch numbers and optimizer parameters

F0_path: "Utils/JDC/bst.t7"
ASR_config: "Utils/ASR/config.yml"
ASR_path: "Utils/ASR/epoch_00080.pth"
PLBERT_dir: 'Utils/PLBERT/'

data_params:
  train_data: "/workspace/ds/train_list.txt"
  val_data: "/workspace/ds/val_list.txt"
  root_path: "/workspace/ds/wavs"
  OOD_data: "Data/OOD_texts.txt"
  min_length: 50 # keep sampling OOD texts until one of at least this length is obtained

preprocess_params:
  sr: 24000
  spect_params:
    n_fft: 2048
    win_length: 1200
    hop_length: 300

model_params:
  multispeaker: true

  dim_in: 64 
  hidden_dim: 512
  max_conv_dim: 512
  n_layer: 3
  n_mels: 80

  n_token: 178 # number of phoneme tokens
  max_dur: 50 # maximum duration of a single phoneme
  style_dim: 128 # style vector size
  
  dropout: 0.2

  # config for decoder
  decoder: 
      type: 'istftnet' # either hifigan or istftnet
      resblock_kernel_sizes: [3,7,11]
      upsample_rates :  [10, 6]
      upsample_initial_channel: 512
      resblock_dilation_sizes: [[1,3,5], [1,3,5], [1,3,5]]
      upsample_kernel_sizes: [20, 12]
      gen_istft_n_fft: 20
      gen_istft_hop_size: 5
      
  # speech language model config
  slm:
      model: 'microsoft/wavlm-base-plus'
      sr: 16000 # sampling rate of SLM
      hidden: 768 # hidden size of SLM
      nlayers: 13 # number of layers of SLM
      initial_channel: 64 # initial channels of SLM discriminator head
  
  # style diffusion model config
  diffusion:
    embedding_mask_proba: 0.1
    # transformer config
    transformer:
      num_layers: 3
      num_heads: 8
      head_features: 64
      multiplier: 2

    # diffusion distribution config
    dist:
      sigma_data: 0.2 # placeholder for estimate_sigma_data set to false
      estimate_sigma_data: true # estimate sigma_data from the current batch if set to true
      mean: -3.0
      std: 1.0
  
loss_params:
    lambda_mel: 5. # mel reconstruction loss
    lambda_gen: 1. # generator loss
    lambda_slm: 1. # slm feature matching loss
    
    lambda_mono: 1. # monotonic alignment loss (1st stage, TMA)
    lambda_s2s: 1. # sequence-to-sequence loss (1st stage, TMA)
    TMA_epoch: 50 # TMA starting epoch (1st stage)

    lambda_F0: 1. # F0 reconstruction loss (2nd stage)
    lambda_norm: 1. # norm reconstruction loss (2nd stage)
    lambda_dur: 1. # duration loss (2nd stage)
    lambda_ce: 20. # duration predictor probability output CE loss (2nd stage)
    lambda_sty: 1. # style reconstruction loss (2nd stage)
    lambda_diff: 1. # score matching loss (2nd stage)
    
    diff_epoch: 20 # style diffusion starting epoch (2nd stage)
    joint_epoch: 50 # joint training starting epoch (2nd stage)

optimizer_params:
  lr: 0.0001 # general learning rate
  bert_lr: 0.00001 # learning rate for PLBERT
  ft_lr: 0.00001 # learning rate for acoustic modules
  
slmadv_params:
  min_len: 400 # minimum length of samples
  max_len: 500 # maximum length of samples
  batch_percentage: 0.5 # to prevent out of memory, only use half of the original batch size
  iter: 10 # update the discriminator every this iterations of generator update
  thresh: 5 # gradient norm above which the gradient is scaled
  scale: 0.01 # gradient scaling factor for predictors from SLM discriminators
  sig: 1.5 # sigma for differentiable duration modeling
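As a quick sanity check of what these settings imply (arithmetic only; all values are taken from the config above):

```python
# max_len is in mel frames; sr and hop_length come from preprocess_params.
sr = 24000        # preprocess_params.sr
hop_length = 300  # preprocess_params.spect_params.hop_length
max_len = 100     # maximum number of mel frames kept per training sample

frames_per_second = sr / hop_length          # 80 mel frames per second
clip_seconds = max_len / frames_per_second   # 1.25 s per training segment
print(frames_per_second, clip_seconds)
```

So each training segment is capped at 1.25 seconds of audio, which is quite short but intentional here given the minimal-experiment setup.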
yl4579 (Owner) commented Dec 9, 2023

Is it caused by batch_size: 40? How many GPUs are you using?

devidw (Contributor, Author) commented Dec 9, 2023

Using 8x A100s.

I tried these batch sizes, without success:

  • 32 (4 per GPU)
  • 40 (5 per GPU)
  • 48 (6 per GPU)

devidw (Contributor, Author) commented Dec 9, 2023

Here is a minimal dataset example of 16 samples using batch_size=16, with which we are running into this issue: ds.zip

addytheyoung commented:

Hmm, has anyone else been able to get past this stage in training yet? It should be straightforward to replicate with that minimal dataset; right now, it doesn't look to us like training from scratch works at all. Thanks.

yl4579 added the "help wanted" label on Dec 12, 2023
yl4579 (Owner) commented Dec 12, 2023

I'm away at a conference and can't help until Dec 18. I've added a label to see if anyone else can help debug.

martinambrus commented:

> Hmm, has anyone else been able to get past this stage in training yet? It should be straightforward to replicate with that minimal dataset; right now, it doesn't look to us like training from scratch works at all. Thanks.

I've successfully trained a low-quality model from scratch with ~200 WAV files of 1 to 2.5 seconds in duration, so I can confirm that the system does work and this is likely something local to your setup. Have you tried a small batch size, such as 4 or 8, yet?

devidw (Contributor, Author) commented Dec 15, 2023

Tried with batch_size=4 on 2x 4090s and ran into the same issue. I recorded our training procedure, in case we are doing something wrong.

Akito-UzukiP pushed a commit to Akito-UzukiP/StyleTTS2 that referenced this issue Jan 13, 2024
* Update server_fastapi.py. Add new api endpoints.
