
[!] train_step() retuned None outputs. Skipping training step. #3637

Open
daufilataf opened this issue Mar 18, 2024 · 1 comment
Labels
bug (Something isn't working) · wontfix (This will not be worked on but feel free to help)

Comments

@daufilataf

Describe the bug

When I start training with modified data, I get this error:

--> TIME: 2024-03-18 13:31:18 -- STEP: 0/496 -- GLOBAL_STEP: 0
| > current_lr: 2.5e-07
| > step_time: 0.9393 (0.9392588138580322)
| > loader_time: 0.4176 (0.4175543785095215)

[!] train_step() retuned None outputs. Skipping training step.
[!] train_step() retuned None outputs. Skipping training step.
[!] train_step() retuned None outputs. Skipping training step.
[!] train_step() retuned None outputs. Skipping training step.
[!] train_step() retuned None outputs. Skipping training step.
[!] train_step() retuned None outputs. Skipping training step.
[!] train_step() retuned None outputs. Skipping training step.
[!] train_step() retuned None outputs. Skipping training step.
[!] train_step() retuned None outputs. Skipping training step.
[!] train_step() retuned None outputs. Skipping training step.

Then it continues normally:

--> TIME: 2024-03-18 13:31:34 -- STEP: 25/496 -- GLOBAL_STEP: 25
| > loss: 3.6332433223724365 (3.534811576207479)
| > log_mle: 0.632981538772583 (0.6383022785186767)
| > loss_dur: 3.0002617835998535 (2.896509297688802)
| > amp_scaler: 16384.0 (16384.0)
| > grad_norm: tensor(10.7261, device='cuda:0') (tensor(9.8709, device='cuda:0'))
| > current_lr: 2.5e-07
| > step_time: 0.2053 (0.21109932899475098)
| > loader_time: 0.4624 (1.924059352874756)

--> TIME: 2024-03-18 13:31:52 -- STEP: 50/496 -- GLOBAL_STEP: 50
| > loss: 3.623731851577759 (3.5723715841770174)
| > log_mle: 0.6547501683235168 (0.6443059176206588)
| > loss_dur: 2.9689817428588867 (2.928065669536591)
| > amp_scaler: 16384.0 (16384.0)
| > grad_norm: tensor(10.7175, device='cuda:0') (tensor(10.3839, device='cuda:0'))
| > current_lr: 2.5e-07
| > step_time: 0.1739 (0.20701711177825927)
| > loader_time: 0.5336 (1.2170554876327515)

........
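
For context: the `log_mle`/`loss_dur` losses suggest this run trains GlowTTS, and the ten skipped steps right after STEP 0 match GlowTTS's data-dependent initialization phase, during which `train_step()` intentionally returns `None` while the activation-normalization layers are initialized. Below is a minimal sketch of the trainer-side behavior, assuming a loop shaped like coqui-ai/Trainer's; the names are illustrative, not the exact source (the message text, including the "retuned" typo, is the trainer's own):

```python
import logging

logger = logging.getLogger(__name__)

# Illustrative sketch (not the verbatim coqui-ai/Trainer source): a None
# return from the model's train_step() is treated as "nothing to optimize"
# rather than as an error, so training resumes once outputs are produced.
def train_epoch(model, train_loader, criterion, optimizer):
    for batch in train_loader:
        outputs, loss_dict = model.train_step(batch, criterion)
        if outputs is None:
            # GlowTTS returns None while it runs data-dependent init of its
            # activation-norm layers, so these steps are skipped on purpose.
            logger.info("[!] train_step() retuned None outputs. Skipping training step.")
            continue
        optimizer.zero_grad()
        loss_dict["loss"].backward()
        optimizer.step()
```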

To Reproduce

EPOCH: 0/100
--> TTS/recipes/ljspeech/our_new_tts/run-March-18-2024_01+31PM-dbf1a08a

DataLoader initialization
| > Tokenizer:
| > add_blank: False
| > use_eos_bos: False
| > use_phonemes: False
| > Number of instances : 15852
| > Preprocessing samples
| > Max text length: 169
| > Min text length: 21
| > Avg text length: 76.67335352006056
|
| > Max audio length: 447821
| > Min audio length: 28224
| > Avg audio length: 124906.38638657583
| > Num. instances discarded samples: 0
| > Batch group size: 0.

TRAINING (2024-03-18 13:31:17)
i̇nanırıq ki, belə də olacaqdır
[!] Character '̇' not found in the vocabulary. Discarding it.

--> TIME: 2024-03-18 13:31:18 -- STEP: 0/496 -- GLOBAL_STEP: 0
| > current_lr: 2.5e-07
| > step_time: 0.9393 (0.9392588138580322)
| > loader_time: 0.4176 (0.4175543785095215)

[!] train_step() retuned None outputs. Skipping training step.
[!] train_step() retuned None outputs. Skipping training step.
[!] train_step() retuned None outputs. Skipping training step.
[!] train_step() retuned None outputs. Skipping training step.
[!] train_step() retuned None outputs. Skipping training step.
[!] train_step() retuned None outputs. Skipping training step.
[!] train_step() retuned None outputs. Skipping training step.
[!] train_step() retuned None outputs. Skipping training step.
[!] train_step() retuned None outputs. Skipping training step.
[!] train_step() retuned None outputs. Skipping training step.

--> TIME: 2024-03-18 13:31:34 -- STEP: 25/496 -- GLOBAL_STEP: 25
| > loss: 3.6332433223724365 (3.534811576207479)
| > log_mle: 0.632981538772583 (0.6383022785186767)
| > loss_dur: 3.0002617835998535 (2.896509297688802)
| > amp_scaler: 16384.0 (16384.0)
| > grad_norm: tensor(10.7261, device='cuda:0') (tensor(9.8709, device='cuda:0'))
| > current_lr: 2.5e-07
| > step_time: 0.2053 (0.21109932899475098)
| > loader_time: 0.4624 (1.924059352874756)

--> TIME: 2024-03-18 13:31:52 -- STEP: 50/496 -- GLOBAL_STEP: 50
| > loss: 3.623731851577759 (3.5723715841770174)
| > log_mle: 0.6547501683235168 (0.6443059176206588)
| > loss_dur: 2.9689817428588867 (2.928065669536591)
| > amp_scaler: 16384.0 (16384.0)
| > grad_norm: tensor(10.7175, device='cuda:0') (tensor(10.3839, device='cuda:0'))
| > current_lr: 2.5e-07
| > step_time: 0.1739 (0.20701711177825927)
| > loader_time: 0.5336 (1.2170554876327515)
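
A side note on the vocabulary warning above: in `i̇nanırıq ki, belə də olacaqdır` (Azerbaijani, roughly "we believe it will be so") the leading `i̇` is an ASCII `i` followed by U+0307 COMBINING DOT ABOVE, which is typically produced by lowercasing the dotted capital `İ`; the tokenizer discards any character missing from the configured set. A hedged sketch of two ways to handle it, assuming coqui-ai/TTS's `CharactersConfig` API (the alphabet string below is a placeholder, not a vetted character set):

```python
from TTS.tts.configs.shared_configs import CharactersConfig

# Option 1 (illustrative): include U+0307 in the configured character set so
# the tokenizer keeps it instead of discarding it.
characters = CharactersConfig(
    characters="abcdefghijklmnopqrstuvwxyzçəğıöşü\u0307",
    punctuations="!'(),-.:;? ",
)

# Option 2: normalize the data instead. "i" + U+0307 renders like a plain
# "i", so stripping the redundant mark from lowercase text is usually safe.
text = "i̇nanırıq ki, belə də olacaqdır".replace("i\u0307", "i")
```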

Expected behavior

The expected behavior is that training should not log:
[!] train_step() retuned None outputs. Skipping training step.
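
If the skipped steps are indeed GlowTTS's data-dependent initialization, this output is expected rather than a bug: the first `data_dep_init_steps` batches are spent on initialization only. A hedged example of where this is configured, assuming the standard `GlowTTSConfig` (setting the value to 0 should disable the phase, at the cost of skipping the activation-norm initialization):

```python
from TTS.tts.configs.glow_tts_config import GlowTTSConfig

# data_dep_init_steps controls how many initial batches are spent on
# data-dependent initialization; train_step() returns None during them.
config = GlowTTSConfig(
    data_dep_init_steps=10,  # default; the run above skips exactly 10 steps
)
```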

Logs

No response

Environment

{
    "CUDA": {
        "GPU": [
            "NVIDIA RTX A5000"
        ],
        "available": true,
        "version": "12.1"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.2.1+cu121",
        "numpy": "1.24.3"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "x86_64",
        "python": "3.11.5",
        "version": "#110~20.04.1-Ubuntu SMP Tue Feb 13 14:25:03 UTC 2024"
    }
}

Additional context

No response

daufilataf added the bug label on Mar 18, 2024

stale bot commented Apr 22, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.

The stale bot added the wontfix label on Apr 22, 2024