Can't fine-tune a model on my dataset in Google Colab #116

yukiarimo · 2024-04-28T18:42:53Z

🐛 Describe the bug

I ran the following code:

git clone https://github.com/myshell-ai/MeloTTS.git
cd MeloTTS
pip install -e .
python -m unidic download
cd melo
python preprocess_text.py --metadata all.list
bash train.sh config.json 1

My all.list example:

wavs/29.wav|EN-default|EN|Well, she looks exactly like the one I read about in the book, except she isn't violent at all. Hahaha.
wavs/15.wav|EN-default|EN|It's kind of a rare monster, it's incredibly ferocious!

Log after running the train.sh:

...
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
2024-04-28 17:58:31.390776: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-04-28 17:58:31.390837: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-04-28 17:58:31.392462: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-04-28 17:58:32.734241: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-04-28 17:58:33.501 | INFO     | data_utils:_filter:64 - Init dataset...
100%|█████████████████| 96/96 [00:00<00:00, 21500.06it/s]
2024-04-28 17:58:33.507 | INFO     | data_utils:_filter:84 - min: 1870; max: 1871
2024-04-28 17:58:33.507 | INFO     | data_utils:_filter:85 - skipped: 9, total: 96
Bucket warning  
buckets: []
/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py:554: UserWarning: This DataLoader will create 16 worker processes in total. Our suggested max number of worker in current system is 8, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
2024-04-28 17:58:33.508 | INFO     | data_utils:_filter:64 - Init dataset...
100%|███████████████████| 4/4 [00:00<00:00, 20763.88it/s]
2024-04-28 17:58:33.509 | INFO     | data_utils:_filter:84 - min: 1870; max: 1871
2024-04-28 17:58:33.509 | INFO     | data_utils:_filter:85 - skipped: 0, total: 4
Using noise scaled MAS for VITS2
Using duration discriminator for VITS2
(torch.Size([10, 192]), torch.Size([8, 192]))
(torch.Size([256, 256]), torch.Size([1, 256]))
list index out of range
0it [00:00, ?it/s]/usr/lib/python3.10/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
  self.pid = os.fork()
0it [00:00, ?it/s]
/usr/local/lib/python3.10/dist-packages/torch/optim/lr_scheduler.py:138: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
0it [00:00, ?it/s]
...

How to fix this? Am I doing something wrong?

Versions

Collecting environment information...

Model name: Intel(R) Xeon(R) CPU @ 2.00GHz
CPU family: 6
Model: 85
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
Stepping: 3
BogoMIPS: 4000.35
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat md_clear arch_capabilities
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 128 KiB (4 instances)
L1i cache: 128 KiB (4 instances)
L2 cache: 4 MiB (4 instances)
L3 cache: 38.5 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-7
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Mitigation; PTE Inversion
Vulnerability Mds: Vulnerable; SMT Host state unknown
Vulnerability Meltdown: Vulnerable
Vulnerability Mmio stale data: Vulnerable
Vulnerability Retbleed: Vulnerable
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1: Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
Vulnerability Spectre v2: Vulnerable, IBPB: disabled, STIBP: disabled, PBRSB-eIBRS: Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Vulnerable

Versions of relevant libraries:
[pip3] numpy==1.25.2
[pip3] torch==1.13.1
[pip3] torchaudio==0.13.1
[pip3] torchdata==0.7.1
[pip3] torchsummary==1.5.1
[pip3] torchtext==0.17.1
[pip3] torchvision==0.17.1+cu121
[pip3] triton==2.2.0
[conda] Could not collect

The text was updated successfully, but these errors were encountered:

28065467 · 2024-05-19T17:12:00Z

Same problem but in Ubuntu

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Can't fine-tune a model on my dataset in Google Colab #116

Can't fine-tune a model on my dataset in Google Colab #116

yukiarimo commented Apr 28, 2024

28065467 commented May 19, 2024

Can't fine-tune a model on my dataset in Google Colab #116

Can't fine-tune a model on my dataset in Google Colab #116

Comments

yukiarimo commented Apr 28, 2024

🐛 Describe the bug

Versions

28065467 commented May 19, 2024