[Bug] VITS gpu utilization #3710

maryawwm · 2024-04-28T05:55:29Z

Describe the bug

I'm training a VITS model on Persian and English. My dataset consists of audio clips between 1 and 25 s long. I'm training on an A100 GPU, but most of the time less than half of the GPU memory is in use, and GPU utilization is lower than I expected.
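
With clips ranging from 1 to 25 s, the size of each batch depends heavily on its longest clip. The sketch below is a minimal illustration, assuming each clip is zero-padded to the longest clip in its batch. It estimates how much of a padded batch is real audio. `mel_frames` and `padding_efficiency` are hypothetical helpers, not part of Coqui TTS. Their defaults are the `sample_rate=16000` and `hop_length=256` values from the config in this report.

```python
# Hypothetical helpers for illustration; not part of Coqui TTS.
# Defaults use sample_rate=16000 and hop_length=256 from the config in this report.


def mel_frames(duration_s: float, sample_rate: int = 16000, hop_length: int = 256) -> int:
    """Approximate spectrogram frame count for a clip (ignores STFT edge padding)."""
    return int(duration_s * sample_rate) // hop_length


def padding_efficiency(durations_s: list, sample_rate: int = 16000, hop_length: int = 256) -> float:
    """Fraction of a padded batch occupied by real frames, assuming every clip
    is zero-padded to the longest clip in its batch."""
    frames = [mel_frames(d, sample_rate, hop_length) for d in durations_s]
    return sum(frames) / (len(frames) * max(frames))


print(mel_frames(1.0), mel_frames(25.0))  # 62 1562
# 47 one-second clips batched with a single 25 s clip: roughly 6% real data
print(f"{padding_efficiency([1.0] * 47 + [25.0]):.1%}")
```

Under this assumption, a 25 s clip is about 25 times longer than a 1 s clip in frames. Both memory use and useful work per batch can therefore swing widely from one batch to the next.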

To Reproduce

I modified my code based on this script from the Coqui TTS repository:

https://github.com/coqui-ai/TTS/blob/dev/recipes/multilingual/vits_tts/train_vits_tts_phonemes.py

These are the parameters I set:

import os

from TTS.tts.configs.vits_config import VitsConfig
from TTS.tts.models.vits import VitsArgs, VitsAudioConfig

# output_path and dataset_config are defined elsewhere in the script, as in the linked recipe

audio_config = VitsAudioConfig(
    sample_rate=16000,
    win_length=1024,
    hop_length=256,
    num_mels=80,
    mel_fmin=0,
    mel_fmax=None,
)

vitsArgs = VitsArgs(
    use_language_embedding=True,
    embedded_language_dim=2,
    use_speaker_embedding=True,
    use_sdp=False,
)

config = VitsConfig(
    model_args=vitsArgs,
    audio=audio_config,
    run_name="A6_vits_multi_language_10_spk_5_ordibehesht",
    use_speaker_embedding=True,
    batch_size=48,
    eval_batch_size=32,
    batch_group_size=128,
    num_loader_workers=12,
    num_eval_loader_workers=8,
    precompute_num_workers=12,
    run_eval=True,
    test_delay_epochs=-1,
    epochs=1000,
    text_cleaner="multilingual_cleaners",
    use_phonemes=True,
    phoneme_language=None,
    phonemizer="multi_phonemizer",
    phoneme_cache_path=os.path.join(output_path, "phoneme_cache"),
    compute_input_seq_cache=True,
    print_step=25,
    use_language_weighted_sampler=True,
    print_eval=False,
    mixed_precision=True,
    output_path=output_path,
    datasets=dataset_config,
    cudnn_enable=True,
    cudnn_benchmark=True,
    cudnn_deterministic=True,
)
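
Length bucketing sorts samples by duration and batches neighbours together. It is a common way to reduce padding when clip lengths vary as much as they do here. Coqui's config documentation describes `batch_group_size` as a bucketing setting. The sketch below is only a generic illustration of the idea, not Coqui's implementation. `bucketed_batches` is a hypothetical helper, and the random durations are synthetic.

```python
import random


def bucketed_batches(durations, batch_size, group_size, seed=0):
    """Hypothetical length-bucketing sampler (not Coqui's implementation).

    Sorts sample indices by duration, shuffles within consecutive groups of
    `group_size` samples, then cuts the result into batches. When group_size
    is a multiple of batch_size, every batch stays inside one group, so its
    clips have similar lengths and need little padding.
    """
    rng = random.Random(seed)
    order = sorted(range(len(durations)), key=lambda i: durations[i])
    shuffled = []
    for start in range(0, len(order), group_size):
        group = order[start:start + group_size]
        rng.shuffle(group)  # add randomness without mixing short and long clips
        shuffled.extend(group)
    batches = [shuffled[i:i + batch_size] for i in range(0, len(shuffled), batch_size)]
    rng.shuffle(batches)  # randomize the order in which batches are visited
    return batches


# Usage with synthetic durations between 1 and 25 s
gen = random.Random(42)
durations = [gen.uniform(1.0, 25.0) for _ in range(480)]
for batch in bucketed_batches(durations, batch_size=48, group_size=96)[:3]:
    lengths = [durations[i] for i in batch]
    print(f"{min(lengths):5.1f}s .. {max(lengths):5.1f}s")
```

Each printed batch spans a narrow duration range. Shuffling within groups keeps some randomness between epochs.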

Expected behavior

Higher GPU utilization and faster training.

Logs

Log output from one training step:

   --> TIME: 2024-04-27 09:15:52 -- STEP: 124/3006 -- GLOBAL_STEP: 1750125
     | > loss_disc: 2.7141058444976807  (2.7415779617524914)
     | > loss_disc_real_0: 0.2915174067020416  (0.22191733380238854)
     | > loss_disc_real_1: 0.2596714198589325  (0.2545961029827594)
     | > loss_disc_real_2: 0.25090914964675903  (0.2519173812601836)
     | > loss_disc_real_3: 0.2509034276008606  (0.2488831561659612)
     | > loss_disc_real_4: 0.2618330121040344  (0.24871416005396074)
     | > loss_disc_real_5: 0.23049794137477875  (0.2413994044726414)
     | > loss_0: 2.7141058444976807  (2.7415779617524914)
     | > grad_norm_0: tensor(2.3359, device='cuda:0')  (tensor(4.0910, device='cuda:0'))
     | > loss_gen: 1.8159717321395874  (1.9762149626208896)
     | > loss_kl: 5.008370399475098  (42.11719334894611)
     | > loss_feat: 1.7703579664230347  (2.0269679972721693)
     | > loss_mel: 30.50223731994629  (41.7430907526324)
     | > loss_duration: 9.647953033447266  (2.5745641668477357)
     | > amp_scaler: 256.0  (509.9354838709682)
     | > loss_1: 48.74489212036133  (90.438032304087)
     | > grad_norm_1: tensor(73.7072, device='cuda:0')  (tensor(215.5241, device='cuda:0'))
     | > current_lr_0: 0.0002 
     | > current_lr_1: 0.0002 
     | > step_time: 5.8922  (3.467874986510123)
     | > loader_time: 0.006  (0.005929248948251048)
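
The log reports `loader_time` separately from `step_time`. Judging by the trainer's naming, `loader_time` is the time spent fetching the next batch. The sketch below is a minimal parser, assuming the `| > name: current  (average)` line format shown above. The regex and `parse_step_log` are hypothetical, not part of Coqui's trainer. The parser computes the share of each step spent on data loading. With the running averages above, that share is roughly 0.17%. This suggests the data loader is not what holds the GPU back in this run.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical parser for the "| > name: current  (average)" lines shown above.
# Tensor-valued entries such as grad_norm_0 are skipped because they are not plain numbers.
NUM = r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?"
LINE_RE = re.compile(rf"\|\s*>\s*(\w+):\s*({NUM})(?:\s+\(({NUM})\))?")


def parse_step_log(text: str) -> Dict[str, Tuple[float, Optional[float]]]:
    """Return {name: (current_value, running_average_or_None)}."""
    values = {}
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            avg = float(m.group(3)) if m.group(3) else None
            values[m.group(1)] = (float(m.group(2)), avg)
    return values


sample = """\
     | > grad_norm_0: tensor(2.3359, device='cuda:0')  (tensor(4.0910, device='cuda:0'))
     | > current_lr_0: 0.0002
     | > step_time: 5.8922  (3.467874986510123)
     | > loader_time: 0.006  (0.005929248948251048)"""

values = parse_step_log(sample)
share = values["loader_time"][1] / values["step_time"][1]
print(f"average share of step time spent loading data: {share:.2%}")  # 0.17%
```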

Environment

- TTS version : 0.17.8
- python : 3.9.18
- pytorch : 2.1.1
- os : Linux
- gpu : A100

Additional context

No response

maryawwm added the bug Something isn't working label Apr 28, 2024
