I'm trying to reproduce some results on the inpainting task and have a question about the data_parallel mode.
According to the code, batch_size is 4 for a single GPU, and there are about 2.8M pairs of inpainting data in total, so the logged total is 700k steps.
When I train on 8 GPUs, the total is still logged as 700k steps. I checked the GPU memory usage, and all the GPUs are nearly fully utilized.
So I'm wondering: is the training batch_size for 8 GPUs actually 4*8, or is there some misalignment in the logging?
Thanks for your time.
Thank you for this question. For multi-GPU training, the overall batch size is num_per_batch * num_batch. The 700k iterations are independent of the batch size, so you need to manually adjust the iteration count to match the overall computation cost.
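To make the arithmetic concrete, here is a small sketch of the relationship described above. The function names (`effective_batch_size`, `scaled_iterations`) are illustrative and not from the repository; the sketch only assumes standard data parallelism, where each GPU processes its own per-GPU mini-batch and one optimizer step consumes `per_gpu_batch * num_gpus` samples.

```python
def effective_batch_size(per_gpu_batch: int, num_gpus: int) -> int:
    # Under data parallelism, one optimizer step consumes one
    # per-GPU mini-batch on each device.
    return per_gpu_batch * num_gpus


def scaled_iterations(base_iters: int, base_gpus: int, num_gpus: int) -> int:
    # Keep the total number of samples processed constant when moving
    # from base_gpus to num_gpus (assumes exact divisibility here).
    return base_iters * base_gpus // num_gpus


# Single-GPU recipe: batch 4 over ~2.8M pairs -> 700k logged steps.
# On 8 GPUs the effective batch is 32, so ~87.5k steps cover the
# same number of samples.
print(effective_batch_size(4, 8))        # 32
print(scaled_iterations(700_000, 1, 8))  # 87500
```

Since the iteration count in the logs is not rescaled automatically, this is the adjustment you would make by hand to match the single-GPU computation budget.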
That's definitely impressive work!