About multi-gpus training #27

Open
YANDaoyu opened this issue Mar 1, 2024 · 2 comments

Comments

YANDaoyu commented Mar 1, 2024

This is definitely impressive work!

I'm trying to reproduce some results on the inpainting task and have a concern about the data_parallel mode.
Looking at the code, the batch_size is 4 for a single GPU, and there are about 2.8M inpainting pairs in total, so the logged total is 700k steps.
When I train on 8 GPUs, the total is still logged as 700k steps, so I checked the GPU memory usage -- all GPUs are nearly fully utilized.
I'm wondering: is the training batch_size on 8 GPUs actually 4*8? Or is there some misalignment in the logging?

Thanks for your time.
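
For reference, the arithmetic behind the 700k figure, as a minimal sketch using the numbers quoted above (variable names are illustrative, not from the repo):

```python
# Step count implied by the question: ~2.8M inpainting pairs at a
# single-GPU batch_size of 4 gives 700k steps per pass over the data.
dataset_size = 2_800_000        # ~2.8M inpainting pairs
batch_size_per_gpu = 4          # single-GPU batch_size

steps = dataset_size // batch_size_per_gpu
print(steps)                    # 700000 -> the 700k logged steps
```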

@gallenszl

Same question. Looking forward to the answer. @canqin001 @shugerdou

@canqin001 (Contributor)

Thank you for this question. For multi-GPU training, the overall batch size is the per-GPU batch size multiplied by the number of GPUs. The 700k iteration count is independent of the batch size, so you need to manually adjust the number of iterations to match the overall computation cost.
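
In other words, a minimal sketch of the bookkeeping this implies, assuming plain data-parallel training where each GPU processes its own batch per optimizer step (names are illustrative, not from the repo):

```python
# Effective batch size under 8-GPU data parallelism, per the reply above.
batch_size_per_gpu = 4
num_gpus = 8
effective_batch_size = batch_size_per_gpu * num_gpus   # 4 * 8 = 32

# The 700k iterations were derived from the single-GPU batch size, so to
# keep the total number of training samples seen the same, scale the
# iteration count down by the number of GPUs.
single_gpu_iterations = 700_000
adjusted_iterations = single_gpu_iterations // num_gpus
print(effective_batch_size, adjusted_iterations)       # 32 87500
```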
