
Add generating batch size #592

Open · wants to merge 2 commits into main

Conversation

zhuzilin (Contributor)
The original implementation generated samples one by one, which results in low GPU utilization for small models. This commit adds a generating-batch-size config to enable batched generation.

Thank you for your time reviewing this PR :)
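The idea described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: `generating_batch_size` and `generate_fn` are hypothetical names standing in for the new config option and the model's generate call.

```python
# Hedged sketch of batched generation: instead of one forward pass per
# prompt, group prompts into chunks of `generating_batch_size` and run
# one forward pass per chunk. Names here are illustrative, not the
# repository's real API.

def batched(items, batch_size):
    """Yield successive batches of at most `batch_size` items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def generate_all(prompts, generate_fn, generating_batch_size=8):
    """Run `generate_fn` on whole batches instead of one prompt at a time."""
    outputs = []
    for batch in batched(prompts, generating_batch_size):
        outputs.extend(generate_fn(batch))  # one batched forward pass
    return outputs
```

With a larger `generating_batch_size`, each forward pass amortizes kernel-launch and memory-transfer overhead across more samples, which is where the GPU-utilization gain for small models comes from.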

@zhuzilin zhuzilin requested a review from a team as a code owner March 18, 2022 05:39
@StellaAthena StellaAthena linked an issue Mar 20, 2022 that may be closed by this pull request
@StellaAthena (Member) left a comment

I think a better solution is to drop the word train from the name of the existing argument and use the existing argument for generation. Having two seems needlessly confusing.

@zhuzilin (Contributor, author)

> I think a better solution is to drop the word train from the name of the existing argument and use the existing argument for generation. Having two seems needlessly confusing.

@StellaAthena I found that the train_batch_size arg is tied to train_micro_batch_size_per_gpu, which is involved in gradient accumulation. There is also a batch_size arg used solely with the LM Evaluation Harness:

self.batch_size = batch_size or (
neox_args.batch_size * self.dp_world_size
) # default batch size to bs per gpu * dp size

Maybe using the batch_size arg is better?
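The snippet quoted above computes the harness default as per-GPU batch size times data-parallel world size. A small sketch of that fallback, with illustrative parameter names (the real code reads these off `neox_args` and the runtime):

```python
# Hedged sketch of the quoted default: an explicit batch_size wins;
# otherwise fall back to per-GPU batch size * data-parallel world size.
# Parameter names are illustrative, not the repository's actual ones.

def default_eval_batch_size(batch_size, per_gpu_batch_size, dp_world_size):
    """Mirror `batch_size or (neox_args.batch_size * self.dp_world_size)`."""
    return batch_size or (per_gpu_batch_size * dp_world_size)
```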

@StellaAthena (Member) commented Mar 23, 2022

My suggestion is to rename train_micro_batch_size_per_gpu to micro_batch_size_per_gpu and allow it to fill both the train and the inference roles.

@zhuzilin (Contributor, author)

@StellaAthena
Just to double-check: you mean rename train_batch_size to batch_size, train_micro_batch_size_per_gpu to micro_batch_size_per_gpu, and use micro_batch_size_per_gpu for inference/generation, right?

@StellaAthena (Member)

> @StellaAthena Just a double check, you mean rename train_batch_size to batch_size, train_micro_batch_size_per_gpu to micro_batch_size_per_gpu and use micro_batch_size_per_gpu for inference/generation, right?

No, I mean just the second rename. For a fixed micro-batch size per GPU, the training batch size and the eval batch size are typically different due to gradient accumulation.
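The relation behind this comment can be written out explicitly. This is a sketch under the usual DeepSpeed-style convention; the function and parameter names are illustrative:

```python
# Hedged sketch: the global train batch size is the per-GPU micro batch
# times the gradient-accumulation steps times the data-parallel world
# size. Generation does no accumulation, so only micro_batch_size_per_gpu
# is meaningfully shared between training and inference.

def global_train_batch_size(micro_batch_size_per_gpu,
                            gradient_accumulation_steps,
                            dp_world_size):
    """Train batch size implied by a given micro batch size."""
    return micro_batch_size_per_gpu * gradient_accumulation_steps * dp_world_size
```

This is why renaming only train_micro_batch_size_per_gpu works: the micro batch size is the one knob both phases use, while the global train batch size carries an accumulation factor that has no inference counterpart.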

@zhuzilin
Copy link
Contributor Author

@StellaAthena Updated.

@zhuzilin
Copy link
Contributor Author

@StellaAthena Hi, could you help me to fix the error raised in CI? Thank you!

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Successfully merging this pull request may close these issues.

generate.py not utilizing GPU in full
3 participants