Hi, first of all, this is great work and very thorough documentation.
I'd just like to ask a simple question. The BERT/TF2 documentation says the global batch size is set to 61k (I assume that's rounded) for phase 1 and 30k for phase 2 training. However, if my understanding is correct,
global_batch_size = batch_size * num_gpu * num_accumulation_steps
Plugging in the described default parameters gives 60 * 64 * 8 = 30720 for phase 1 and 10 * 192 * 8 = 15360 for phase 2, which is exactly half of the stated global batch size in each case. Did I miss something here, or is there really a mistake?
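For concreteness, here is a minimal sketch of the arithmetic as I understand it. The function and argument names are just for illustration and are not the repo's actual flag names; the numbers are the defaults described in the documentation:

```python
# Hypothetical helper mirroring the formula above; names are
# illustrative, not taken from the BERT/TF2 codebase.
def global_batch_size(batch_size, num_gpus, num_accumulation_steps):
    return batch_size * num_gpus * num_accumulation_steps

# Phase 1 defaults as I read them: 60 * 64 * 8
phase1 = global_batch_size(60, 64, 8)   # 30720, but docs say ~61k
# Phase 2 defaults as I read them: 10 * 192 * 8
phase2 = global_batch_size(10, 192, 8)  # 15360, but docs say ~30k

print(phase1, phase2)  # each is exactly half the documented value
```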
Thanks in advance.