Your question
I run pretrain_gpt with the same architecture, data, training hyperparameters, and hardware, with and without using megatron_core when building the model.
I notice clearly worse wall-clock time and memory usage with megatron_core:
| setting | wall clock time per step (ms) | mem per GPU (GB) |
|---|---|---|
| legacy | 630 | 45 |
| use_mcore | 690 | 63 |
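For reference, the relative overhead implied by the numbers above works out as follows (a simple arithmetic check, using only the values reported in the table):

```python
# Relative overhead of the mcore build versus the legacy build,
# computed from the numbers in the table above.
legacy_ms, mcore_ms = 630, 690   # wall-clock time per step (ms)
legacy_gb, mcore_gb = 45, 63     # memory per GPU (GB)

time_overhead = (mcore_ms - legacy_ms) / legacy_ms  # ~9.5% slower per step
mem_overhead = (mcore_gb - legacy_gb) / legacy_gb   # 40% more memory per GPU

print(f"time overhead: {time_overhead:.1%}")
print(f"memory overhead: {mem_overhead:.1%}")
```

So the mcore path is roughly 9.5% slower per step and uses 40% more memory per GPU in this setup.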
Environment:

| hardware | torch version | CUDA version |
|---|---|---|
| A100-80G-PCIe x 4 | 2.1.2 | 12.2 |
For the data, I use the c4_en dataset from Hugging Face and tokenize it with the GPT-2 tokenizer. I use the first 3.6e7 documents (the first 10%) to conduct the experiments.
To Reproduce
megatron-lm commit hash: 9de386d
I customized a script from pretrain_gpt_distributed.sh and renamed it pretrain_gpt_cli.sh.
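For context, a script adapted from pretrain_gpt_distributed.sh might look roughly like the sketch below. This is not the actual pretrain_gpt_cli.sh from the report: the model sizes, batch sizes, paths, and the `--use-mcore-models` flag name are my assumptions and may differ at commit 9de386d.

```bash
#!/bin/bash
# Hypothetical sketch of pretrain_gpt_cli.sh (illustrative only).
GPUS_PER_NODE=4
DATA_PATH=<path to preprocessed c4_en>   # kept elided, as in the report
VOCAB_FILE=gpt2-vocab.json
MERGE_FILE=gpt2-merges.txt

torchrun --nproc_per_node $GPUS_PER_NODE pretrain_gpt.py \
    --tokenizer-type GPT2BPETokenizer \
    --vocab-file $VOCAB_FILE \
    --merge-file $MERGE_FILE \
    --data-path $DATA_PATH \
    --num-layers 24 \
    --hidden-size 2048 \
    --num-attention-heads 16 \
    --seq-length 2048 \
    --max-position-embeddings 2048 \
    --micro-batch-size 4 \
    --global-batch-size 32 \
    --train-iters 1000 \
    --lr 1.5e-4 \
    --use-mcore-models   # omit this flag for the legacy run
```

The comparison is then the same command with and without the mcore flag, everything else held fixed.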
To reproduce the experiment, please run the following bash command:
Is there a known reason behind this difference?