Example for benchmarking ML workloads using Torch Profiler and NSight #322
Conversation
## Plenty of EFA level variables
## Comment out for non-efa instances (G4d, P3)
## For G5.12x, Comment out RDMA and Fork safe
## For G4dn and other G5, comment out all
You may want to add sections for clarity. Otherwise it can be confusing, as there's a list of variables, some of which are commented out, but only one explanation at the top.
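For what it's worth, a sectioned layout might look like the sketch below; the groupings and per-instance caveats are an assumption pieced together from the comments already in the diff, not the PR's actual file:

```shell
## --- EFA / libfabric (comment this whole section out on non-EFA instances, e.g. G4dn, P3) ---
export FI_PROVIDER=efa
export FI_EFA_USE_DEVICE_RDMA=1   # p4d only; comment out on G5.12x
export FI_EFA_FORK_SAFE=1         # comment out on G5.12x
export FI_LOG_LEVEL=1

## --- NCCL / PyTorch distributed (applies to all instance types) ---
export NCCL_DEBUG=INFO
export TORCH_NCCL_ASYNC_ERROR_HANDLING=1
#export NCCL_SOCKET_IFNAME=ens    # uncomment if the default interface is not picked up
```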
export FI_PROVIDER=efa
export NCCL_DEBUG=INFO
export TORCH_NCCL_ASYNC_ERROR_HANDLING=1
#export NCCL_SOCKET_IFNAME=ens
Same here
export FI_PROVIDER=efa
export NCCL_DEBUG=INFO
export TORCH_NCCL_ASYNC_ERROR_HANDLING=1
#export NCCL_SOCKET_IFNAME=ens
split in sections
export NCCL_DEBUG=INFO
export TORCH_NCCL_ASYNC_ERROR_HANDLING=1

#export NCCL_SOCKET_IFNAME=ens
split in sections or remove
conda activate llamapretrain

# Install pytorch and other dependencies
conda install -y pytorch==2.3.0 pytorch-cuda=11.8 -c pytorch -c nvidia
put the versions at the top w/ the mamba version
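As a sketch of that suggestion (the variable names are hypothetical; the pinned values are the ones already appearing in this diff, and the mamba pin is a placeholder):

```shell
# Hypothetical version pins collected at the top of the setup script.
# PyTorch/CUDA values come from the install line in this diff;
# MAMBA_VERSION is a placeholder for whatever the script currently pins elsewhere.
PYTORCH_VERSION=2.3.0
PYTORCH_CUDA_VERSION=11.8
MAMBA_VERSION=""   # placeholder: pin the mamba version here

conda install -y "pytorch==${PYTORCH_VERSION}" "pytorch-cuda=${PYTORCH_CUDA_VERSION}" -c pytorch -c nvidia
```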
--output /fsx/nsys_profiles2/llama2/report_llama2_job%q{SLURM_JOB_ID}_rank%q{SLURM_PROCID}_on_%q{HOSTNAME}.nsys-rep \
torchrun "${TORCHRUN_ARGS[@]}" $TRAIN_SCRIPT "${MODEL_ARGS[@]}"

#srun -u -l "${ENROOT_ARGS[@]}" /usr/local/cuda/bin/nsys profile -w true -t cuda,nvtx,osrt,cudnn,cublas \
Why this command?
## For G4dn and other G5, comment out all
export FI_EFA_USE_DEVICE_RDMA=1 # use for p4d
export FI_EFA_FORK_SAFE=1
export FI_LOG_LEVEL=1
Suggested change:
- export FI_LOG_LEVEL=1
+ export FI_LOG_LEVEL=warn
This recipe builds on Meta's llama recipe, with modifications to support Llama 2 pretraining with FSDP and an added ability to profile the workload using either Torch Profiler or Nsight.
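For readers skimming the thread, the Nsight path boils down to wrapping the `torchrun` launch in `nsys profile`, as in the lines quoted in the review above. A minimal sketch under Slurm (the output path, training script, and argument array are placeholders, not the PR's actual values):

```shell
#!/usr/bin/env bash
# Sketch only: wrap the torchrun launch in nsys so each rank writes its own report.
# OUTPUT_DIR, TRAIN_SCRIPT, and TORCHRUN_ARGS below are placeholders.
OUTPUT_DIR=/fsx/nsys_profiles
TRAIN_SCRIPT=train.py
TORCHRUN_ARGS=(--nnodes "${SLURM_NNODES:-1}" --nproc_per_node 8)

# %q{VAR} lets nsys expand Slurm environment variables into the report name,
# so concurrent ranks do not clobber each other's output files.
srun -u -l nsys profile -w true \
  -t cuda,nvtx,osrt,cudnn,cublas \
  --output "${OUTPUT_DIR}/report_job%q{SLURM_JOB_ID}_rank%q{SLURM_PROCID}.nsys-rep" \
  torchrun "${TORCHRUN_ARGS[@]}" "${TRAIN_SCRIPT}"
```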
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.