Example for benchmarking ML workloads using Torch Profiler and NSight #322

Open
syedazi wants to merge 32 commits into base: main

Conversation

@syedazi (Collaborator) commented May 10, 2024

This recipe is built on Meta's llama recipe, with modifications that allow model pretraining (LLAMA2) with FSDP and the additional ability to profile the workloads using either Torch Profiler or NSight.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
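
For context, a minimal sketch of how the two profiling modes could be switched at launch time. The PROFILER toggle is hypothetical; the nsys flags and the torchrun invocation are taken from the recipe's launch script reviewed below.

# Hypothetical PROFILER toggle; TORCHRUN_ARGS, TRAIN_SCRIPT, and MODEL_ARGS
# are the variables the recipe's launch script already defines.
if [ "${PROFILER:-none}" = "nsight" ]; then
    # NSight mode: wrap the launcher in nsys so the whole job is traced.
    nsys profile -w true -t cuda,nvtx,osrt,cudnn,cublas \
        --output "report_job${SLURM_JOB_ID}_rank${SLURM_PROCID}.nsys-rep" \
        torchrun "${TORCHRUN_ARGS[@]}" "$TRAIN_SCRIPT" "${MODEL_ARGS[@]}"
else
    # Torch Profiler mode: tracing is driven from inside the training script.
    torchrun "${TORCHRUN_ARGS[@]}" "$TRAIN_SCRIPT" "${MODEL_ARGS[@]}"
fi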

@syedazi self-assigned this May 10, 2024
## Plenty of EFA level variables
## Comment out for non-efa instances (G4d, P3)
## For G5.12x, Comment out RDMA and Fork safe
## For G4dn and other G5, comment out all
Contributor:
You may want to add sections for clarity. Otherwise it can be confusing, as there is a list of variables, some of which are commented out, but only one explanation at the top.
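
For example, one possible sectioned layout (a sketch using only variables already present in this script; the grouping and headers are illustrative):

## EFA settings (comment out on non-EFA instances such as G4dn and P3;
## on G5.12x, comment out only the RDMA and fork-safe lines)
export FI_PROVIDER=efa
export FI_EFA_USE_DEVICE_RDMA=1   # use for p4d
export FI_EFA_FORK_SAFE=1
export FI_LOG_LEVEL=1

## NCCL settings
export NCCL_DEBUG=INFO
export TORCH_NCCL_ASYNC_ERROR_HANDLING=1
#export NCCL_SOCKET_IFNAME=ens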

export FI_PROVIDER=efa
export NCCL_DEBUG=INFO
export TORCH_NCCL_ASYNC_ERROR_HANDLING=1
#export NCCL_SOCKET_IFNAME=ens
Contributor:

Same here

export FI_PROVIDER=efa
export NCCL_DEBUG=INFO
export TORCH_NCCL_ASYNC_ERROR_HANDLING=1
#export NCCL_SOCKET_IFNAME=ens
Contributor:

Split into sections.

export NCCL_DEBUG=INFO
export TORCH_NCCL_ASYNC_ERROR_HANDLING=1

#export NCCL_SOCKET_IFNAME=ens
Contributor:

Split into sections, or remove.

conda activate llamapretrain

# Install pytorch and other dependencies
conda install -y pytorch==2.3.0 pytorch-cuda=11.8 -c pytorch -c nvidia
Contributor:

Put the versions at the top, alongside the mamba version.
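
Something like this sketch, perhaps (the mamba version value is elided here; it would be filled in with whatever the recipe pins):

# Pinned versions, declared once at the top of the script
MAMBA_VERSION=...                 # placeholder; pin the recipe's mamba version here
PYTORCH_VERSION=2.3.0
CUDA_VERSION=11.8

conda activate llamapretrain

# Install pytorch and other dependencies, using the pinned versions
conda install -y pytorch==${PYTORCH_VERSION} pytorch-cuda=${CUDA_VERSION} -c pytorch -c nvidia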

--output /fsx/nsys_profiles2/llama2/report_llama2_job%q{SLURM_JOB_ID}_rank%q{SLURM_PROCID}_on_%q{HOSTNAME}.nsys-rep \
torchrun "${TORCHRUN_ARGS[@]}" $TRAIN_SCRIPT "${MODEL_ARGS[@]}"

#srun -u -l "${ENROOT_ARGS[@]}" /usr/local/cuda/bin/nsys profile -w true -t cuda,nvtx,osrt,cudnn,cublas \
Contributor:

Why this command?

## For G4dn and other G5, comment out all
export FI_EFA_USE_DEVICE_RDMA=1 # use for p4d
export FI_EFA_FORK_SAFE=1
export FI_LOG_LEVEL=1
Contributor:

Suggested change:
- export FI_LOG_LEVEL=1
+ export FI_LOG_LEVEL=warn
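
For background (an assumption worth verifying against the libfabric docs): FI_LOG_LEVEL is typically set to a named level such as warn, info, or debug, so warn limits output to warnings rather than relying on a numeric value.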
