Llama training with FP8 #331
base: main
87c6612 to cb07958
Overall, this looks like a great feature addition to 10.FSDP. Do you think Transformer Engine (TE) support could be added to the existing test case instead of creating a new one?
@@ -0,0 +1,2 @@
checkpoints
slurm-*.out
`slurm-*.out` should already be excluded by https://github.com/aws-samples/awsome-distributed-training/blob/main/.gitignore
@@ -0,0 +1,183 @@
# Copyright (c) 2022-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# See LICENSE for license information.
You may want to refer to the original code: https://github.com/NVIDIA/TransformerEngine/blob/16a469df6bbc77e1c32e48e8e5fd3082dbc2d18e/docs/examples/te_llama/te_llama.py
@KeitaW thanks for the review! I considered adding FP8 support to the FSDP example, but there are two reasons why I decided to create a separate example for this:
So, in terms of importance, this example is about Llama with FP8. FSDP training here is just scaffolding.
@pbelevich FYI, the AWS DLC for PyTorch also includes TE.
cb07958 to 82b3e89
82b3e89 to 663b344
44e448e to 1209815
No description provided.