
Integrate TransformerEngine #1098

Open · Quentin-Anthony opened this issue Dec 21, 2023 · 6 comments · Labels: feature request

@Quentin-Anthony (Member) commented Dec 21, 2023

Needed for FP8 training; it also adds some nice FP16/BF16 optimizations for Ampere and newer architectures that we can make use of regardless.

https://github.com/EleutherAI/TransformerEngine
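
For context, here's a minimal sketch of how TransformerEngine's FP8 path is typically driven from PyTorch. The layer sizes and recipe settings below are illustrative placeholders, not a proposed NeoX configuration:

```python
# Minimal FP8 forward/backward with TransformerEngine (illustrative only;
# sizes and recipe values are placeholders). FP8 GEMMs require feature
# dimensions divisible by 16 and token counts divisible by 8.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# HYBRID = E4M3 for the forward pass, E5M2 for gradients in backward
fp8_recipe = recipe.DelayedScaling(
    margin=0,
    fp8_format=recipe.Format.HYBRID,
    amax_history_len=16,
    amax_compute_algo="max",
)

layer = te.Linear(1024, 1024, bias=True).cuda()
x = torch.randn(32, 1024, device="cuda", requires_grad=True)

# GEMMs inside this context run in FP8; parameters stay in higher precision
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)

y.sum().backward()
```

The HYBRID format (E4M3 forward, E5M2 for gradients) is the usual starting point when convergence is a concern, since gradients benefit from E5M2's wider dynamic range.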

@Quentin-Anthony added the feature request label on Dec 21, 2023
@Quentin-Anthony (Member, Author) commented

There's a fairly mature implementation in upstream Megatron-LM: https://github.com/NVIDIA/Megatron-LM

@mkerin (Contributor) commented Dec 22, 2023

As discussed on Discord: if you need some extra dev manpower, I'll happily take this one off your hands.

@StellaAthena (Member) commented

> As discussed on Discord: if you need some extra dev manpower, I'll happily take this one off your hands.

Thank you!

@tf-nv (Contributor) commented Mar 4, 2024

Hi, I'm curious about the state of this effort and don't see a related branch. I read on Discord that FP8 was working but there were struggles with convergence.

@Quentin-Anthony IIUC you spent some time on this as well; could you tell me more? :)

@mkerin (Contributor) commented Mar 7, 2024

Hi - just commenting to say that I'm afraid I got distracted by other projects and didn't make any significant progress on this. As agreed with Quentin, I'm removing myself as assignee so that I don't block anyone else from picking it up, as I should have done much sooner.

@mkerin removed their assignment on Mar 7, 2024
@tf-nv (Contributor) commented Mar 13, 2024

There are a few things to unpack here. I had a look at the differences between GPT-NeoX's Megatron fork and upstream Megatron-LM, which has a mature implementation as @Quentin-Anthony said. I've opened a draft PR with some thoughts on the diff: #1185. Let's discuss there :)
