Support for Megatron-VLM training #806

Open · wants to merge 10 commits into main
Conversation

@1049451037 commented May 5, 2024

In this pull request, we open-source our solution for visual-language model training and inference in pure Megatron-style code. In this codebase, we support:

  1. The Megatron ViT model and its model weight converter.
  2. Uneven splitting of the pipeline-parallel stages when the first stage holds the ViT. We find this speeds up training by a large margin (see the first sketch below).
  3. Sequence-parallel and context-parallel support for VLM training (for both the ViT and the LM). This is non-trivial because we must ensure the ViT on every rank receives gradients: since SP and CP split the sequence, some ranks hold only text tokens (see the second sketch below).
  4. Detached PP sizes for the ViT and the GPT (since Megatron uses a single global mpu for all models).
  5. Multi-modal inference code.
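
For point 2, here is a minimal sketch of the uneven layer-assignment idea. The helper name and the numbers are illustrative only, not the actual code in this PR:

```python
# Illustrative sketch: give the first pipeline stage fewer LM layers because
# it also hosts the ViT, so the pipeline stages stay roughly balanced.
def split_layers_unevenly(num_lm_layers: int, pp_size: int,
                          first_stage_layers: int) -> list[int]:
    """Return the number of LM transformer layers assigned to each pipeline stage."""
    remaining = num_lm_layers - first_stage_layers
    rest_stages = pp_size - 1
    base, extra = divmod(remaining, rest_stages)
    # Spread the remaining layers as evenly as possible over the later stages.
    return [first_stage_layers] + [base + (1 if i < extra else 0)
                                   for i in range(rest_stages)]

# Example: 32 LM layers, 4 pipeline stages, first stage keeps only 5 LM layers
# because it also runs the ViT -> [5, 9, 9, 9]
print(split_layers_unevenly(32, 4, 5))
```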

The running example is in the examples/llava folder.
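
For point 3, a minimal sketch of how the ViT can be kept in the autograd graph on ranks whose sequence shard contains only text tokens. The names (`vit`, `patches`, `lm_loss`) and the dummy input shape are placeholders, not the PR's actual implementation:

```python
import torch

def anchor_vit_gradients(lm_loss: torch.Tensor,
                         vit: torch.nn.Module,
                         patches: torch.Tensor | None) -> torch.Tensor:
    """Make sure every rank contributes a (possibly zero) gradient to the ViT.

    Under sequence/context parallelism the sequence is sharded, so a rank may
    end up with only text tokens. Its ViT parameters would then receive no
    gradient while other ranks' do, which breaks gradient synchronization
    across parallel ranks. A zero-weighted dummy forward keeps the ViT in the
    autograd graph on every rank without changing the loss value.
    """
    if patches is not None and patches.numel() > 0:
        return lm_loss  # real image patches already flowed through the ViT
    dummy = torch.zeros(1, 3, 224, 224, device=lm_loss.device)
    return lm_loss + 0.0 * vit(dummy).sum()
```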

We hope our work can contribute to the open-source community. If there are any questions, feedback is welcome!

@jon-barker (Collaborator)

Hi. Thanks for creating this PR. We (NVIDIA) are actually planning to release VLM training functionality in Megatron core in the next couple of weeks. As you may have seen, we've been pushing out some preparatory code to support this. Our initial example release will cover pretraining and SFT for a LLaVA-architecture model using Llama 3 and CLIP backbones, plus a general multimodal WebDataset-based dataloader. We're reviewing your PR internally to see if we can incorporate any of your work alongside ours, and we'll be sure to credit you if we do.

Thanks again!

@1049451037 (Author)

Thank you for your attention! Looking forward to the official implementation!
