
bloomz-mt universal checkpoint #20

LiuShixing opened this issue May 24, 2023 · 2 comments

LiuShixing commented May 24, 2023

Hello!
Thanks a lot for your work!
I want to finetune bloomz-mt with your Megatron-DeepSpeed, but I cannot find a universal checkpoint for bloomz-mt or bloomz. I only found the bloom universal checkpoint below:
https://huggingface.co/bigscience/bloom-optimizer-states/tree/global_step95000_universal

With limited GPUs, I have to finetune with TP=4, PP=12, but you advise against merging TP in the document below. So I am looking for the bloomz-mt universal checkpoint:
https://github.com/bigscience-workshop/bigscience/blob/master/train/tr11-176B-ml/finetune.md

Muennighoff (Collaborator) commented May 25, 2023

I didn't create universal ckpts for them, unfortunately. The options I see are:

  • Try to get resources to train for one step so you can convert to a universal ckpt (note that with PP=72 & TP=1, and since you can drop the DP states, you need at least 72 GPUs, though you probably need some DP to fit it; you can set LR=0 for that step, as in the first sketch after this list)
  • Figure out how to reshape the checkpoints / convert them to universal ckpts (I would hope that DeepSpeed has some code for this by now, but maybe not; see the second sketch after this list)
  • Restart from bloom (the largest models were finetuned for 500 steps, so it's not too much effort)
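
For the first option, a minimal sketch of what the one-step run might look like, assuming the BigScience Megatron-DeepSpeed launch conventions; `MODEL_AND_DATA_ARGS`, `LOADED_STEP`, and all paths are hypothetical placeholders, and only the LR/iteration flags are the point:

```bash
# Sketch: a one-step run in the checkpoint's native layout (PP=72, TP=1 for
# bloomz-mt) with LR=0, so the weights stay unchanged but fresh optimizer
# states are saved for conversion. MODEL_AND_DATA_ARGS, LOADED_STEP, and the
# paths are placeholders; the flags are standard Megatron-DeepSpeed ones.
# --override-lr-scheduler forces the CLI LR over the checkpoint's scheduler;
# --train-iters is set one step past the loaded checkpoint's iteration.
deepspeed pretrain_gpt.py \
    $MODEL_AND_DATA_ARGS \
    --lr 0 \
    --min-lr 0 \
    --override-lr-scheduler \
    --train-iters $((LOADED_STEP + 1)) \
    --save /path/to/new-ckpt
```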
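For the second option, DeepSpeed does now ship a conversion script, `deepspeed/checkpoint/ds_to_universal.py` (it may not have existed when this thread was written); a sketch of invoking it on a checkpoint folder, with placeholder paths (`global_stepN` stands for whatever step folder the checkpoint uses):

```bash
# Sketch: convert a Megatron-DeepSpeed / ZeRO checkpoint folder to a
# "universal" checkpoint that can be reloaded under a different TP/PP/DP
# layout. Paths are placeholders.
python deepspeed/checkpoint/ds_to_universal.py \
    --input_folder  /path/to/bloomz-mt/global_stepN \
    --output_folder /path/to/bloomz-mt/global_stepN_universal \
    --num_extract_workers 4 \
    --num_merge_workers 2
```

If your Megatron-DeepSpeed fork supports it, the converted folder is then loaded with the `--universal-checkpoint` flag.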

LiuShixing (Author) commented May 25, 2023 via email
