How to load a pre-trained model using horovod #3257
Unanswered
ForawardStar
asked this question in Q&A
Replies: 1 comment
You could restore the checkpoint on rank 0, then broadcast the variables to the other workers, similarly to what you would do after initialization.
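A minimal sketch of that pattern, assuming PyTorch with `horovod.torch` (the question does not state a framework). The function name `restore_and_broadcast`, the `"model"`/`"optimizer"` state-dict keys, and the checkpoint path argument are illustrative, not from the thread; `hvd.broadcast_parameters` and `hvd.broadcast_optimizer_state` are the real Horovod calls used after initialization:

```python
def restore_and_broadcast(model, optimizer, checkpoint_path):
    """Restore a checkpoint only on rank 0, then broadcast it everywhere."""
    import torch
    import horovod.torch as hvd  # assumes hvd.init() was called earlier

    if is_root(hvd.rank()):
        # Only the root worker reads the file, mirroring how it was saved
        # (checkpoints were written only on worker 0).
        state = torch.load(checkpoint_path)
        model.load_state_dict(state["model"])
        optimizer.load_state_dict(state["optimizer"])

    # Every worker participates in the broadcast; afterwards all ranks
    # hold rank 0's weights and optimizer state, exactly as they would
    # after broadcasting a fresh random initialization.
    hvd.broadcast_parameters(model.state_dict(), root_rank=0)
    hvd.broadcast_optimizer_state(optimizer, root_rank=0)


def is_root(rank):
    """Only the root worker (rank 0) touches the checkpoint file."""
    return rank == 0
```

The non-root workers never open the checkpoint, so there is no risk of them corrupting it; they simply receive the restored state over the broadcast.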
Environment:
Question
I can successfully train my model using horovod in a multi-machine, multi-GPU fashion, and I save the checkpoint models only on worker 0 to prevent other workers from corrupting them. My question is how to load such a pre-trained model using horovod in a multi-machine, multi-GPU fashion. Do I need to load the pre-trained model only on worker 0?