Replies: 2 comments
-
Whether and how you scale your learning rate with the number of workers (i.e., the total batch size) depends on the optimizer you use. For plain SGD, linear scaling may be the right call, but less so for more involved schemes. It's better to be explicit about this in your code.
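For example, with Horovod's PyTorch bindings, explicit linear scaling might look like the sketch below (the model and `base_lr` are placeholders, not anything from this thread):

```python
import torch
import horovod.torch as hvd

hvd.init()

model = torch.nn.Linear(10, 1)  # placeholder model for illustration
base_lr = 0.01                  # assumed per-worker base learning rate

# Linear scaling: multiply the base LR by the number of workers.
# Reasonable for plain SGD; revisit for Adam, LAMB, and similar optimizers.
optimizer = torch.optim.SGD(model.parameters(), lr=base_lr * hvd.size())

# Wrap with Horovod's distributed optimizer so gradients are averaged across workers.
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters()
)
```

Keeping the `base_lr * hvd.size()` expression in your own code makes the scaling decision visible rather than hidden behind the framework.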
-
Hey @ziqipang, as @maxhgerlach said, this needs to be done manually in most cases. Some higher-level frameworks like PyTorch Lightning and Ludwig will automatically scale the learning rate for you, but that is not done by Horovod; it's a feature of those frameworks. Does that answer your question?
-
Hi, I am new to Horovod and trying to use it for distributed training. I am wondering whether we have to manually increase the learning rate when we use more GPUs.
The documentation instructs us to scale the learning rate manually, and the example training code seems to do this as well. But after reading this issue, I am confused: does Horovod scale the learning rate by default?
Thank you for helping me!