Hey, was that chart produced by LLaMA-Factory's built-in plotting?
Hi! Is ZeRO-3's communication cost really that high? When I SFT llama3-8B, 20 steps take only 17 seconds with ZeRO-2 but 20 minutes with ZeRO-3. Isn't that abnormal?
Hey, no, it isn't.
Yes, it is noticeably higher, especially on multiple machines, where the performance bottleneck is basically communication. The exact gap depends heavily on the hardware, though. On 2 A800 GPUs with an IB (InfiniBand) network enabled, I didn't see a gap that large, roughly 4-5x.
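For context, a back-of-envelope estimate of per-step communication volume (following the analysis in the ZeRO paper) suggests ZeRO-3 should only move about 1.5x the data of ZeRO-2: both reduce-scatter and all-gather gradients (~2x the model size), while ZeRO-3 additionally all-gathers the partitioned parameters during forward and backward (~3x total). The 8B/fp16 numbers below are illustrative assumptions, not measurements from the reporter's setup:

```python
# Back-of-envelope per-step communication volume, per the ZeRO paper's estimates.
# Assumptions (hypothetical): 8e9 parameters, fp16 (2 bytes each).
PARAMS = 8e9
BYTES_PER_PARAM = 2  # fp16

# ZeRO-2: gradient reduce-scatter + all-gather, ~2x model size in bytes.
zero2_bytes = 2 * PARAMS * BYTES_PER_PARAM
# ZeRO-3: adds parameter all-gathers in forward and backward, ~3x model size.
zero3_bytes = 3 * PARAMS * BYTES_PER_PARAM

print(f"ZeRO-2 ~{zero2_bytes / 1e9:.0f} GB/step, ZeRO-3 ~{zero3_bytes / 1e9:.0f} GB/step")
# → ZeRO-2 ~32 GB/step, ZeRO-3 ~48 GB/step
```

A theoretical 1.5x volume increase cannot by itself explain a 17s-vs-20min gap, which is why slowdowns that extreme usually point at latency-bound small all-gathers or a slow interconnect rather than raw bandwidth.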
Reminder
Reproduction
When training qwen1 with every hyperparameter held fixed except switching DeepSpeed between ZeRO-2 and ZeRO-3, the loss curves differ dramatically. Has anyone run this kind of experiment before, and is this behavior expected?
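The setup described above can be sketched as two DeepSpeed configs that differ only in the ZeRO stage. This is a minimal illustration, not the reporter's actual config; field names follow the standard DeepSpeed JSON schema, and "auto" leaves the value to the training framework:

```python
# Hedged sketch: the two runs differ only in zero_optimization.stage;
# every other hyperparameter is held identical.
zero2_config = {
    "train_batch_size": "auto",
    "gradient_accumulation_steps": "auto",
    "bf16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 2,            # partitions optimizer states and gradients
        "overlap_comm": True,
    },
}

zero3_config = {
    **zero2_config,
    "zero_optimization": {
        "stage": 3,            # additionally partitions the parameters
        "overlap_comm": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
}
```

With everything else fixed, any loss divergence between the two runs should come from the stage change itself (e.g. differences in reduction order or precision of accumulation), so large gaps are worth reporting with both configs attached.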
Expected behavior
No response
System Info
No response
Others
No response