Model metrics differ drastically between streaming and non-streaming mode #3436

Closed
1 task
zhangbin1997 opened this issue Apr 25, 2024 · 4 comments
Labels
solved This problem has been already solved.

Comments

@zhangbin1997

zhangbin1997 commented Apr 25, 2024

Reminder

  • I have read the README and searched the existing issues.

Reproduction

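As the title says, models trained in streaming mode and in non-streaming mode end up with drastically different metrics. Is this expected?
This is full-parameter training. In streaming mode num_worker is set to 1 everywhere; in non-streaming mode it is set to 64 everywhere.
I concatenated data from several sources, in order, into a single file, and that file is the only dataset used for training. In streaming mode the loss fluctuates wildly, while in non-streaming mode it converges normally.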

Expected behavior

No response

System Info

No response

Others

No response

hiyouga added the pending label (This problem is yet to be addressed.) on Apr 25, 2024
@hiyouga
Owner

hiyouga commented Apr 25, 2024

Shuffle the data once after mixing it, then train with streaming.
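A one-off offline shuffle of the merged file could look like the sketch below. It assumes the merged dataset is a JSON Lines file with one example per line; the helper names shuffle_lines and shuffle_jsonl are hypothetical, and the whole file is read into memory, so this only suits files that fit in RAM.

```python
import random
from typing import List


def shuffle_lines(lines: List[str], seed: int = 42) -> List[str]:
    """Return a globally shuffled copy of the lines (deterministic for a given seed)."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def shuffle_jsonl(src_path: str, dst_path: str, seed: int = 42) -> int:
    """Hypothetical helper: shuffle a JSON Lines file (one example per line) into a new file.

    Reads the whole file into memory. Returns the number of examples written.
    """
    with open(src_path, "r", encoding="utf-8") as f:
        # Skip blank lines and make sure every line (including the last) ends with "\n".
        lines = [line.rstrip("\n") + "\n" for line in f if line.strip()]
    shuffled = shuffle_lines(lines, seed)
    with open(dst_path, "w", encoding="utf-8") as f:
        f.writelines(shuffled)
    return len(shuffled)
```

Usage, with hypothetical file names: shuffle_jsonl("merged.jsonl", "merged_shuffled.jsonl"), then point the streaming run at the shuffled file. A global shuffle like this discards the per-source ordering of the original file.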

@zhangbin1997
Author

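But my strategy requires the datasets to be laid out one after another, so a global shuffle isn't an option.
Does the order in which the data is trained on actually differ noticeably between streaming and non-streaming mode?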

@hiyouga
Owner

hiyouga commented Apr 25, 2024

Non-streaming mode shuffles the entire training dataset.

@merlinarer

So non-streaming mode shuffles the entire training dataset, whereas streaming mode only shuffles within a window of buffersize examples when sampling data?
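To illustrate why this matters for ordered, concatenated data, here is a minimal simulation, assuming the streaming path uses a fixed-size shuffle buffer as the question above suggests. buffer_shuffle is a simplified re-implementation of the buffer approach the Hugging Face datasets docs describe for IterableDataset.shuffle(buffer_size=...): fill a buffer, emit a random element from it, and put the next incoming example in its slot. The three sources A/B/C of 10,000 examples each, the buffer size of 1,000 and the batch size of 32 are hypothetical numbers for illustration, not values from this thread.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def buffer_shuffle(stream: Iterable[T], buffer_size: int, seed: int = 0) -> Iterator[T]:
    """Approximately shuffle a stream using a fixed-size buffer (simplified sketch)."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be >= 1")
    rng = random.Random(seed)
    buffer: List[T] = []
    for item in stream:
        if len(buffer) < buffer_size:
            # Fill the buffer before emitting anything.
            buffer.append(item)
            continue
        # Emit a random buffered example and store the new one in its slot.
        i = rng.randrange(buffer_size)
        yield buffer[i]
        buffer[i] = item
    # Stream exhausted: emit the remaining buffered examples in random order.
    rng.shuffle(buffer)
    yield from buffer


def single_source_batch_ratio(labels: List[str], batch_size: int) -> float:
    """Fraction of consecutive batches whose examples all come from a single source."""
    batches = [labels[i:i + batch_size] for i in range(0, len(labels), batch_size)]
    return sum(len(set(b)) == 1 for b in batches) / len(batches)


if __name__ == "__main__":
    # Hypothetical: three sources concatenated in order, 10,000 examples each.
    ordered = ["A"] * 10_000 + ["B"] * 10_000 + ["C"] * 10_000

    # Non-streaming style: shuffle the whole dataset at once.
    full = ordered[:]
    random.Random(0).shuffle(full)

    # Streaming style: shuffle only within a 1,000-example buffer.
    streamed = list(buffer_shuffle(ordered, buffer_size=1_000))

    print("full shuffle  :", single_source_batch_ratio(full, 32))
    print("buffer shuffle:", single_source_batch_ratio(streamed, 32))
```

With these numbers the fully shuffled list should have essentially no single-source batches, while the buffer-shuffled stream still has a large share of them: a buffer much smaller than each source can only mix examples that sit close together in the file. That is consistent with the unstable loss reported above for ordered data in streaming mode.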

hiyouga added the solved label (This problem has been already solved.) and removed the pending label on Apr 29, 2024
hiyouga closed this as completed on Apr 29, 2024
3 participants