Question about the dataset size #11

Open
TZWwww opened this issue Jul 20, 2023 · 2 comments

Comments

@TZWwww

TZWwww commented Jul 20, 2023

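Thank you very much for this meaningful work. I'd like to ask how much instruction-tuning data you used.
Also, have you investigated how much instruction-tuning data is actually enough?
Many thanks.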

@jerry1993-tech
Owner

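The earlier data was a QA dataset, about 12M.
Regarding "have you investigated how much instruction-tuning data is actually enough?": in principle, the higher the data quality and the greater the data diversity, the better. Generally, 20,000+ examples per type is sufficient.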

@luxinglong

Does 12M refer to the disk size or to the number of instructions?


3 participants