Describe the question.
I see in the docs that XTuner ships with matching fine-tuning components, but if I want to do long-context instruction fine-tuning, say 100k tokens and above, roughly how much GPU memory is needed, and what training strategy should be used?

Replies: 3 comments 3 replies

-
Hi @Labmem009, according to our tests, both the 7B and 20B models can be successfully fine-tuned with a 100k-length context on 8 A100s. The specific configuration is as follows:
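Hedged sketch only: the following is not the configuration referenced above, and every name and value in it is an illustrative assumption. A long-context XTuner setup along these lines would typically combine a large `max_length`, sample packing, and sequence parallelism, trained under DeepSpeed ZeRO-3:

```python
# Illustrative excerpt from a hypothetical XTuner config
# (my_long_context_config.py). All values are assumptions for
# illustration, not the settings referenced in the reply above.
max_length = 100 * 1024      # target ~100k-token context
pack_to_max_length = True    # pack shorter samples into full-length sequences
sequence_parallel_size = 8   # shard each sequence across the 8 GPUs
batch_size = 1               # per-device micro-batch; long contexts leave little headroom
accumulative_counts = 1      # raise this if a larger effective batch is needed
```

It could then be launched with something like `NPROC_PER_NODE=8 xtuner train my_long_context_config.py --deepspeed deepspeed_zero3`, using ZeRO-3 to shard optimizer states, gradients, and parameters across the eight GPUs.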
-
Thanks for the answer! I'd also like to ask: is GPU memory consumption during inference and fine-tuning approximately linear in the context length? And if I fine-tune with contexts averaging 8k and at most 120k in length, will that harm the model's 200k long-context capability?
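On the linear-scaling part of this question: the KV cache at inference grows linearly with context length, and with FlashAttention-style kernels the activation memory during training is also roughly linear, so near-linear is a reasonable first approximation. A back-of-envelope estimator; all model dimensions below are illustrative assumptions for a 7B-class model with grouped-query attention, not values from any specific config:

```python
# Back-of-envelope KV-cache size: linear in sequence length.
# Dimensions are illustrative assumptions (roughly 7B-class with GQA).
def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    # 2x for keys and values; fp16/bf16 -> 2 bytes per element
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * seq_len

print(f"{kv_cache_bytes(100_000) / 2**30:.1f} GiB")  # ~12.2 GiB at 100k tokens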
-
It shouldn't. The current 200k context is extended mostly by extrapolation, so a corpus that doesn't include 200k-length samples won't make the model any worse.
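"Extrapolation" here refers to position-embedding rescaling such as dynamic NTK / RoPE scaling, which extends the usable context at inference time rather than through 200k-length training data. A minimal sketch using the Hugging Face `rope_scaling` config option; the model name and scaling factor are illustrative assumptions:

```python
# Minimal sketch of extrapolation-based context extension via RoPE scaling.
# Model name and scaling factor are illustrative assumptions.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "internlm/internlm2-chat-7b",
    trust_remote_code=True,
    rope_scaling={"type": "dynamic", "factor": 2.0},  # stretch positions beyond the trained range
)
```

Because the extra range comes from rescaling positions rather than from having seen 200k-long samples, fine-tuning on sequences capped at 120k should leave this mechanism intact.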