Question about the data format difference between ChatGLM2 and ChatGLM #104

Open
Kayce001 opened this issue Sep 23, 2023 · 2 comments
Comments

@Kayce001

input_ids = [tokenizer.get_command("[gMASK]"),
             tokenizer.get_command("sop")] + tokenizer.convert_tokens_to_ids(tokens)

What does this line mean? Why does it differ so much from the ChatGLM version, and why can the input be written in this format?

@zengzhongjie

I have the same question. Following this format, the results in our trials were very poor.

@liucongg (Owner) commented Jan 7, 2024

Because ChatGLM2 and ChatGLM were officially trained with different data formats. PS: the two models' architectures also differ a lot; one is a prefix-LM and the other is a causal-LM.
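
Schematically, my reading of the two official tokenization schemes (illustrative placeholder ids below, not code from this repository) is that ChatGLM's GLM-style format puts [gMASK] at the blank position inside the sequence, while ChatGLM2 prepends its special tokens as a fixed prefix:

```python
# Rough comparison of the two sample layouts (illustrative placeholder ids only).
gmask_id, sop_id, eop_id, eos_id = 9001, 9002, 9003, 2   # stand-in values
prompt_ids = [11, 12, 13]                                 # stand-in for a tokenized prompt
answer_ids = [21, 22]                                     # stand-in for a tokenized answer

# ChatGLM (prefix-LM, GLM blank infilling):
#   prompt ... [gMASK] <sop> answer ... <eop>
# The prompt (up to and including [gMASK]) is attended bidirectionally; only the
# span after <sop> is generated autoregressively, so the blank sits inside the sequence.
chatglm_sample = prompt_ids + [gmask_id, sop_id] + answer_ids + [eop_id]

# ChatGLM2 (causal-LM style):
#   [gMASK] sop prompt ... answer ... <eos>
# The two special tokens form a fixed prefix and everything after them is modeled
# left-to-right, which is why the line in the question builds input_ids as
# [gMASK], sop + convert_tokens_to_ids(tokens).
chatglm2_sample = [gmask_id, sop_id] + prompt_ids + answer_ids + [eos_id]

print(chatglm_sample)
print(chatglm2_sample)
```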
