Add support for "stop_words" in PIPELINE #160

Open
yynil opened this issue Jul 31, 2023 · 0 comments

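Currently, the PIPELINE class in src/util.py has an argument "stop_token", a single special token id that stops generation.
In most cases, though, the stop condition needs to be a list of token ids. For example, suppose the prompt looks like this:
"User:请根据以下材料设计一道中餐菜谱。要求生成菜名和具体做法,菜谱最后以”完成!“结束。材料:猪后腿肉,青椒,洋葱,盐,胡椒。\nAssistant:菜名:"
(In English: "User: Please design a Chinese recipe from the following ingredients. Generate the dish name and the detailed method, and end the recipe with "完成!" ("Done!"). Ingredients: pork hind leg, green pepper, onion, salt, pepper.\nAssistant: Dish name:")
The result should look like this: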

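Braised pork hind leg
Ingredients: pork hind leg, green pepper, onion, salt, pepper
Method:
1. Cut the pork hind leg into chunks and blanch them in boiling water to remove the blood.
2. Heat the wok, add cold oil, and stir-fry the onion and green pepper until fragrant.
3. Add the pork chunks and stir-fry until they change color.
4. Season with salt and pepper to taste and keep stir-frying until cooked through.
5. Finally, drizzle with a little light soy sauce.
完成! ("Done!")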

The stop_token would then need to be set like this:
end_token = pipeline.encode("完成!")

This yields a list of token ids rather than a single id, and in the current implementation such an end_token is not able to stop generation.

I have just pushed an update to my fork that adds a stop_words implementation.
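To make the request concrete, here is a rough sketch of how a multi-token stop check could work. It is not the code from the fork and does not use the real PIPELINE API: `ends_with_any`, `generate_until_stop`, and the `next_token` callable are hypothetical names, and the token ids are made up for illustration.

```python
from typing import Callable, Iterable, List, Sequence


def ends_with_any(tokens: Sequence[int], stop_sequences: Iterable[Sequence[int]]) -> bool:
    """Return True if `tokens` ends with any non-empty stop sequence."""
    for stop in stop_sequences:
        n = len(stop)
        if n and len(tokens) >= n and list(tokens[-n:]) == list(stop):
            return True
    return False


def generate_until_stop(
    next_token: Callable[[List[int]], int],  # hypothetical sampler: history -> next token id
    stop_sequences: Iterable[Sequence[int]],
    max_tokens: int = 100,
) -> List[int]:
    """Collect tokens until a stop sequence appears at the tail or max_tokens is reached."""
    stops = [list(s) for s in stop_sequences]
    out: List[int] = []
    for _ in range(max_tokens):
        out.append(next_token(out))
        if ends_with_any(out, stops):
            break
    return out


# Toy usage: a fake "model" that replays a fixed token stream.
stream = [5, 9, 2, 7, 7, 3, 8, 1]
fake_model = lambda history: stream[len(history)]
result = generate_until_stop(fake_model, stop_sequences=[[7, 3]], max_tokens=len(stream))
print(result)  # [5, 9, 2, 7, 7, 3]
```

In a real pipeline the stop sequences would come from something like pipeline.encode("完成!"). Matching on token ids can miss a stop word if the tokenizer splits it differently in context, so comparing the decoded text tail may be a more robust variant.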
