Tokenizer class BaiChuanTokenizer does not exist or is not currently imported. #11

Open · corlin opened this issue Jun 17, 2023 · 6 comments

corlin commented Jun 17, 2023

The error message is as follows:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /Users/corlin/code/Efficient-Tuning-LLMs/qlora_finetune.py:396 in │
│ │
│ 393 │
│ 394 │
│ 395 if __name__ == '__main__': │
│ ❱ 396 │ main() │
│ 397 │
│ │
│ /Users/corlin/code/Efficient-Tuning-LLMs/qlora_finetune.py:312 in main │
│ │
│ 309 │ set_seed(args.seed) │
│ 310 │ │
│ 311 │ # Tokenizer │
│ ❱ 312 │ tokenizer = AutoTokenizer.from_pretrained( │
│ 313 │ │ args.model_name_or_path, │
│ 314 │ │ cache_dir=args.cache_dir, │
│ 315 │ │ padding_side='right', │
│ │
│ /Users/corlin/code/transformers/src/transformers/models/auto/tokenization_auto.py:688 in │
│ from_pretrained │
│ │
│ 685 │ │ │ │ tokenizer_class_candidate = config_tokenizer_class │
│ 686 │ │ │ │ tokenizer_class = tokenizer_class_from_name(tokenizer_class_candidate) │
│ 687 │ │ │ if tokenizer_class is None: │
│ ❱ 688 │ │ │ │ raise ValueError( │
│ 689 │ │ │ │ │ f"Tokenizer class {tokenizer_class_candidate} does not exist or is n │
│ 690 │ │ │ │ ) │
│ 691 │ │ │ return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *input │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: Tokenizer class BaiChuanTokenizer does not exist or is not currently imported.

corlin (author) commented Jun 17, 2023

Environment: macOS M1.

jianzhnie (owner) commented:

You should first download the BaiChuanTokenizer and the BaiChuan model checkpoint from https://huggingface.co/baichuan-inc/baichuan-7B.

jianzhnie (owner) commented:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/baichuan-7B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("baichuan-inc/baichuan-7B", device_map="auto", trust_remote_code=True)
inputs = tokenizer('登鹳雀楼->王之涣\n夜雨寄北->', return_tensors='pt')
inputs = inputs.to('cuda:0')
pred = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))

corlin (author) commented Jun 19, 2023

> You should first download the BaiChuanTokenizer and the BaiChuan model checkpoint from https://huggingface.co/baichuan-inc/baichuan-7B.

[image: screenshot of the downloaded model directory]

The relevant files in the model directory are all there.

jianzhnie (owner) commented:

Run the following example to check that the model and tokenizer load correctly and that inference works:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your_download_model_path", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("your_download_model_path", device_map="auto", trust_remote_code=True)
inputs = tokenizer('登鹳雀楼->王之涣\n夜雨寄北->', return_tensors='pt')
inputs = inputs.to('cuda:0')
pred = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))


RIU-13 commented Aug 20, 2023

I got this error when I didn't pass trust_remote_code; once I added it, everything worked.
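
For reference, here is a minimal sketch of how the tokenizer call around line 312 of qlora_finetune.py could pass that flag. It mirrors the argument list visible in the traceback above; the placeholder path and anything not shown in the traceback (such as the remaining keyword arguments) are assumptions, not the repository's actual code.

from transformers import AutoTokenizer

# Placeholder: the local download path or the Hub id "baichuan-inc/baichuan-7B"
# (in the script this comes from args.model_name_or_path).
model_name_or_path = "baichuan-inc/baichuan-7B"

tokenizer = AutoTokenizer.from_pretrained(
    model_name_or_path,
    padding_side='right',
    trust_remote_code=True,  # lets transformers import the custom BaiChuanTokenizer from the model repo
)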
