phi3 mini model add new token #411

Open
NickyDark1 opened this issue May 1, 2024 · 4 comments

Comments

@NickyDark1

Is it possible to add new tokens and special tokens so that they can be trained?
What would the code look like?

@NickyDark1
Author

NickyDark1 commented May 2, 2024

Example of a special token:

"32005": {
  "content": "<|function_call|>",
  "lstrip": false,
  "normalized": false,
  "rstrip": true,
  "single_word": false,
  "special": true
},

https://huggingface.co/NickyNicky/Phi-3-mini-128k-instruct_function/blob/main/tokenizer_config.json
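To check which entries in a config like this are flagged as special, a short sketch along these lines may help. It assumes the entry sits under an `added_tokens_decoder` key, which is how the entry is wrapped here for illustration; the helper name `special_tokens` is hypothetical and not part of any library.

```python
import json

# Inline copy of the entry quoted above. The surrounding
# "added_tokens_decoder" key is an assumption about the file layout.
config_text = """
{
  "added_tokens_decoder": {
    "32005": {
      "content": "<|function_call|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": true,
      "single_word": false,
      "special": true
    }
  }
}
"""

def special_tokens(config):
    """Return {token_id: content} for entries whose "special" flag is true."""
    entries = config.get("added_tokens_decoder", {})
    return {
        int(token_id): entry["content"]
        for token_id, entry in entries.items()
        if entry.get("special")
    }

config = json.loads(config_text)
found = special_tokens(config)
print(found)
```

To inspect a real file, the inline string could be replaced with the contents of a locally downloaded tokenizer_config.json.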

@danielhanchen
Contributor

Yes - I haven't announced it yet, but you can use:

from unsloth import add_new_tokens
add_new_tokens(model, tokenizer, new_tokens = ["<SPECIAL_TOKEN_1>", "<SPECIAL_TOKEN_2>"])

Do this before calling get_peft_model.
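As a rough mental model of what adding tokens involves, the toy sketch below grows a vocabulary and an embedding table together, so every new token id has a matching embedding row. This is not unsloth's implementation: plain Python lists stand in for the tokenizer and the model, the function name is hypothetical, and initialising new rows to the mean of the existing rows is an assumption made for illustration.

```python
# Toy illustration only: a dict stands in for the tokenizer vocab and a
# list of lists stands in for the embedding matrix.

def add_tokens_with_mean_init(vocab, embeddings, new_tokens):
    """Append new tokens to vocab and give each a mean-initialised row.

    Returns the number of tokens actually added.
    """
    dim = len(embeddings[0])
    # Column-wise mean of the rows that existed before any token was added
    # (assumed initialisation strategy, chosen for illustration).
    mean_row = [
        sum(row[i] for row in embeddings) / len(embeddings)
        for i in range(dim)
    ]
    added = 0
    for token in new_tokens:
        if token in vocab:
            continue  # already known: keep its existing id and row
        vocab[token] = len(vocab)          # next free id
        embeddings.append(list(mean_row))  # matching embedding row
        added += 1
    return added

vocab = {"hello": 0, "world": 1}
embeddings = [[1.0, 2.0], [3.0, 4.0]]
added = add_tokens_with_mean_init(
    vocab, embeddings, ["<SPECIAL_TOKEN_1>", "<SPECIAL_TOKEN_2>"]
)
print(added, vocab, embeddings)
```

The point of the sketch is the invariant at the end: the number of embedding rows equals the vocabulary size, so every token id can be looked up.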

@NickyDark1
Author

Is it similar to this?
Would it make a difference whether the tokens are added as normal tokens or as special ones?

special_tokens_dict = {'additional_special_tokens': ['[C1]','[C2]','[C3]','[C4]']}
num_added_toks = tokenizer.add_special_tokens(special_tokens_dict)
model.resize_token_embeddings(len(tokenizer))

@danielhanchen
Contributor

Oh, they're all special tokens! Just use add_new_tokens for all of them.
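For background on the normal-versus-special question, one practical difference in Hugging Face tokenizers is that tokens registered as special can be dropped at decode time with skip_special_tokens=True, while normally added tokens stay in the output. The toy decoder below only mimics that behaviour. It is not the transformers implementation, and the token [N1] and all names in the code are hypothetical.

```python
def decode(token_ids, id_to_token, special_ids, skip_special_tokens=False):
    """Join tokens into text, optionally dropping ids marked as special."""
    pieces = []
    for token_id in token_ids:
        if skip_special_tokens and token_id in special_ids:
            continue  # special tokens are omitted from the decoded text
        pieces.append(id_to_token[token_id])
    return " ".join(pieces)

id_to_token = {0: "call", 1: "tool", 2: "[C1]", 3: "[N1]"}
special_ids = {2}  # [C1] registered as special; [N1] added as a normal token
ids = [2, 0, 1, 3]

with_special = decode(ids, id_to_token, special_ids)
without_special = decode(ids, id_to_token, special_ids, skip_special_tokens=True)
print(with_special)
print(without_special)
```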
