
Size mismatch, Error(s) in loading model finetuned by lora #1129

Open · 2 tasks done
GuYith opened this issue Mar 9, 2024 · 6 comments

GuYith commented Mar 9, 2024

Is there an existing issue / discussion for this?

  • I have searched the existing issues / discussions

Is there an existing answer for this in the FAQ?

  • I have searched the FAQ

Current Behavior

I fine-tuned Qwen-7B, and when I use the fine-tuned model, I get the following error:

root@1fc7d6985d8b:/Fine/Qwen-main# python3 cli_demo.py
/usr/local/lib/python3.8/dist-packages/transformers/utils/generic.py:260: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:10<00:00,  1.30s/it]
You are resizing the embedding layer without providing a `pad_to_multiple_of` parameter. This means that the new embeding dimension will be 151851. This might induce some performance reduction as *Tensor Cores* will not be available. For more details  about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc
Traceback (most recent call last):
  File "cli_demo.py", line 217, in <module>
    main()
  File "cli_demo.py", line 123, in main
    model, tokenizer, config = _load_model_tokenizer(args)
  File "cli_demo.py", line 60, in _load_model_tokenizer
    model = AutoPeftModelForCausalLM.from_pretrained(
  File "/usr/local/lib/python3.8/dist-packages/peft/auto.py", line 128, in from_pretrained
    return cls._target_peft_class.from_pretrained(
  File "/usr/local/lib/python3.8/dist-packages/peft/peft_model.py", line 353, in from_pretrained
    model.load_adapter(model_id, adapter_name, is_trainable=is_trainable, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/peft/peft_model.py", line 697, in load_adapter
    load_result = set_peft_model_state_dict(self, adapters_weights, adapter_name=adapter_name)
  File "/usr/local/lib/python3.8/dist-packages/peft/utils/save_and_load.py", line 249, in set_peft_model_state_dict
    load_result = model.load_state_dict(peft_model_state_dict, strict=False)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 2153, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
        size mismatch for base_model.model.transformer.wte.modules_to_save.default.weight: copying a param with shape torch.Size([151936, 4096]) from checkpoint, the shape in current model is torch.Size([151851, 4096]).
        size mismatch for base_model.model.lm_head.modules_to_save.default.weight: copying a param with shape torch.Size([151936, 4096]) from checkpoint, the shape in current model is torch.Size([151851, 4096]).

I fine-tuned the model with:

bash finetune_lora_single_gpu.sh -d my_train_data.json

and I modified my cli_demo.py following the tutorial:

    from peft import AutoPeftModelForCausalLM

    model = AutoPeftModelForCausalLM.from_pretrained(
        model_path,  # path to the output directory or model name
        device_map=device_map,
        trust_remote_code=True,
    ).eval()

I found some possibly related issues, such as #419 and #482, but they did not solve my problem.
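
For what it's worth, here is a rough workaround I am considering (only an untested sketch; "output_qwen" is a hypothetical adapter directory, and 151936 is the padded embedding size from the traceback above):

    # Unverified sketch: resize the base model's embedding back to the padded size
    # stored in the LoRA checkpoint (151936) before attaching the adapter.
    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(
        "Qwen/Qwen-7B-Chat",   # base checkpoint used for the LoRA fine-tune
        device_map="auto",
        trust_remote_code=True,
    )

    # The adapter's wte / lm_head tensors have 151936 rows (see the traceback above),
    # while resizing to the tokenizer length gives 151851, so force the padded size here.
    base.resize_token_embeddings(151936)

    model = PeftModel.from_pretrained(base, "output_qwen").eval()  # hypothetical adapter dir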

Expected Behavior

The demo runs normally.

Steps To Reproduce

No response

Environment

- OS: Ubuntu 22.04.1
- Python: 3.8
- Transformers: 4.32.0
- PyTorch: 2.2.1
- CUDA: 12.1

Anything else?

No response

jklj077 (Contributor) commented Mar 11, 2024

Something seems wrong with the vocab_size (which is the size of the embedding, not the actual vocabulary size) in config.json and the pad_to_multiple_of setting.

Please first try upgrading transformers to a version below 4.38.0 and downgrading peft to a version below 0.8.0, then provide the content of config.json.
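
To illustrate how the two sizes in the traceback can arise (a minimal sketch; only the two numbers come from the logs above, and the multiple-of-128 padding is an assumption):

    # 151851 is the tokenizer length reported by the resize warning;
    # 151936 is vocab_size in config.json and the shape saved in the LoRA checkpoint.
    actual_vocab = 151851
    multiple = 128  # assumed pad_to_multiple_of granularity

    # Rounding the real vocabulary up to a multiple of 128 reproduces the padded size.
    padded_vocab = ((actual_vocab + multiple - 1) // multiple) * multiple
    print(padded_vocab)  # 151936

    # Resizing the embedding without pad_to_multiple_of shrinks it to 151851,
    # which then no longer matches the 151936-row tensors in the adapter.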

GuYith (Author) commented Mar 11, 2024

Okay, I upgraded transformers to 4.38.0 and am using peft==0.7.0 now, but I get some errors that I did not see with peft==0.9.0.
[screenshot of the errors omitted]

Here is config.json; I copied it from https://huggingface.co/Qwen/Qwen-7B-Chat/blob/main/config.json:

{
  "architectures": [
    "QWenLMHeadModel"
  ],
  "auto_map": {
    "AutoConfig": "configuration_qwen.QWenConfig",
    "AutoModelForCausalLM": "modeling_qwen.QWenLMHeadModel"
  },
  "attn_dropout_prob": 0.0,
  "bf16": false,
  "emb_dropout_prob": 0.0,
  "fp16": false,
  "fp32": false,
  "hidden_size": 4096,
  "intermediate_size": 22016,
  "initializer_range": 0.02,
  "kv_channels": 128,
  "layer_norm_epsilon": 1e-06,
  "max_position_embeddings": 32768,
  "model_type": "qwen",
  "no_bias": true,
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "onnx_safe": null,
  "rotary_emb_base": 10000,
  "rotary_pct": 1.0,
  "scale_attn_weights": true,
  "seq_length": 8192,
  "tie_word_embeddings": false,
  "tokenizer_class": "QWenTokenizer",
  "transformers_version": "4.32.0",
  "use_cache": true,
  "use_dynamic_ntk": true,
  "use_flash_attn": "auto",
  "use_logn_attn": true,
  "vocab_size": 151936
}

jklj077 (Contributor) commented Mar 11, 2024

There should be an adapter_config.json as well. Let's see what's there. I think peft is changing the vocab_size somewhere.
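
For instance, something along these lines would show the relevant fields (a sketch; the path is a placeholder for your fine-tune output directory):

    import json

    # Placeholder path: point this at your LoRA output directory.
    with open("path/to/lora/output/adapter_config.json") as f:
        adapter_config = json.load(f)

    print(json.dumps(adapter_config, indent=2))
    # In particular, check modules_to_save (wte / lm_head) and base_model_name_or_path.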

github-actions (bot) commented

This issue has been automatically marked as inactive due to a lack of recent activity. Should you believe it remains unresolved and warrants attention, kindly leave a comment on this thread.

Aurora-slz commented

> Okay, I upgraded transformers to 4.38.0 and am using peft==0.7.0 now, but I get some errors that I did not see with peft==0.9.0. […]

Hi, have you solved this problem?

GuYith (Author) commented Apr 25, 2024

> Hi, have you solved this problem?

I'm sorry I didn't follow up on this issue.

github-actions bot removed the inactive label on May 1, 2024