
Size mismatch, Error(s) in loading model finetuned by lora #1129

Open · 2 tasks done
GuYith opened this issue Mar 9, 2024 · 6 comments

GuYith commented Mar 9, 2024

Is there an existing issue / discussion for this?

  • I have searched the existing issues / discussions

Is there an existing answer for this in the FAQ?

  • I have searched the FAQ

Current Behavior

I fine-tuned Qwen-7B, and when I use the fine-tuned model, I get the following error:

root@1fc7d6985d8b:/Fine/Qwen-main# python3 cli_demo.py
/usr/local/lib/python3.8/dist-packages/transformers/utils/generic.py:260: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:10<00:00,  1.30s/it]
You are resizing the embedding layer without providing a `pad_to_multiple_of` parameter. This means that the new embeding dimension will be 151851. This might induce some performance reduction as *Tensor Cores* will not be available. For more details  about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc
Traceback (most recent call last):
  File "cli_demo.py", line 217, in <module>
    main()
  File "cli_demo.py", line 123, in main
    model, tokenizer, config = _load_model_tokenizer(args)
  File "cli_demo.py", line 60, in _load_model_tokenizer
    model = AutoPeftModelForCausalLM.from_pretrained(
  File "/usr/local/lib/python3.8/dist-packages/peft/auto.py", line 128, in from_pretrained
    return cls._target_peft_class.from_pretrained(
  File "/usr/local/lib/python3.8/dist-packages/peft/peft_model.py", line 353, in from_pretrained
    model.load_adapter(model_id, adapter_name, is_trainable=is_trainable, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/peft/peft_model.py", line 697, in load_adapter
    load_result = set_peft_model_state_dict(self, adapters_weights, adapter_name=adapter_name)
  File "/usr/local/lib/python3.8/dist-packages/peft/utils/save_and_load.py", line 249, in set_peft_model_state_dict
    load_result = model.load_state_dict(peft_model_state_dict, strict=False)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 2153, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
        size mismatch for base_model.model.transformer.wte.modules_to_save.default.weight: copying a param with shape torch.Size([151936, 4096]) from checkpoint, the shape in current model is torch.Size([151851, 4096]).
        size mismatch for base_model.model.lm_head.modules_to_save.default.weight: copying a param with shape torch.Size([151936, 4096]) from checkpoint, the shape in current model is torch.Size([151851, 4096]).

I fine-tuned the model with:

bash finetune_lora_single_gpu.sh -d my_train_data.json

and I modified my cli_demo.py following the tutorial:

    from peft import AutoPeftModelForCausalLM

    model = AutoPeftModelForCausalLM.from_pretrained(
        model_path,  # path to the output directory or model name
        device_map=device_map,
        trust_remote_code=True,
    ).eval()

I found some possibly related issues, such as #419 and #482, but they did not solve my problem.
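
For what it's worth, here is a rough workaround I am considering (only an untested sketch; "output_qwen" is a hypothetical adapter directory, and 151936 is the padded embedding size from the traceback above):

    # Unverified sketch: resize the base model's embedding back to the padded size
    # stored in the LoRA checkpoint (151936) before attaching the adapter.
    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(
        "Qwen/Qwen-7B-Chat",   # base checkpoint used for the LoRA fine-tune
        device_map="auto",
        trust_remote_code=True,
    )

    # The adapter's wte / lm_head tensors have 151936 rows (see the traceback above),
    # while resizing to the tokenizer length gives 151851, so force the padded size here.
    base.resize_token_embeddings(151936)

    model = PeftModel.from_pretrained(base, "output_qwen").eval()  # hypothetical adapter dir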

Expected Behavior

The demo runs normally.

Steps To Reproduce

No response

Environment

- OS: Ubuntu 22.04.1
- Python: 3.8
- Transformers: 4.32.0
- PyTorch: 2.2.1
- CUDA: 12.1

Anything else?

No response

jklj077 (Contributor) commented Mar 11, 2024

Something seems wrong with the vocab_size (which is the size of the embedding, not the actual vocabulary size) in config.json and the pad_to_multiple_of setting.

Please first try upgrading transformers to a version below 4.38.0 and downgrading peft to a version below 0.8.0, then provide the content of config.json.
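
To illustrate how the two sizes in the traceback can arise (a minimal sketch; only the two numbers come from the logs above, and the multiple-of-128 padding is an assumption):

    # 151851 is the tokenizer length reported by the resize warning;
    # 151936 is vocab_size in config.json and the shape saved in the LoRA checkpoint.
    actual_vocab = 151851
    multiple = 128  # assumed pad_to_multiple_of granularity

    # Rounding the real vocabulary up to a multiple of 128 reproduces the padded size.
    padded_vocab = ((actual_vocab + multiple - 1) // multiple) * multiple
    print(padded_vocab)  # 151936

    # Resizing the embedding without pad_to_multiple_of shrinks it to 151851,
    # which then no longer matches the 151936-row tensors in the adapter.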

GuYith (Author) commented Mar 11, 2024

Okay, I upgraded transformers to 4.38.0 and am using peft==0.7.0 now, but I get some errors that I did not see with peft==0.9.0.
[screenshot of the errors omitted]

Here is config.json; I copied it from https://huggingface.co/Qwen/Qwen-7B-Chat/blob/main/config.json:

{
  "architectures": [
    "QWenLMHeadModel"
  ],
  "auto_map": {
    "AutoConfig": "configuration_qwen.QWenConfig",
    "AutoModelForCausalLM": "modeling_qwen.QWenLMHeadModel"
  },
  "attn_dropout_prob": 0.0,
  "bf16": false,
  "emb_dropout_prob": 0.0,
  "fp16": false,
  "fp32": false,
  "hidden_size": 4096,
  "intermediate_size": 22016,
  "initializer_range": 0.02,
  "kv_channels": 128,
  "layer_norm_epsilon": 1e-06,
  "max_position_embeddings": 32768,
  "model_type": "qwen",
  "no_bias": true,
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "onnx_safe": null,
  "rotary_emb_base": 10000,
  "rotary_pct": 1.0,
  "scale_attn_weights": true,
  "seq_length": 8192,
  "tie_word_embeddings": false,
  "tokenizer_class": "QWenTokenizer",
  "transformers_version": "4.32.0",
  "use_cache": true,
  "use_dynamic_ntk": true,
  "use_flash_attn": "auto",
  "use_logn_attn": true,
  "vocab_size": 151936
}

jklj077 (Contributor) commented Mar 11, 2024

There should be an adapter_config.json as well. Let's see what's there. I think peft is changing the vocab_size somewhere.
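
For instance, something along these lines would show the relevant fields (a sketch; the path is a placeholder for your fine-tune output directory):

    import json

    # Placeholder path: point this at your LoRA output directory.
    with open("path/to/lora/output/adapter_config.json") as f:
        adapter_config = json.load(f)

    print(json.dumps(adapter_config, indent=2))
    # In particular, check modules_to_save (wte / lm_head) and base_model_name_or_path.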

github-actions (bot) commented

This issue has been automatically marked as inactive due to a lack of recent activity. Should you believe it remains unresolved and warrants attention, kindly leave a comment on this thread.

Aurora-slz commented

> Okay, I upgraded transformers to 4.38.0 and am using peft==0.7.0 now, but I get some errors that I did not see with peft==0.9.0. […]

Hi, have you solved this problem?

GuYith (Author) commented Apr 25, 2024

> Hi, have you solved this problem?

I'm sorry I didn't follow up on this issue.

github-actions bot removed the inactive label on May 1, 2024