In gpt-2/convert-h5-to-ggml.py : size mismatch for wpe.weight ... torch.Size([50255, 1024]) ... #745

Open
Twenkid opened this issue Feb 23, 2024 · 0 comments

Twenkid commented Feb 23, 2024

Hi, first: thank you for your great work and dedication! I'm trying to convert an old h5 GPT-2 model to ggml (a GPT2-Medium, trained with TensorFlow on Bulgarian texts on Colab with a Tesla T4 in 2021) in order to "replay" it a bit, inspired by the impressive speed of your library.

However, I face a shape-mismatch error during the conversion (first I added `from_tf=True` at line 73, as an earlier error message suggested).

I haven't tried another GPT-2 model instance, in case something is wrong with mine: it was not fine-tuned, but instantiated from scratch (on a small, growing dataset of roughly 140 MB at its largest).

The Bulgarian model can be downloaded from here: https://mega.nz/folder/0NpXwbhQ#8mid7QKtsjVxj2a6dP5d8Q

The same error appears both locally and on Google Colab. (I noticed a related issue, but it was closed.)

  • EDIT: This one: convert-h5-to-ggml.py does not match the official convert-ckpt-to-ggml.py #72

  • EDIT 2, 27.2.2024: Comparing the numbers, I realized it could be because I created the model with a slightly shorter vocabulary of 50255 instead of 50257 ... the shape is torch.Size([50255, 1024]) ... and I see somebody in another issue created one with a size of 50259, again causing problems ("gpt2_model_load: n_vocab = 50259"): gpt2 error #371
    Could that be the problem, or should the vocabulary size not matter? (On the other hand, the reported mismatch is [1024, 1024].) I tried to edit the vocabulary files: I changed the size to 50257, added two more tokens, etc., but then there was another mismatch: 51461120 = 50255*1024 elements stored vs. 51463168 = 50257*1024 expected. I guess it comes from "vocab_size" in config.json; if I revert it to 50257, the previous error returns.
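The two element counts above line up exactly with the two vocabulary sizes multiplied by the embedding width of GPT2-Medium, which suggests the token-embedding matrix is the tensor being counted (a quick sanity check):

```python
n_embd = 1024  # embedding width of GPT2-Medium

print(50255 * n_embd)  # 51461120 -- elements stored in the checkpoint
print(50257 * n_embd)  # 51463168 -- elements expected with the standard vocabulary
print(50257 * n_embd - 50255 * n_embd)  # 2048 -- exactly two extra 1024-wide rows
```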

I guess a solution would be to open it with TF, create another proper GPT-2 instance with the right vocab_size, copy the appropriate weights at the lower level, and save. Or maybe do this on the fly.

I tried the second approach: I managed to pad the initial (50255, 1024) tensor to (50257, 1024), while preserving 50255 in model.config. The reading of the TF model then passes, but it seems to fail again when the conversion to PyTorch starts, although now with the proper dimension of 50257.

```python
@tf.function
def eager_f(symbolic_weight):
    print("PAD????", symbolic_weight.shape[0] * symbolic_weight.shape[1])
    paddings = tf.constant([[0, 2], [0, 0]])  # add 2 rows after dim 0
    symbolic_weight = tf.pad(symbolic_weight, paddings, "CONSTANT", 0)
    print(symbolic_weight.shape)
    return symbolic_weight
```
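For reference, the same padding can be reproduced in plain NumPy (the array here is a random stand-in for the real wte weights, which come from the H5 checkpoint): two zero rows are appended after dim 0, turning a (50255, 1024) matrix into (50257, 1024).

```python
import numpy as np

# Random stand-in for the trained (50255, 1024) embedding matrix;
# the real values come from the H5 checkpoint.
wte = np.random.rand(50255, 1024).astype(np.float32)

# Pad 2 rows after dim 0 and nothing along dim 1 -- the NumPy
# counterpart of tf.pad(..., [[0, 2], [0, 0]], "CONSTANT", 0).
padded = np.pad(wte, ((0, 2), (0, 0)), mode="constant", constant_values=0)
print(padded.shape)  # (50257, 1024)
```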

In `modeling_tf_utils.py`:

```python
def load_tf_weights_from_h5(model, resolved_archive_file, ignore_mismatched_sizes=False, _prefix=None):
    # (...)
    if saved_weight_value is not None:
        print("SAVED_WEIGHT:")
        print(saved_weight_value)
        print(saved_weight_value.shape)
        # Check if the shape of the current weight and the one from the H5 file are different
        if saved_weight_value.shape[0] == 50255:
            saved_weight_value = eager_f(saved_weight_value)
            print("AFTER PADDING SAVED_WEIGHT:")
            print(saved_weight_value)
            print(saved_weight_value.shape)
            ss = input("Press a key...")
# (...)
```
 K.int_shape(symbolic_weight)= (1024,)
Traceback (most recent call last):
  File "/home/tosh/ggml/examples/gpt-2/convert-h5-to-ggml.py", line 80, in <module>
    model = GPT2Model.from_pretrained(dir_model, low_cpu_mem_usage=True, from_tf=True) #from_tf
  File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3469, in from_pretrained
    model, loading_info = load_tf2_checkpoint_in_pytorch_model(
  File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 468, in load_tf2_checkpoint_in_pytorch_model
    return load_tf2_model_in_pytorch_model(
  File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 477, in load_tf2_model_in_pytorch_model
    return load_tf2_weights_in_pytorch_model(
  File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 495, in load_tf2_weights_in_pytorch_model
    return load_tf2_state_dict_in_pytorch_model(
  File "/home/tosh/.local/lib/python3.10/site-packages/transformers/modeling_tf_pytorch_utils.py", line 565, in load_tf2_state_dict_in_pytorch_model
    missing_keys, unexpected_keys = pt_model.load_state_dict(new_pt_params_dict, strict=False)
  File "/home/tosh/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for GPT2Model:
        size mismatch for wpe.weight: copying a param with shape torch.Size([50257, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).

.../transformers/modeling_tf_utils.py

```python
def load_tf_weights_from_h5(model, resolved_archive_file, ignore_mismatched_sizes=False, _prefix=None):
    mismatched_layers = []

    # Read the H5 file
    with h5py.File(resolved_archive_file, "r") as sharded_checkpoint_file:
        # Retrieve the name of each layer from the H5 file
        saved_h5_model_layers_name = set(load_attributes_from_hdf5_group(sharded_checkpoint_file, "layer_names"))
    ...
```

<method-wrapper '__repr__' of TFGPT2MainLayer object at 0x7fee7e878460>
SYMBOLIC_WEIGHT:  <tf.Variable 'tfgpt2_model/transformer/wte/embeddings:0' shape=(50255, 1024) dtype=float32, numpy=
array([[ 0.00544963, -0.01376201,  0.00010876, ..., -0.03386341,
         0.00794204,  0.02500119],    
       ...,
         0.01859283,  0.01723549]], dtype=float32)>
(50255, 1024)
SYMBOLIC_WEIGHT:  <tf.Variable 'tfgpt2_model/transformer/wpe/embeddings:0' shape=(1024, 1024) dtype=float32, numpy=
array([[ 0.02799516,  0.02006585, -0.0060562 , ...,  0.00939397,
      ...
         0.00648996, -0.0052477 ]], dtype=float32)>
(1024, 1024)
SYMBOLIC_WEIGHT:  <tf.Variable 'tfgpt2_model/transformer/h_._0/ln_1/gamma:0' shape=(1024,) dtype=float32, numpy=array([1., 1., 1., ..., 1., 1., 1.], dtype=float32)>
(1024,)
SYMBOLIC_WEIGHT:  <tf.Variable 'tfgpt2_model/transformer/h_._0/ln_1/beta:0' shape=(1024,) dtype=float32, numpy=array([0., 0., 0., ..., 0., 0., 0.], dtype=float32)>
(1024,)
SYMBOLIC_WEIGHT:  <tf.Variable 'tfgpt2_model/transformer/h_._0/attn/c_attn/weight:0' shape=(1024, 3072) dtype=float32, 
...
SYMBOLIC_WEIGHT:  <tf.Variable 'tfgpt2_model/transformer/h_._0/attn/c_attn/bias:0' shape=(1, 3072) dtype=float32, numpy=array([[0., 0., 0., ..., 0., 0., 0.]], dtype=float32)>
(1, 3072)
SYMBOLIC_WEIGHT:  <tf.Variable 'tfgpt2_model/transformer/h_._0/attn/c_proj/weight:0' shape=(1024, 1024) dtype=float32, numpy=

(.....)

...

The initial errors, from Colab and the local run:

2024-02-23 09:31:49.259251: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-23 09:31:49.259329: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-23 09:31:49.261800: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-02-23 09:31:51.094485: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-02-23 09:31:54.027130: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 205844480 exceeds 10% of free system memory.
2024-02-23 09:31:55.341677: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 205844480 exceeds 10% of free system memory.
2024-02-23 09:31:55.623982: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 205844480 exceeds 10% of free system memory.
2024-02-23 09:31:56.140302: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 16777216 exceeds 10% of free system memory.
2024-02-23 09:31:56.229922: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 16777216 exceeds 10% of free system memory.
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:2025: UserWarning: for wte.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
Traceback (most recent call last):
  File "/content/ggml/examples/gpt-2/convert-h5-to-ggml.py", line 73, in <module>
    model = GPT2Model.from_pretrained(dir_model, low_cpu_mem_usage=True, from_tf=True)
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3817, in from_pretrained
    model, loading_info = load_tf2_checkpoint_in_pytorch_model(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_tf_pytorch_utils.py", line 469, in load_tf2_checkpoint_in_pytorch_model
    return load_tf2_model_in_pytorch_model(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_tf_pytorch_utils.py", line 478, in load_tf2_model_in_pytorch_model
    return load_tf2_weights_in_pytorch_model(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_tf_pytorch_utils.py", line 496, in load_tf2_weights_in_pytorch_model
    return load_tf2_state_dict_in_pytorch_model(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_tf_pytorch_utils.py", line 566, in load_tf2_state_dict_in_pytorch_model
    missing_keys, unexpected_keys = pt_model.load_state_dict(new_pt_params_dict, strict=False)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for GPT2Model:
	size mismatch for wpe.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
	size mismatch for h.0.ln_1.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
	size mismatch for h.0.ln_1.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
	size mismatch for h.0.attn.c_attn.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 3072]).
	size mismatch for h.0.attn.c_attn.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([3072]).
	size mismatch for h.0.attn.c_proj.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
	size mismatch for h.0.attn.c_proj.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
	size mismatch for h.0.ln_2.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
	size mismatch for h.0.ln_2.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
	size mismatch for h.0.mlp.c_fc.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
	size mismatch for h.0.mlp.c_fc.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([4096]).
	size mismatch for h.0.mlp.c_proj.weight: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
	size mismatch for h.0.mlp.c_proj.bias: copying a param with shape torch.Size([50255, 1024]) from checkpoint, the shape in current model is torch.Size([1024]).
	(... the same size-mismatch lines repeat for layers h.1 through h.9, each parameter reporting a checkpoint shape of torch.Size([50255, 1024]); the log is truncated at h.9.mlp.c_fc.weight ...)