I have four 3090 GPUs and set up the environment following the instructions.
I used ModelScope to download the pretrained weights, but running examples/example_chat.py fails.
The code is as follows:
import sys
sys.path.insert(0, '.')
sys.path.insert(0, '..')
import argparse
import torch
from modelscope import snapshot_download, AutoModel, AutoTokenizer
from examples.utils import auto_configure_device_map

torch.set_grad_enabled(False)

parser = argparse.ArgumentParser()
parser.add_argument("--num_gpus", default=4, type=int)
parser.add_argument("--dtype", default='fp16', type=str)
args = parser.parse_args()

# init model and tokenizer
model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm-xcomposer2-vl-7b')
model = AutoModel.from_pretrained(model_dir, trust_remote_code=True).eval()
if args.dtype == 'fp16':
    model.half().cuda()
elif args.dtype == 'fp32':
    model.cuda()

if args.num_gpus > 1:
    from accelerate import dispatch_model
    device_map = auto_configure_device_map(args.num_gpus)
    model = dispatch_model(model, device_map=device_map)

tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)

text = '<ImageHere>Please describe this image in detail.'
image = 'examples/image1.webp'
with torch.cuda.amp.autocast():
    with torch.no_grad():
        response, _ = model.chat(tokenizer, query=text, image=image, history=[], do_sample=False)
print(response)
# The image features a quote by Oscar Wilde, "Live life with no excuses, travel with no regret,"
# set against a backdrop of a breathtaking sunset. The sky is painted in hues of pink and orange,
# creating a serene atmosphere. Two silhouetted figures stand on a cliff, overlooking the horizon.
# They appear to be hiking or exploring, embodying the essence of the quote.
# The overall scene conveys a sense of adventure and freedom, encouraging viewers to embrace life without hesitation or regrets.
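For context, accelerate's dispatch_model places each submodule on the GPU given by a device map, a plain dict from module names to device indices; here that map is produced by examples/utils.auto_configure_device_map. A purely hypothetical sketch of its shape (these keys are illustrative, not the model's actual module names):

# Hypothetical illustration only: the real map comes from auto_configure_device_map(num_gpus).
device_map = {
    'vit': 0,              # vision tower on the first GPU (illustrative key)
    'vision_proj': 0,      # vision-to-LM projector alongside it (illustrative key)
    'model.layers.0': 0,   # early decoder layers on GPU 0
    'model.layers.16': 1,  # later decoder layers on GPU 1
    'output': 1,           # LM head on the last GPU (illustrative key)
}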
The error is as follows:
(intern_clean) fusionai@Dev:~/ztf/InternLM-XComposer2-4KHD/code_download/InternLM-XComposer$ CUDA_VISIBLE_DEVICES=2,3,4,5 python examples/example_chat_modelscope.py --num_gpus 2
2024-04-11 15:22:32,902 - modelscope - INFO - PyTorch version 2.0.1+cu117 Found.
2024-04-11 15:22:32,903 - modelscope - INFO - Loading ast index from /home/fusionai/.cache/modelscope/ast_indexer
2024-04-11 15:22:32,999 - modelscope - INFO - Loading done! Current index file version is 1.13.3, with md5 3382d34e525970486512b31f23987eb2 and a total number of 972 components indexed
2024-04-11 15:22:33,632 - modelscope - WARNING - Model revision not specified, use revision: v1.0.0
You are using a model of type internlmxcomposer2 to instantiate a model of type internlm. This is not supported for all configurations of models and can yield errors.
Set max length to 4096
Some weights of the model checkpoint at /home/fusionai/.cache/modelscope/hub/AI-ModelScope/clip-vit-large-patch14-336 were not used when initializing CLIPVisionModel: ['text_model.encoder.layers.7.layer_norm1.bias', 'text_model.encoder.layers.7.self_attn.v_proj.weight', 'text_model.encoder.layers.8.self_attn.out_proj.weight', ... (list truncated: every remaining entry is a text_model.* weight, plus text_projection.weight, visual_projection.weight, and logit_scale) ..., 'text_model.embeddings.position_embedding.weight']
- This IS expected if you are initializing CLIPVisionModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing CLIPVisionModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Position interpolate from 24x24 to 35x35
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:07<00:00, 3.99s/it]
Some weights of InternLMXComposer2ForCausalLM were not initialized from the model checkpoint at /home/fusionai/.cache/modelscope/hub/Shanghai_AI_Laboratory/internlm-xcomposer2-vl-7b and are newly initialized: ['vit.vision_tower.vision_model.embeddings.position_ids', 'vit.vision_tower.vision_model.post_layernorm.bias', 'vit.vision_tower.vision_model.post_layernorm.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
/home/fusionai/anaconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/generation/utils.py:1259: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
warnings.warn(
Traceback (most recent call last):
  File "/home/fusionai/ztf/InternLM-XComposer2-4KHD/code_download/InternLM-XComposer/examples/example_chat_modelscope.py", line 36, in <module>
    response, _ = model.chat(tokenizer, query=text, image=image, history=[], do_sample=False)
  File "/home/fusionai/anaconda3/envs/intern_clean/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/fusionai/.cache/huggingface/modules/transformers_modules/internlm-xcomposer2-vl-7b/modeling_internlm_xcomposer2.py", line 511, in chat
    outputs = self.generate(
  File "/home/fusionai/anaconda3/envs/intern_clean/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/fusionai/anaconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/generation/utils.py", line 1522, in generate
    return self.greedy_search(
  File "/home/fusionai/anaconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/generation/utils.py", line 2339, in greedy_search
    outputs = self(
  File "/home/fusionai/anaconda3/envs/intern_clean/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/fusionai/anaconda3/envs/intern_clean/lib/python3.9/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/fusionai/.cache/huggingface/modules/transformers_modules/internlm-xcomposer2-vl-7b/modeling_internlm_xcomposer2.py", line 366, in forward
    outputs = self.model(
  File "/home/fusionai/anaconda3/envs/intern_clean/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/fusionai/.cache/huggingface/modules/transformers_modules/internlm-xcomposer2-vl-7b/modeling_internlm2.py", line 884, in forward
    attention_mask = self._prepare_decoder_attention_mask(
  File "/home/fusionai/.cache/huggingface/modules/transformers_modules/internlm-xcomposer2-vl-7b/modeling_internlm2.py", line 820, in _prepare_decoder_attention_mask
    expanded_attn_mask + combined_attention_mask)
RuntimeError: The size of tensor a (1377) must match the size of tensor b (1376) at non-singleton dimension 3
How can I fix this problem? I'd appreciate any help.
The size of tensor a (1377) must match the size of tensor b (1376) at non-singleton dimension 3
This is commonly caused by a transformers version mismatch; try transformers==4.33.2.
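If it helps, after downgrading (e.g. pip install transformers==4.33.2) you can confirm which version is active before rerunning the example; a minimal check:

import importlib.metadata

# Print the installed transformers version; the thread suggests 4.33.2.
print(importlib.metadata.version('transformers'))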
I tested with transformers==4.33.2, but it fails with the same error when per_device_train_batch_size > 1. Any suggestions?