question about multi-gpu example? #258

Open
ztfmars opened this issue Apr 11, 2024 · 3 comments
ztfmars commented Apr 11, 2024

I have 4 RTX 3090 GPUs and have installed the environment as instructed.
I used ModelScope to download the pretrained weights, but running examples/example_chat.py fails.

The code is as follows:

import sys
sys.path.insert(0, '.')
sys.path.insert(0, '..')
import argparse
import torch

from modelscope import snapshot_download, AutoModel, AutoTokenizer
from examples.utils import auto_configure_device_map

torch.set_grad_enabled(False)

parser = argparse.ArgumentParser()
parser.add_argument("--num_gpus", default=4, type=int)
parser.add_argument("--dtype", default='fp16', type=str)
args = parser.parse_args()

# init model and tokenizer
model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm-xcomposer2-vl-7b')
model = AutoModel.from_pretrained(model_dir, trust_remote_code=True).eval()
if args.dtype == 'fp16':
    model.half().cuda()
elif args.dtype == 'fp32':
    model.cuda()

if args.num_gpus > 1:
    # split the model's layers across the visible GPUs
    from accelerate import dispatch_model
    device_map = auto_configure_device_map(args.num_gpus)
    model = dispatch_model(model, device_map=device_map)

tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)

text = '<ImageHere>Please describe this image in detail.'
image = 'examples/image1.webp'
with torch.cuda.amp.autocast():
    with torch.no_grad():
        response, _ = model.chat(tokenizer, query=text, image=image, history=[], do_sample=False)
print(response)
# The image features a quote by Oscar Wilde, "Live life with no excuses, travel with no regret,"
# set against a backdrop of a breathtaking sunset. The sky is painted in hues of pink and orange,
# creating a serene atmosphere. Two silhouetted figures stand on a cliff, overlooking the horizon.
# They appear to be hiking or exploring, embodying the essence of the quote.
# The overall scene conveys a sense of adventure and freedom, encouraging viewers to embrace life without hesitation or regrets.
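
For comparison, here is a minimal sketch of the same multi-GPU load using transformers' built-in device_map='auto' sharding instead of the hand-written device map. It assumes accelerate is installed and that the ModelScope AutoModel wrapper forwards torch_dtype and device_map through to transformers; it is a sketch, not the snippet from the report above:

import torch
from modelscope import snapshot_download, AutoModel, AutoTokenizer

model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm-xcomposer2-vl-7b')
# device_map='auto' lets accelerate place layers on all visible GPUs;
# torch_dtype=torch.float16 keeps the 7B weights within 24 GB cards.
# Assumption: the ModelScope wrapper passes these kwargs to transformers.
model = AutoModel.from_pretrained(
    model_dir,
    torch_dtype=torch.float16,
    device_map='auto',
    trust_remote_code=True,
).eval()
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)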

The error is as follows:

(intern_clean) fusionai@Dev:~/ztf/InternLM-XComposer2-4KHD/code_download/InternLM-XComposer$ CUDA_VISIBLE_DEVICES=2,3,4,5 python examples/example_chat_modelscope.py --num_gpus 2
2024-04-11 15:22:32,902 - modelscope - INFO - PyTorch version 2.0.1+cu117 Found.
2024-04-11 15:22:32,903 - modelscope - INFO - Loading ast index from /home/fusionai/.cache/modelscope/ast_indexer
2024-04-11 15:22:32,999 - modelscope - INFO - Loading done! Current index file version is 1.13.3, with md5 3382d34e525970486512b31f23987eb2 and a total number of 972 components indexed
2024-04-11 15:22:33,632 - modelscope - WARNING - Model revision not specified, use revision: v1.0.0
You are using a model of type internlmxcomposer2 to instantiate a model of type internlm. This is not supported for all configurations of models and can yield errors.
Set max length to 4096
Some weights of the model checkpoint at /home/fusionai/.cache/modelscope/hub/AI-ModelScope/clip-vit-large-patch14-336 were not used when initializing CLIPVisionModel: ['text_model.encoder.layers.7.layer_norm1.bias', 'text_model.encoder.layers.7.self_attn.v_proj.weight', ... (long list of unused text_model.* and projection weights truncated) ..., 'text_model.embeddings.position_embedding.weight']
- This IS expected if you are initializing CLIPVisionModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing CLIPVisionModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Position interpolate from 24x24 to 35x35
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:07<00:00,  3.99s/it]
Some weights of InternLMXComposer2ForCausalLM were not initialized from the model checkpoint at /home/fusionai/.cache/modelscope/hub/Shanghai_AI_Laboratory/internlm-xcomposer2-vl-7b and are newly initialized: ['vit.vision_tower.vision_model.embeddings.position_ids', 'vit.vision_tower.vision_model.post_layernorm.bias', 'vit.vision_tower.vision_model.post_layernorm.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
/home/fusionai/anaconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/generation/utils.py:1259: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
  warnings.warn(
Traceback (most recent call last):
  File "/home/fusionai/ztf/InternLM-XComposer2-4KHD/code_download/InternLM-XComposer/examples/example_chat_modelscope.py", line 36, in <module>
    response, _ = model.chat(tokenizer, query=text, image=image, history=[], do_sample=False)
  File "/home/fusionai/anaconda3/envs/intern_clean/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/fusionai/.cache/huggingface/modules/transformers_modules/internlm-xcomposer2-vl-7b/modeling_internlm_xcomposer2.py", line 511, in chat
    outputs = self.generate(
  File "/home/fusionai/anaconda3/envs/intern_clean/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/fusionai/anaconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/generation/utils.py", line 1522, in generate
    return self.greedy_search(
  File "/home/fusionai/anaconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/generation/utils.py", line 2339, in greedy_search
    outputs = self(
  File "/home/fusionai/anaconda3/envs/intern_clean/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/fusionai/anaconda3/envs/intern_clean/lib/python3.9/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/fusionai/.cache/huggingface/modules/transformers_modules/internlm-xcomposer2-vl-7b/modeling_internlm_xcomposer2.py", line 366, in forward
    outputs = self.model(
  File "/home/fusionai/anaconda3/envs/intern_clean/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/fusionai/.cache/huggingface/modules/transformers_modules/internlm-xcomposer2-vl-7b/modeling_internlm2.py", line 884, in forward
    attention_mask = self._prepare_decoder_attention_mask(
  File "/home/fusionai/.cache/huggingface/modules/transformers_modules/internlm-xcomposer2-vl-7b/modeling_internlm2.py", line 820, in _prepare_decoder_attention_mask
    expanded_attn_mask + combined_attention_mask)
RuntimeError: The size of tensor a (1377) must match the size of tensor b (1376) at non-singleton dimension 3

How can I fix this problem? Any help would be appreciated.

@LightDXY
Collaborator

The size of tensor a (1377) must match the size of tensor b (1376) at non-singleton dimension 3
This is commonly caused by a transformers version mismatch; try transformers==4.33.2.
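
A quick way to confirm the pin before loading the model; a minimal sketch (the assert is illustrative, not from the repo):

import transformers

# Fail fast if the environment does not match the suggested pin.
print(transformers.__version__)
assert transformers.__version__ == '4.33.2', (
    'run: pip install transformers==4.33.2')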


ztfmars commented Apr 11, 2024

It works, thanks! The Chinese README may need the version requirement fixed:

https://github.com/InternLM/InternLM-XComposer/blob/main/docs/install_CN.md

@fiona-lxd

The size of tensor a (1377) must match the size of tensor b (1376) at non-singleton dimension 3. This is commonly caused by a transformers version mismatch; try transformers==4.33.2.

I tested with transformers==4.33.2, but it fails with the same error when per_device_train_batch_size > 1. Any suggestions?
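
A generic workaround worth trying while this is open (an untested assumption for this model, not a confirmed fix): keep the per-device batch at 1, which sidesteps the batched mask-size mismatch, and recover the effective batch size with gradient accumulation via the standard transformers TrainingArguments:

from transformers import TrainingArguments

# Untested workaround sketch: a per-device batch of 1 avoids batched padding
# in the attention mask; accumulation keeps the effective batch size at 8.
training_args = TrainingArguments(
    output_dir='./out',              # hypothetical output directory
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    fp16=True,
)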
