
7B version cannot run on multiple GPUs #303

Open
ybshaw opened this issue May 6, 2024 · 1 comment
ybshaw commented May 6, 2024

Using the officially provided 7B model, it fails to run on a single RTX card with 24 GB of memory and throws an OOM error. Specifying GPU IDs has no effect; the model still occupies only GPU 0. How should I run inference so that it works?

import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)
ckpt_path = '/home/my/.cache/modelscope/hub/Shanghai_AI_Laboratory/internlm-xcomposer2-vl-7b'


# init model and tokenizer
# .cuda() moves the entire model onto a single GPU (cuda:0)
model = AutoModel.from_pretrained(ckpt_path, trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained(ckpt_path, trust_remote_code=True)

text = '<ImageHere>仔细描述这张图'
image = '/home/my/cat.jpg'
with torch.cuda.amp.autocast():
    response, _ = model.chat(tokenizer, query=text, image=image, history=[], do_sample=False)
print(response)

Error: CUDA out-of-memory (OOM)
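One thing worth noting here: by default, from_pretrained loads weights in float32, so a 7B model needs roughly 28 GB for the weights alone and cannot fit on a 24 GB card. A minimal sketch of a first thing to try (assuming the checkpoint works in half precision, which I have not verified for this model): pass torch_dtype=torch.float16 at load time so the weights take about half the memory.

import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)
ckpt_path = '/home/my/.cache/modelscope/hub/Shanghai_AI_Laboratory/internlm-xcomposer2-vl-7b'

# Load in half precision: ~14 GB of weights in fp16 instead of ~28 GB in
# fp32, which gives the 7B model a chance of fitting on one 24 GB card.
model = AutoModel.from_pretrained(
    ckpt_path,
    torch_dtype=torch.float16,
    trust_remote_code=True,
).cuda().eval()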

Specifying all GPU IDs in the code (machine info: 4 cards, 24 GB memory each):

import os

# This only controls which GPUs are visible to the process;
# it does not by itself spread the model across them.
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1,2,3'

import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)
ckpt_path = '/home/my/.cache/modelscope/hub/Shanghai_AI_Laboratory/internlm-xcomposer2-vl-7b'


# init model and tokenizer
# .cuda() still places the entire model on the first visible GPU
model = AutoModel.from_pretrained(ckpt_path, trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained(ckpt_path, trust_remote_code=True)

text = '<ImageHere>仔细描述这张图'
image = '/home/my/cat.jpg'
with torch.cuda.amp.autocast():
    response, _ = model.chat(tokenizer, query=text, image=image, history=[], do_sample=False)
print(response)

Still the same error. Checking nvidia-smi shows the model in fact still runs on one card and is not distributed across the other cards.
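This behavior is expected: CUDA_VISIBLE_DEVICES only determines which devices the process can see, while .cuda() always moves the whole model onto the first visible device (cuda:0). To actually shard the weights across all four cards, the model has to be loaded with a device map. Below is a hedged sketch using the standard device_map='auto' dispatch from transformers/accelerate; it assumes accelerate is installed and that the model's custom remote code is compatible with sharded dispatch, which I have not verified for internlm-xcomposer2.

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1,2,3'

import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)
ckpt_path = '/home/my/.cache/modelscope/hub/Shanghai_AI_Laboratory/internlm-xcomposer2-vl-7b'

# device_map='auto' (needs `pip install accelerate`) splits the layers
# across all visible GPUs. Do not call .cuda() afterwards, or the model
# is moved back onto a single device.
model = AutoModel.from_pretrained(
    ckpt_path,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    device_map='auto',
).eval()
tokenizer = AutoTokenizer.from_pretrained(ckpt_path, trust_remote_code=True)

text = '<ImageHere>仔细描述这张图'
image = '/home/my/cat.jpg'
with torch.cuda.amp.autocast():
    response, _ = model.chat(tokenizer, query=text, image=image, history=[], do_sample=False)
print(response)

If the model's custom chat() code hard-codes device placement internally, this can still fail; in that case the fp16 single-card load shown earlier is the more reliable option.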

@XueFengHF
Same problem here: 4x 3090. The example only runs on a single card, finetuning runs out of GPU memory on a single card, and multi-card finetuning fails with ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 2 (pid: 15250) of binary: /opt/conda/envs/internlm/bin/python
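A note on that error: exitcode -9 means local rank 2 received SIGKILL, which is usually the Linux OOM killer reclaiming host RAM rather than a GPU error; with 4 ranks, each process loads its own full copy of the checkpoint into CPU memory before moving it to its GPU. A small sketch of one mitigation (low_cpu_mem_usage is a standard from_pretrained option, but whether it resolves this particular finetune crash is an assumption):

import torch
from transformers import AutoModel

ckpt_path = '/home/my/.cache/modelscope/hub/Shanghai_AI_Laboratory/internlm-xcomposer2-vl-7b'

# low_cpu_mem_usage=True builds the model on the meta device and loads
# checkpoint shards one at a time, roughly halving peak host RAM per process.
model = AutoModel.from_pretrained(
    ckpt_path,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
)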
