Deployment fails on two V100s #362

Open
Cocoalate opened this issue Aug 16, 2023 · 2 comments

Comments

@Cocoalate

Cocoalate commented Aug 16, 2023

My environment:
Two V100s (32 GB × 2)
CUDA 11.0
PyTorch 1.7.1

My PyTorch version is too old to support the quantized models, so I chose to deploy fnlp/moss-moon-003-sft. With fp16 precision, however, it raises the following error:
  File "/root/anaconda3/envs/mossgpu/lib/python3.8/site-packages/torch/tensor.py", line 547, in __rpow__
    return torch.tensor(other, dtype=dtype, device=self.device) ** self
RuntimeError: "pow" not implemented for 'Half'
So I had to change the model construction to
raw_model = MossForCausalLM._from_config(config, torch_dtype=torch.float32)
and then ran
python moss_cli_demo.py --model_name fnlp/moss-moon-003-sft --gpu 0,2
which fails with the following error:
Traceback (most recent call last):
  File "moss_cli_demo.py", line 48, in <module>
    raw_model = MossForCausalLM._from_config(config, torch_dtype=torch.float32)
  File "/root/anaconda3/envs/mossgpu/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1024, in _from_config
    model = cls(config, **kwargs)
  File "/data_a/keke/workspace/MOSS/models/modeling_moss.py", line 607, in __init__
    self.transformer = MossModel(config)
  File "/data_a/keke/workspace/MOSS/models/modeling_moss.py", line 401, in __init__
    self.h = nn.ModuleList([MossBlock(config) for _ in range(config.n_layer)])
  File "/data_a/keke/workspace/MOSS/models/modeling_moss.py", line 401, in <listcomp>
    self.h = nn.ModuleList([MossBlock(config) for _ in range(config.n_layer)])
  File "/data_a/keke/workspace/MOSS/models/modeling_moss.py", line 256, in __init__
    self.mlp = MossMLP(inner_dim, config)
  File "/data_a/keke/workspace/MOSS/models/modeling_moss.py", line 235, in __init__
    self.fc_in = nn.Linear(embed_dim, intermediate_size)
  File "/root/anaconda3/envs/mossgpu/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 78, in __init__
    self.weight = Parameter(torch.Tensor(out_features, in_features))
  File "/root/anaconda3/envs/mossgpu/lib/python3.8/site-packages/torch/nn/modules/module.py", line 796, in __setattr__
    self.register_parameter(name, value)
  File "/root/anaconda3/envs/mossgpu/lib/python3.8/site-packages/accelerate/big_modeling.py", line 108, in register_empty_parameter
    module._parameters[name] = param_cls(module._parameters[name].to(device), **kwargs)
RuntimeError: CUDA out of memory. Tried to allocate 576.00 MiB (GPU 0; 31.75 GiB total capacity; 30.01 GiB already allocated; 548.00 MiB free; 30.02 GiB reserved in total by PyTorch)

Does anyone know how to fix this?

@lizhixi212

RuntimeError: CUDA out of memory. Tried to allocate 576.00 MiB (GPU 0; 31.75 GiB total capacity; 30.01 GiB already allocated; 548.00 MiB free; 30.02 GiB reserved in total by PyTorch)
You ran out of GPU memory.
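
For a rough sense of why the float32 attempt runs out of memory, the arithmetic can be sketched as below. This is only an estimate under stated assumptions: it assumes moss-moon-003-sft has roughly 16 billion parameters (that figure is not given in this thread), it counts weights only (no activations, KV cache, or CUDA context), and the helper name weight_memory_gib is made up for illustration. The 31.75 GiB per-GPU capacity is taken from the error message above.

```python
GIB = 1024 ** 3  # bytes per GiB


def weight_memory_gib(n_params: int, bytes_per_param: int) -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    return n_params * bytes_per_param / GIB


# Assumption: ~16B parameters; this number is not stated in the thread.
N_PARAMS = 16_000_000_000
# Taken from the error message: "31.75 GiB total capacity" per V100.
GPU_CAPACITY_GIB = 31.75
N_GPUS = 2

for name, nbytes in (("fp32", 4), ("fp16", 2)):
    need = weight_memory_gib(N_PARAMS, nbytes)
    total = GPU_CAPACITY_GIB * N_GPUS
    print(
        f"{name}: weights need about {need:.1f} GiB; "
        f"one GPU holds {GPU_CAPACITY_GIB} GiB, both hold {total} GiB"
    )
```

Under that assumption the float32 weights come to about 59.6 GiB, which exceeds a single 31.75 GiB card and leaves under 4 GiB of headroom even if split perfectly across both. The float16 weights come to about 29.8 GiB, or roughly 14.9 GiB per card when split evenly. Note also that the traceback only reports GPU 0, with 30.01 GiB already allocated at the time of failure.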

@Cocoalate
Author

RuntimeError: CUDA out of memory. Tried to allocate 576.00 MiB (GPU 0; 31.75 GiB total capacity; 30.01 GiB already allocated; 548.00 MiB free; 30.02 GiB reserved in total by PyTorch)
You ran out of GPU memory.

Thanks, I got it working. I am still using fp16.
