
What else do I need to prepare after training finishes? #50

Open
fangyifei222 opened this issue Mar 8, 2024 · 17 comments

Comments

@fangyifei222

After LoRA fine-tuning the 13B model, I only obtained the files shown in the screenshot below. I converted the .pt file into a .bin model, but reusing the original configuration_baichuan.py, generation_config.json, and modeling_baichuan.py doesn't seem to work. Do I need to modify them myself for the fine-tuned model?
[screenshot of the training output files]

@ssbuild
Owner

ssbuild commented Mar 8, 2024

  1. Normally, inference should be run through the files under the corresponding infer module.
  2. The LoRA weights cannot be loaded directly with peft. You can merge the LoRA weights first; the merged weights then match the layout of the official weights and can be loaded directly for inference (a minimal merge sketch follows below).
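
For reference, a minimal sketch of the general merge step with peft, assuming a standard peft adapter directory. The paths and model name below are placeholders, and for this repo's own checkpoints the weight keys may first need the conversion discussed later in this thread:

# Hedged sketch: merge a LoRA adapter into its base model so the result can be
# loaded like the official weights. All paths and model names are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "baichuan-inc/Baichuan2-13B-Chat", trust_remote_code=True
)
model = PeftModel.from_pretrained(base, "path/to/lora_adapter")  # dir containing adapter_config.json
merged = model.merge_and_unload()  # fold the LoRA deltas into the base weights
merged.save_pretrained("path/to/merged_model")

tokenizer = AutoTokenizer.from_pretrained(
    "baichuan-inc/Baichuan2-13B-Chat", use_fast=False, trust_remote_code=True
)
tokenizer.save_pretrained("path/to/merged_model")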

@fangyifei222
Author

The model files after training come to 27 GB. My dataset consists of question-answer pairs, so I made small changes to the data processing in the training script, and I converted the resulting .pt file into .bin-format weights. When I load this model it complains that files such as configuration_baichuan.py are missing, and if I copy the corresponding files over from Baichuan-2-13B, loading reports that some of the weights are unused.
-rw-rw-r-- 1 fyf fyf 4918199 Mar 6 10:00 bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
-rw-rw-r-- 1 fyf fyf 4918199 Mar 6 10:00 bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt
-rw-rw-r-- 1 fyf fyf 13896988349 Mar 6 10:00 zero_pp_rank_0_mp_rank_00_model_states.pt
-rw-rw-r-- 1 fyf fyf 13896988349 Mar 6 10:00 zero_pp_rank_1_mp_rank_00_model_states.pt

@ssbuild
Owner

ssbuild commented Mar 8, 2024


You can't just change the file extension; DeepSpeed weights need to be converted with the following:
cd best_ckpt/last
python zero_to_fp32.py . ../last.ckpt
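
As a minimal sketch (assuming the consolidated file is written to best_ckpt/last.ckpt as in the command above), the result can be inspected before trying to load it into a model:

# Hedged sketch: load the fp32 state dict consolidated by zero_to_fp32.py and
# inspect its keys; whether it loads straight into the model depends on how the
# training script wrapped the weights.
import torch

state_dict = torch.load("best_ckpt/last.ckpt", map_location="cpu")
# Some setups nest the actual tensors under a "state_dict" or "module" key.
if isinstance(state_dict, dict) and "state_dict" in state_dict:
    state_dict = state_dict["state_dict"]
for name, tensor in list(state_dict.items())[:10]:
    print(name, tuple(tensor.shape))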

@ssbuild
Owner

ssbuild commented Mar 8, 2024

# For DeepSpeed weights, use the conversion script command

@fangyifei222
Author

I think I know where the problem is. I'm using a self-made dataset; does the dataset format have to match the example data in the data folder? My dataset is in this format: {
"Question": "请问8端口OC-12c/STM-4c POS-SFP灵活插卡的尺寸(宽×深×高)分别是多少?",
"Answer": "8端口OC-12c/STM-4c POS-SFP灵活插卡的尺寸(宽×深×高)分别是169mm × 189.9mm × 18.4mm。"
},
{
"Question": "请问4端口OC-3c/STM-1c POS-SFP 灵活插卡的典型散热值是多少?",
"Answer": "4端口OC-3c/STM-1c POS-SFP 灵活插卡的典型散热值是275.8 BTU/hour。"
},

@ssbuild
Owner

ssbuild commented Mar 8, 2024

Take a look at the datasample in the README, or the examples under the data folder.
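
As a rough sketch only: reshaping {"Question", "Answer"} records into one JSON object per line. The target field names here (instruction/output) are placeholders and must be changed to whatever schema the README datasample and the data folder actually use:

# Hedged sketch: convert {"Question", "Answer"} records into a JSONL file.
# The output schema below is a placeholder; match it to the samples in data/.
import json

with open("qa_pairs.json", encoding="utf-8") as f:
    records = json.load(f)  # a list of {"Question": ..., "Answer": ...} dicts

with open("train.jsonl", "w", encoding="utf-8") as out:
    for rec in records:
        sample = {"instruction": rec["Question"], "output": rec["Answer"]}
        out.write(json.dumps(sample, ensure_ascii=False) + "\n")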

@fangyifei222
Author

OK, thank you.

@fangyifei222
Author

Hello, another question: I modified peft_config in fine-tune.py, which was originally
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules=["W_pack"],
    inference_mode=False,
    r=1,
    lora_alpha=32,
    lora_dropout=0.1,
)
changing it to r=8 and lora_alpha=16. But loading the model after training fails with:
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
size mismatch for base_model.model.model.layers.0.self_attn.W_pack.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([15360, 8]).
size mismatch for base_model.model.model.layers.1.self_attn.W_pack.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([15360, 8]).
size mismatch for base_model.model.model.layers.2.self_attn.W_pack.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([15360, 8]).
size mismatch for base_model.model.model.layers.3.self_attn.W_pack.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([15360, 8]).
size mismatch for base_model.model.model.layers.4.self_attn.W_pack.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([15360, 8]).
Do you know how to fix this?

@ssbuild
Owner

ssbuild commented Mar 27, 2024


Which script did you use for inference?

@fangyifei222
Author

I trained with fine-tune.py and loaded/ran inference with cli_demo.py, using AutoPeftModelForCausalLM to load the model. With the default r=1, lora_alpha=32 the trained model loaded fine; after changing those two parameters, the trained model has the size mismatch.

@ssbuild
Owner

ssbuild commented Mar 27, 2024


Paste your inference script code and I'll take a look.

@fangyifei222
Author

fangyifei222 commented Mar 27, 2024

from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

model_id = "/data/fyf/Baichuan2-main/fine-tune/outputr8a16-1"
device = "cuda:0"

# Load the LoRA adapter (and its base model) saved by fine-tune.py.
model = AutoPeftModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
model = model.to(device)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False, trust_remote_code=True)

while True:
    user_input = input("请输入您的问题(或输入'退出'来结束对话):")
    if user_input == "退出":
        break

    messages = [{"role": "user", "content": user_input}]
    response = model.chat(tokenizer, messages)
    print("模型回复:", response)

@ssbuild
Owner

ssbuild commented Mar 27, 2024


It can't be loaded directly; the names differ slightly. I'll add a weight conversion script later.

@fangyifei222
Author


Does it need to be converted to .bin format? And why could the model be loaded directly when r=1 and lora_alpha=32?

@ssbuild
Owner

ssbuild commented Mar 28, 2024


No, no. It's the weight keys that differ, not the file name. I'll add it when I have time.
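
To illustrate what such a key mismatch looks like, a small diagnostic sketch: print the LoRA parameter names and shapes stored in the checkpoint and compare them with the names in the error message. The path comes from the script above, and the adapter file name is only an assumption based on the usual peft layout:

# Hedged sketch: inspect the LoRA keys saved in the adapter checkpoint and
# compare them with the keys AutoPeftModelForCausalLM expects to load.
# Empty lora_B tensors or differently prefixed keys point to the naming
# difference described in this comment.
import torch

ckpt = torch.load(
    "/data/fyf/Baichuan2-main/fine-tune/outputr8a16-1/adapter_model.bin",
    map_location="cpu",
)
lora_keys = sorted(k for k in ckpt if "lora" in k)
for k in lora_keys[:10]:
    print(k, tuple(ckpt[k].shape))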

@fangyifei222
Author

@ssbuild
Owner

ssbuild commented Apr 23, 2024

deep_export --mode=hf --src adapter_model.bin --dst=.
@fangyifei222 The weights can be converted to huggingface format with the command above.
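
Once converted, the adapter directory should load like a standard peft adapter. A minimal sketch, assuming the converted files (and the tokenizer files) sit in the directory that deep_export wrote to; the path is a placeholder:

# Hedged sketch: load the converted huggingface-format adapter for inference.
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_dir = "path/to/converted_adapter"  # placeholder for deep_export's --dst output
model = AutoPeftModelForCausalLM.from_pretrained(adapter_dir, trust_remote_code=True)
model = model.to("cuda:0").eval()
# Assumes tokenizer files were also saved alongside the adapter.
tokenizer = AutoTokenizer.from_pretrained(adapter_dir, use_fast=False, trust_remote_code=True)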
