
Which configuration changes are needed to fine-tune with LoRA and with PTv2, respectively? #211

Open
mircop1t opened this issue May 3, 2023 · 19 comments

Comments

@mircop1t

mircop1t commented May 3, 2023

As in the title: the README is a bit confusing. Does setting with_lora to true mean fine-tuning with LoRA? I could not find an explicit parameter in the code for selecting PTv2 fine-tuning. Could someone clarify?

@ssbuild
Owner

ssbuild commented May 4, 2023

Take a look at the explanation of PTv2 training in the training section of the README.

@Ikaros-521

I also want to say the README is written too briefly; I can't follow it either.

@ssbuild
Owner

ssbuild commented May 6, 2023

@Kkkkkiradd @Ikaros-521

  1. For LoRA, just modify the corresponding parameters such as with_lora in the configuration file.
  2. For PTv2, choose the configuration file config/config_ptv2.json; the two key parameters are "pre_seq_len": 32 and "prefix_projection": false (a sketch of both setups is below).
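
As a rough illustration of the two setups (a hedged sketch only: key names such as train_info_args, with_lora, and the LoRA rank values are taken from this thread or are illustrative, not a verified API of this repo), the changes might look like this:

```python
# Hypothetical sketch, not the repo's verified API.

# 1) LoRA: keep the regular model config and switch the LoRA flag on.
lora_info_args = {
    'with_lora': True,     # enable LoRA fine-tuning
    'r': 8,                # illustrative LoRA rank
    'lora_alpha': 32,
    'lora_dropout': 0.1,
}

# 2) PTv2 (P-Tuning v2): point the training config at config/config_ptv2.json.
train_info_args = {
    'config_name': './config/config_ptv2.json',
    # inside config_ptv2.json the decisive fields are:
    #   "pre_seq_len": 32,           # number of learned prefix tokens
    #   "prefix_projection": false   # false selects the plain prefix variant
}
```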

@Ikaros-521

Ikaros-521 commented May 6, 2023

@Kkkkkiradd @Ikaros-521

  1. For LoRA, just modify the corresponding parameters such as with_lora in the configuration file.
  2. For PTv2, choose the configuration file config/config_ptv2.json; the two key parameters are "pre_seq_len": 32 and "prefix_projection": false.

Boss, I set with_lora to true and data_utils ran through, but training (chatglm-6b-int4) fails with the output below. What is the problem?

INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7u9xa4bz
INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7u9xa4bz/_remote_module_non_scriptable.py
INFO:lightning_fabric.utilities.seed:Global seed set to 42
INFO:lightning_fabric.utilities.rank_zero:GPU available: True (cuda), used: True
INFO:lightning_fabric.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:lightning_fabric.utilities.rank_zero:IPU available: False, using: 0 IPUs
INFO:lightning_fabric.utilities.rank_zero:HPU available: False, using: 0 HPUs
ChatGLMConfig {
  "architectures": [
    "ChatGLMModel"
  ],
  "auto_map": {
    "AutoConfig": "configuration_chatglm.ChatGLMConfig",
    "AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration",
    "AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration"
  },
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "gmask_token_id": 130001,
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "initializer_weight": false,
  "inner_hidden_size": 16384,
  "layernorm_epsilon": 1e-05,
  "mask_token_id": 130000,
  "max_sequence_length": 2048,
  "model_type": "chatglm",
  "num_attention_heads": 32,
  "num_layers": 28,
  "pad_token_id": 3,
  "position_encoding_2d": true,
  "pre_seq_len": null,
  "precision": 16,
  "prefix_projection": false,
  "quantization_bit": 4,
  "return_dict": false,
  "task_specific_params": {
    "learning_rate": 2e-05,
    "learning_rate_for_task": 2e-05
  },
  "torch_dtype": "float16",
  "transformers_version": "4.26.1",
  "use_cache": true,
  "vocab_size": 130528
}

INFO:root:make_dataset ./data/finetune_train_examples.json train...
INFO:root:make data ./output/dataset_file_0_dupe_factor_0-train.record...
TrainingArguments(optimizer='lion', scheduler_type='CAWR', scheduler={'T_mult': 1, 'rewarm_epoch_num': 0.5, 'verbose': False}, adv=None, hierarchical_position=None, learning_rate=2e-05, learning_rate_for_task=2e-05, max_epochs=20, max_steps=-1, optimizer_betas=(0.9, 0.999), adam_epsilon=1e-08, gradient_accumulation_steps=1, max_grad_norm=1.0, weight_decay=0, warmup_steps=0, train_batch_size=2, eval_batch_size=1, test_batch_size=1, seed=42)
ModelArguments(model_name_or_path='/root/autodl-tmp/ChatGLM-6B/THUDM/chatglm-6b-int4', model_type='chatglm', config_overrides=None, config_name='./config/config.json', tokenizer_name='/root/autodl-tmp/ChatGLM-6B/THUDM/chatglm-6b-int4', cache_dir=None, do_lower_case=False, use_fast_tokenizer=False, model_revision='main', use_auth_token=False)

****************************** lora info
trainable params: 3670016 || all params: 3359072256 || trainable%: 0.10925683403935685
INFO:root:update index for ./output/dataset_file_0_dupe_factor_0-train.record...
INFO:root:update index for ./output/dataset_file_0_dupe_factor_0-train.record finish
****************************** total 300
INFO:root:load dataset to memory...
/root/miniconda3/lib/python3.8/site-packages/lightning/pytorch/trainer/configuration_validator.py:72: PossibleUserWarning: You defined a `validation_step` but have no `val_dataloader`. Skipping val loop.
  rank_zero_warn(
WARNING: Missing logger folder: output/lightning_logs
WARNING:lightning.pytorch.loggers.tensorboard:Missing logger folder: output/lightning_logs
INFO:lightning_fabric.utilities.rank_zero:Loading `train_dataloader` to estimate number of stepping batches.
/root/miniconda3/lib/python3.8/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:430: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 32 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
INFO: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:lightning.pytorch.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:
  | Name                                  | Type      | Params
--------------------------------------------------------------------
0 | _TransformerLightningModule__backbone | LoraModel | 3.4 B
--------------------------------------------------------------------
3.7 M     Trainable params
3.4 B     Non-trainable params
3.4 B     Total params
13,436.289 Total estimated model params size (MB)
INFO:lightning.pytorch.callbacks.model_summary:
  | Name                                  | Type      | Params
--------------------------------------------------------------------
0 | _TransformerLightningModule__backbone | LoraModel | 3.4 B
--------------------------------------------------------------------
3.7 M     Trainable params
3.4 B     Non-trainable params
3.4 B     Total params
13,436.289 Total estimated model params size (MB)
Epoch 0:   0%|          | 0/150 [00:00<?, ?it/s]
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /root/autodl-tmp/chatglm_finetuning/train.py:184 in <module>                                     │
│                                                                                                  │
│   181 │   │   )                                                                                  │
│   182 │   │                                                                                      │
│   183 │   │   if train_datasets is not None:                                                     │
│ ❱ 184 │   │   │   trainer.fit(pl_model, train_dataloaders=train_datasets)                        │
│   185 │                                                                                          │
│   186 │   else:                                                                                  │
│   187 │   │   if lora_args is not None:                                                          │
│                                                                                                  │
│ /root/miniconda3/lib/python3.8/site-packages/lightning/pytorch/trainer/trainer.py:520 in fit     │
│                                                                                                  │
│    517 │   │   """                                                                               │
│    518 │   │   model = _maybe_unwrap_optimized(model)                                            │
│    519 │   │   self.strategy._lightning_module = model                                           │
│ ❱  520 │   │   call._call_and_handle_interrupt(                                                  │
│    521 │   │   │   self, self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule,  │
│    522 │   │   )   

(intermediate frames omitted)

│ /root/miniconda3/lib/python3.8/site-packages/deep_training/nlp/layers/lora_v2/layers.py:155 in   │
│ forward                                                                                          │
│                                                                                                  │
│   152 │   │   │   │   self.unmerge()                                                             │
│   153 │   │   │   result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.   │
│   154 │   │   elif self.r[self.active_adapter] > 0 and not self.merged:                          │
│ ❱ 155 │   │   │   result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.   │
│   156 │   │   │   result += (                                                                    │
│   157 │   │   │   │   self.lora_B[self.active_adapter](                                          │
│   158 │   │   │   │   │   self.lora_A[self.active_adapter](self.lora_dropout[self.active_adapt   │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: self and mat2 must have the same dtype
Exception ignored in: <function tqdm.__del__ at 0x7f6cf2c8a940>
Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.8/site-packages/tqdm/std.py", line 1152, in __del__
  File "/root/miniconda3/lib/python3.8/site-packages/tqdm/std.py", line 1306, in close
  File "/root/miniconda3/lib/python3.8/site-packages/tqdm/std.py", line 1499, in display
  File "/root/miniconda3/lib/python3.8/site-packages/tqdm/std.py", line 1155, in __str__
  File "/root/miniconda3/lib/python3.8/site-packages/tqdm/std.py", line 1457, in format_dict
TypeError: cannot unpack non-iterable NoneType object


@ssbuild
Owner

ssbuild commented May 6, 2023

LoRA needs to load half-precision weights. You downloaded the wrong weights; try downloading them from the official source.
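
To check what precision the locally downloaded checkpoint actually stores before training, a quick, hedged inspection of the weight shards could look like this (the directory comes from the log above; the pytorch_model*.bin file pattern is an assumption about how the checkpoint is saved):

```python
# Sketch: count the tensor dtypes stored in a local ChatGLM checkpoint.
from collections import Counter
import glob
import torch

model_dir = "/root/autodl-tmp/ChatGLM-6B/THUDM/chatglm-6b-int4"
counts = Counter()
for shard in glob.glob(f"{model_dir}/pytorch_model*.bin"):
    state = torch.load(shard, map_location="cpu")
    counts.update(str(v.dtype) for v in state.values() if torch.is_tensor(v))
print(counts)
# A half-precision checkpoint is dominated by torch.float16 tensors; a
# quantized (int-packed) checkpoint is not, which would be consistent with
# the "self and mat2 must have the same dtype" error in F.linear above.
```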

@Ikaros-521

Ikaros-521 commented May 6, 2023

LoRA needs to load half-precision weights. You downloaded the wrong weights; try downloading them from the official source.

Boss, is it this ice_text.model? I don't think I downloaded the wrong thing: it works when I'm not using LoRA, and direct inference also runs fine.

@ssbuild
Owner

ssbuild commented May 6, 2023

Inference works, but for training download the half-precision weights from https://huggingface.co/THUDM/chatglm-6b, reconfigure and rebuild the data, and training should then work.
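
For reference, one way to pull the half-precision checkpoint is via huggingface_hub (a sketch: local_dir needs a reasonably recent huggingface_hub release, a plain git lfs clone works just as well, and the target path is only an example):

```python
# Sketch: download the fp16 ChatGLM-6B weights needed for LoRA training.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="THUDM/chatglm-6b",
    local_dir="/root/autodl-tmp/chatglm-6b",   # example path, ~13 GB needed
)
# Then point model_name_or_path and tokenizer_name in the training arguments
# at this directory, rerun data_utils, and start training again.
```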

@Ikaros-521

Inference works, but for training download the half-precision weights from https://huggingface.co/THUDM/chatglm-6b, reconfigure and rebuild the data, and training should then work.

Thanks, boss, I'll give it a try.

@Ikaros-521

Inference works, but for training download the half-precision weights from https://huggingface.co/THUDM/chatglm-6b, reconfigure and rebuild the data, and training should then work.

Boss, does the .bin model also have to be the full 6B one? Then I can't run it; I'm in a disk-space crisis.

@ssbuild
Owner

ssbuild commented May 6, 2023

Inference works, but for training download the half-precision weights from https://huggingface.co/THUDM/chatglm-6b, reconfigure and rebuild the data, and training should then work.

Boss, does the .bin model also have to be the full 6B one? Then I can't run it; I'm in a disk-space crisis.

You can play with PTv2 on int4: set quantization_bit=4 in config/config_ptv2.json, and change the config in train_info_args to config/config_ptv2.json.
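
A hedged sketch of those two changes (the train_info_args key name config_name is an assumption based on this thread; adjust to whatever the repo actually uses):

```python
# Sketch: enable PTv2 training on the int4 checkpoint.
import json

# 1) Set quantization_bit to 4 in config/config_ptv2.json
#    (alongside "pre_seq_len": 32 and "prefix_projection": false).
with open("config/config_ptv2.json", encoding="utf-8") as f:
    cfg = json.load(f)
cfg["quantization_bit"] = 4
with open("config/config_ptv2.json", "w", encoding="utf-8") as f:
    json.dump(cfg, f, indent=2, ensure_ascii=False)

# 2) In train_info_args, point the model config at the PTv2 config file, e.g.:
#    'config_name': './config/config_ptv2.json'
```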

@Ikaros-521

Inference works, but for training download the half-precision weights from https://huggingface.co/THUDM/chatglm-6b, reconfigure and rebuild the data, and training should then work.

Boss, does the .bin model also have to be the full 6B one? Then I can't run it; I'm in a disk-space crisis.

You can play with PTv2 on int4: set quantization_bit=4 in config/config_ptv2.json, and change the config in train_info_args to config/config_ptv2.json.

OK, OK, I'll give it a try.

@Ikaros-521

Ikaros-521 commented May 6, 2023

Ugh, did I miss the configuration mentioned at the start? (No, I didn't, hmm.)

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /root/autodl-tmp/chatglm_finetuning/train.py:133 in <module>                                     │
│                                                                                                  │
│   130 │                                                                                          │
│   131 │   if config.pre_seq_len is not None:                                                     │
│   132 │   │   if config.quantization_bit:                                                        │
│ ❱ 133 │   │   │   raise Exception('量化模型不支持微调训练')                                      │
│   134 │                                                                                          │
│   135 │   # 额外参数                                                                             │
│   136 │   checkpoint_callback.tokenizer = tokenizer                                              │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
Exception: 量化模型不支持微调训练

@ssbuild
Owner

ssbuild commented May 6, 2023

Exception: 量化模型不支持微调训练

Try commenting out that line.
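
Concretely, that means disabling the guard in train.py shown in the traceback above (the check that rejects fine-tuning of quantized models). Commenting it out is a workaround rather than a supported path, so treat the result with care:

```python
# train.py, around the lines shown in the traceback: the quantization guard
# commented out so PTv2 can run on the int4 checkpoint (workaround only).
if config.pre_seq_len is not None:
    if config.quantization_bit:
        # raise Exception('量化模型不支持微调训练')
        pass
```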

@Ikaros-521

Try commenting out that line.

Haha, I'll give it a try.

@Ikaros-521

Try commenting out that line.

It's running!!! Thanks, boss, for the patient answers.

@Ikaros-521

Ikaros-521 commented May 6, 2023

Try commenting out that line.

Boss, after training, inference with infer_lora_finetuning.py gives:
ValueError: Can't find 'adapter_config.json' at './best_ckpt'
Where is this adapter_config.json supposed to come from?

Running infer directly works fine; does that mean I can ignore infer_lora_finetuning.py for now?

@ssbuild
Owner

ssbuild commented May 6, 2023

@Ikaros-521

infer_lora_finetuning.py is for LoRA.
For PTv2 inference, use infer_finetuning.py.

@Ikaros-521

@Ikaros-521

infer_lora_finetuning.py is for LoRA. For PTv2 inference, use infer_finetuning.py.

Ah, I forgot that what I trained was PTv2... awkward.

@mircop1t
Author

mircop1t commented May 7, 2023

@Kkkkkiradd @Ikaros-521

  1. For LoRA, just modify the corresponding parameters such as with_lora in the configuration file.
  2. For PTv2, choose the configuration file config/config_ptv2.json; the two key parameters are "pre_seq_len": 32 and "prefix_projection": false.

Thanks for the explanation! After LoRA training succeeded, I ran infer_lora_finetuning.py. It finished without errors, but the model did not produce any output. What could be going on here?
