
Which configuration changes are needed to fine-tune with LoRA and with PTv2, respectively? #211

Open
mircop1t opened this issue May 3, 2023 · 19 comments

Comments

@mircop1t

mircop1t commented May 3, 2023

As in the title: the README is a bit confusing. Does setting with_lora to true mean fine-tuning with LoRA? I could not find an explicit parameter in the code for selecting PTv2 fine-tuning. Could someone clarify?

@ssbuild
Owner

ssbuild commented May 4, 2023

Take a look at the explanation of PTv2 training in the training section of the README.

@Ikaros-521

I also want to say the README is written too briefly; I can't follow it either.

@ssbuild
Owner

ssbuild commented May 6, 2023

@Kkkkkiradd @Ikaros-521

  1. For LoRA, just modify the corresponding parameters such as with_lora in the configuration file.
  2. For PTv2, choose the configuration file config/config_ptv2.json; the two key parameters are "pre_seq_len": 32 and "prefix_projection": false (a sketch of both setups is below).
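
As a rough illustration of the two setups (a hedged sketch only: key names such as train_info_args, with_lora, and the LoRA rank values are taken from this thread or are illustrative, not a verified API of this repo), the changes might look like this:

```python
# Hypothetical sketch, not the repo's verified API.

# 1) LoRA: keep the regular model config and switch the LoRA flag on.
lora_info_args = {
    'with_lora': True,     # enable LoRA fine-tuning
    'r': 8,                # illustrative LoRA rank
    'lora_alpha': 32,
    'lora_dropout': 0.1,
}

# 2) PTv2 (P-Tuning v2): point the training config at config/config_ptv2.json.
train_info_args = {
    'config_name': './config/config_ptv2.json',
    # inside config_ptv2.json the decisive fields are:
    #   "pre_seq_len": 32,           # number of learned prefix tokens
    #   "prefix_projection": false   # false selects the plain prefix variant
}
```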

@Ikaros-521

Ikaros-521 commented May 6, 2023

@Kkkkkiradd @Ikaros-521

  1. For LoRA, just modify the corresponding parameters such as with_lora in the configuration file.
  2. For PTv2, choose the configuration file config/config_ptv2.json; the two key parameters are "pre_seq_len": 32 and "prefix_projection": false.

Boss, I set with_lora to true and data_utils ran through, but training (chatglm-6b-int4) fails with the output below. What is the problem?

INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7u9xa4bz
INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7u9xa4bz/_remote_module_non_scriptable.py
INFO:lightning_fabric.utilities.seed:Global seed set to 42
INFO:lightning_fabric.utilities.rank_zero:GPU available: True (cuda), used: True
INFO:lightning_fabric.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:lightning_fabric.utilities.rank_zero:IPU available: False, using: 0 IPUs
INFO:lightning_fabric.utilities.rank_zero:HPU available: False, using: 0 HPUs
ChatGLMConfig {
  "architectures": [
    "ChatGLMModel"
  ],
  "auto_map": {
    "AutoConfig": "configuration_chatglm.ChatGLMConfig",
    "AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration",
    "AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration"
  },
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "gmask_token_id": 130001,
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "initializer_weight": false,
  "inner_hidden_size": 16384,
  "layernorm_epsilon": 1e-05,
  "mask_token_id": 130000,
  "max_sequence_length": 2048,
  "model_type": "chatglm",
  "num_attention_heads": 32,
  "num_layers": 28,
  "pad_token_id": 3,
  "position_encoding_2d": true,
  "pre_seq_len": null,
  "precision": 16,
  "prefix_projection": false,
  "quantization_bit": 4,
  "return_dict": false,
  "task_specific_params": {
    "learning_rate": 2e-05,
    "learning_rate_for_task": 2e-05
  },
  "torch_dtype": "float16",
  "transformers_version": "4.26.1",
  "use_cache": true,
  "vocab_size": 130528
}

INFO:root:make_dataset ./data/finetune_train_examples.json train...
INFO:root:make data ./output/dataset_file_0_dupe_factor_0-train.record...
TrainingArguments(optimizer='lion', scheduler_type='CAWR', scheduler={'T_mult': 1, 'rewarm_epoch_num': 0.5, 'verbose': False}, adv=None, hierarchical_position=None, learning_rate=2e-05, learning_rate_for_task=2e-05, max_epochs=20, max_steps=-1, optimizer_betas=(0.9, 0.999), adam_epsilon=1e-08, gradient_accumulation_steps=1, max_grad_norm=1.0, weight_decay=0, warmup_steps=0, train_batch_size=2, eval_batch_size=1, test_batch_size=1, seed=42)
ModelArguments(model_name_or_path='/root/autodl-tmp/ChatGLM-6B/THUDM/chatglm-6b-int4', model_type='chatglm', config_overrides=None, config_name='./config/config.json', tokenizer_name='/root/autodl-tmp/ChatGLM-6B/THUDM/chatglm-6b-int4', cache_dir=None, do_lower_case=False, use_fast_tokenizer=False, model_revision='main', use_auth_token=False)

****************************** lora info
trainable params: 3670016 || all params: 3359072256 || trainable%: 0.10925683403935685
INFO:root:update index for ./output/dataset_file_0_dupe_factor_0-train.record...
INFO:root:update index for ./output/dataset_file_0_dupe_factor_0-train.record finish
****************************** total 300
INFO:root:load dataset to memory...
/root/miniconda3/lib/python3.8/site-packages/lightning/pytorch/trainer/configuration_validator.py:72: PossibleUserWarning: You defined a `validation_step` but have no `val_dataloader`. Skipping val loop.
  rank_zero_warn(
WARNING: Missing logger folder: output/lightning_logs
WARNING:lightning.pytorch.loggers.tensorboard:Missing logger folder: output/lightning_logs
INFO:lightning_fabric.utilities.rank_zero:Loading `train_dataloader` to estimate number of stepping batches.
/root/miniconda3/lib/python3.8/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:430: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 32 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
INFO: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:lightning.pytorch.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:
  | Name                                  | Type      | Params
--------------------------------------------------------------------
0 | _TransformerLightningModule__backbone | LoraModel | 3.4 B
--------------------------------------------------------------------
3.7 M     Trainable params
3.4 B     Non-trainable params
3.4 B     Total params
13,436.289 Total estimated model params size (MB)
INFO:lightning.pytorch.callbacks.model_summary:
  | Name                                  | Type      | Params
--------------------------------------------------------------------
0 | _TransformerLightningModule__backbone | LoraModel | 3.4 B
--------------------------------------------------------------------
3.7 M     Trainable params
3.4 B     Non-trainable params
3.4 B     Total params
13,436.289 Total estimated model params size (MB)
Epoch 0:   0%|          | 0/150 [00:00<?, ?it/s]
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /root/autodl-tmp/chatglm_finetuning/train.py:184 in <module>                                     │
│                                                                                                  │
│   181 │   │   )                                                                                  │
│   182 │   │                                                                                      │
│   183 │   │   if train_datasets is not None:                                                     │
│ ❱ 184 │   │   │   trainer.fit(pl_model, train_dataloaders=train_datasets)                        │
│   185 │                                                                                          │
│   186 │   else:                                                                                  │
│   187 │   │   if lora_args is not None:                                                          │
│                                                                                                  │
│ /root/miniconda3/lib/python3.8/site-packages/lightning/pytorch/trainer/trainer.py:520 in fit     │
│                                                                                                  │
│    517 │   │   """                                                                               │
│    518 │   │   model = _maybe_unwrap_optimized(model)                                            │
│    519 │   │   self.strategy._lightning_module = model                                           │
│ ❱  520 │   │   call._call_and_handle_interrupt(                                                  │
│    521 │   │   │   self, self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule,  │
│    522 │   │   )   

(intermediate frames omitted)

│ /root/miniconda3/lib/python3.8/site-packages/deep_training/nlp/layers/lora_v2/layers.py:155 in   │
│ forward                                                                                          │
│                                                                                                  │
│   152 │   │   │   │   self.unmerge()                                                             │
│   153 │   │   │   result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.   │
│   154 │   │   elif self.r[self.active_adapter] > 0 and not self.merged:                          │
│ ❱ 155 │   │   │   result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.   │
│   156 │   │   │   result += (                                                                    │
│   157 │   │   │   │   self.lora_B[self.active_adapter](                                          │
│   158 │   │   │   │   │   self.lora_A[self.active_adapter](self.lora_dropout[self.active_adapt   │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: self and mat2 must have the same dtype
Exception ignored in: <function tqdm.__del__ at 0x7f6cf2c8a940>
Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.8/site-packages/tqdm/std.py", line 1152, in __del__
  File "/root/miniconda3/lib/python3.8/site-packages/tqdm/std.py", line 1306, in close
  File "/root/miniconda3/lib/python3.8/site-packages/tqdm/std.py", line 1499, in display
  File "/root/miniconda3/lib/python3.8/site-packages/tqdm/std.py", line 1155, in __str__
  File "/root/miniconda3/lib/python3.8/site-packages/tqdm/std.py", line 1457, in format_dict
TypeError: cannot unpack non-iterable NoneType object


@ssbuild
Owner

ssbuild commented May 6, 2023

LoRA needs to load half-precision weights. You downloaded the wrong weights; try downloading them from the official source.
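
To check what precision the locally downloaded checkpoint actually stores before training, a quick, hedged inspection of the weight shards could look like this (the directory comes from the log above; the pytorch_model*.bin file pattern is an assumption about how the checkpoint is saved):

```python
# Sketch: count the tensor dtypes stored in a local ChatGLM checkpoint.
from collections import Counter
import glob
import torch

model_dir = "/root/autodl-tmp/ChatGLM-6B/THUDM/chatglm-6b-int4"
counts = Counter()
for shard in glob.glob(f"{model_dir}/pytorch_model*.bin"):
    state = torch.load(shard, map_location="cpu")
    counts.update(str(v.dtype) for v in state.values() if torch.is_tensor(v))
print(counts)
# A half-precision checkpoint is dominated by torch.float16 tensors; a
# quantized (int-packed) checkpoint is not, which would be consistent with
# the "self and mat2 must have the same dtype" error in F.linear above.
```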

@Ikaros-521

Ikaros-521 commented May 6, 2023

LoRA needs to load half-precision weights. You downloaded the wrong weights; try downloading them from the official source.

Boss, is it this ice_text.model? I don't think I downloaded the wrong thing: it works when I'm not using LoRA, and direct inference also runs fine.

@ssbuild
Owner

ssbuild commented May 6, 2023

Inference works, but for training download the half-precision weights from https://huggingface.co/THUDM/chatglm-6b, reconfigure and rebuild the data, and training should then work.
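
For reference, one way to pull the half-precision checkpoint is via huggingface_hub (a sketch: local_dir needs a reasonably recent huggingface_hub release, a plain git lfs clone works just as well, and the target path is only an example):

```python
# Sketch: download the fp16 ChatGLM-6B weights needed for LoRA training.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="THUDM/chatglm-6b",
    local_dir="/root/autodl-tmp/chatglm-6b",   # example path, ~13 GB needed
)
# Then point model_name_or_path and tokenizer_name in the training arguments
# at this directory, rerun data_utils, and start training again.
```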

@Ikaros-521

Inference works, but for training download the half-precision weights from https://huggingface.co/THUDM/chatglm-6b, reconfigure and rebuild the data, and training should then work.

Thanks, boss, I'll give it a try.

@Ikaros-521

Inference works, but for training download the half-precision weights from https://huggingface.co/THUDM/chatglm-6b, reconfigure and rebuild the data, and training should then work.

Boss, does the .bin model also have to be the full 6B one? Then I can't run it; I'm in a disk-space crisis.

@ssbuild
Owner

ssbuild commented May 6, 2023

Inference works, but for training download the half-precision weights from https://huggingface.co/THUDM/chatglm-6b, reconfigure and rebuild the data, and training should then work.

Boss, does the .bin model also have to be the full 6B one? Then I can't run it; I'm in a disk-space crisis.

You can play with PTv2 on int4: set quantization_bit=4 in config/config_ptv2.json, and change the config in train_info_args to config/config_ptv2.json.
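
A hedged sketch of those two changes (the train_info_args key name config_name is an assumption based on this thread; adjust to whatever the repo actually uses):

```python
# Sketch: enable PTv2 training on the int4 checkpoint.
import json

# 1) Set quantization_bit to 4 in config/config_ptv2.json
#    (alongside "pre_seq_len": 32 and "prefix_projection": false).
with open("config/config_ptv2.json", encoding="utf-8") as f:
    cfg = json.load(f)
cfg["quantization_bit"] = 4
with open("config/config_ptv2.json", "w", encoding="utf-8") as f:
    json.dump(cfg, f, indent=2, ensure_ascii=False)

# 2) In train_info_args, point the model config at the PTv2 config file, e.g.:
#    'config_name': './config/config_ptv2.json'
```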

@Ikaros-521

Inference works, but for training download the half-precision weights from https://huggingface.co/THUDM/chatglm-6b, reconfigure and rebuild the data, and training should then work.

Boss, does the .bin model also have to be the full 6B one? Then I can't run it; I'm in a disk-space crisis.

You can play with PTv2 on int4: set quantization_bit=4 in config/config_ptv2.json, and change the config in train_info_args to config/config_ptv2.json.

OK, OK, I'll give it a try.

@Ikaros-521

Ikaros-521 commented May 6, 2023

Ugh, did I miss the configuration mentioned at the start? (No, I didn't, hmm.)

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /root/autodl-tmp/chatglm_finetuning/train.py:133 in <module>                                     │
│                                                                                                  │
│   130 │                                                                                          │
│   131 │   if config.pre_seq_len is not None:                                                     │
│   132 │   │   if config.quantization_bit:                                                        │
│ ❱ 133 │   │   │   raise Exception('量化模型不支持微调训练')                                      │
│   134 │                                                                                          │
│   135 │   # 额外参数                                                                             │
│   136 │   checkpoint_callback.tokenizer = tokenizer                                              │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
Exception: 量化模型不支持微调训练

@ssbuild
Owner

ssbuild commented May 6, 2023

Exception: 量化模型不支持微调训练

Try commenting out that line.
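
Concretely, that means disabling the guard in train.py shown in the traceback above (the check that rejects fine-tuning of quantized models). Commenting it out is a workaround rather than a supported path, so treat the result with care:

```python
# train.py, around the lines shown in the traceback: the quantization guard
# commented out so PTv2 can run on the int4 checkpoint (workaround only).
if config.pre_seq_len is not None:
    if config.quantization_bit:
        # raise Exception('量化模型不支持微调训练')
        pass
```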

@Ikaros-521

Try commenting out that line.

Haha, I'll give it a try.

@Ikaros-521

Try commenting out that line.

It's running!!! Thanks, boss, for the patient answers.

@Ikaros-521

Ikaros-521 commented May 6, 2023

Try commenting out that line.

Boss, after training, inference with infer_lora_finetuning.py gives:
ValueError: Can't find 'adapter_config.json' at './best_ckpt'
Where is this adapter_config.json supposed to come from?

Running infer directly works fine; does that mean I can ignore infer_lora_finetuning.py for now?

@ssbuild
Owner

ssbuild commented May 6, 2023

@Ikaros-521

infer_lora_finetuning.py is for LoRA.
For PTv2 inference, use infer_finetuning.py.

@Ikaros-521

@Ikaros-521

infer_lora_finetuning.py is for LoRA. For PTv2 inference, use infer_finetuning.py.

Ah, I forgot that what I trained was PTv2... awkward.

@mircop1t
Author

mircop1t commented May 7, 2023

@Kkkkkiradd @Ikaros-521

  1. For LoRA, just modify the corresponding parameters such as with_lora in the configuration file.
  2. For PTv2, choose the configuration file config/config_ptv2.json; the two key parameters are "pre_seq_len": 32 and "prefix_projection": false.

Thanks for the explanation! After LoRA training succeeded, I ran infer_lora_finetuning.py. It finished without errors, but the model did not produce any output. What could be going on here?
