
[BUG] Abnormal loss during LoRA fine-tuning? #1214

Open
2 tasks done
estuday opened this issue Apr 16, 2024 · 7 comments
Comments

estuday commented Apr 16, 2024

Is there an existing issue / discussion for this?

  • I have searched the existing issues / discussions

Is there an existing answer for this in FAQ?

  • I have searched FAQ

Current Behavior

I am fine-tuning qwen1.8b-chat-int4 with QLoRA using the official code; the script is as follows:

python finetune.py \
  --model_name_or_path $MODEL \
  --data_path $DATA \
  --fp16 True \
  --output_dir output_qwen \
  --num_train_epochs 1 \
  --per_device_train_batch_size 2 \
  --per_device_eval_batch_size 1 \
  --gradient_accumulation_steps 1 \
  --evaluation_strategy "no" \
  --save_strategy "steps" \
  --save_steps 1000 \
  --save_total_limit 10 \
  --learning_rate 3e-4 \
  --weight_decay 0.1 \
  --adam_beta2 0.95 \
  --warmup_ratio 0.01 \
  --lr_scheduler_type "cosine" \
  --logging_steps 1 \
  --report_to "none" \
  --model_max_length 512 \
  --lazy_preprocess True \
  --gradient_checkpointing \
  --use_lora \
  --q_lora \
  --deepspeed ./ds_config_zero2.json
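For context, the training schedule implied by these flags can be sanity-checked with a little arithmetic. The GPU count below is an assumption, since the launch command does not reveal it:

```python
# Rough training arithmetic for the flags above; num_gpus is an assumption
# (the launch command does not state how many devices DeepSpeed sees).
num_samples = 36000                  # dataset size stated in the report
per_device_train_batch_size = 2
gradient_accumulation_steps = 1
num_gpus = 1                         # assumed single GPU

effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
steps_per_epoch = num_samples // effective_batch
print(effective_batch, steps_per_epoch)  # → 2 18000
```

Under that assumption one epoch is about 18,000 optimizer steps, so a loss collapsing to ~0 within a few hundred steps means the model has seen only a small fraction of the data.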

The dataset contains about 36,000 samples, all single-turn conversations. An example:
{"id": "0", "conversations": [ {"from": "user", "value": "The world's earliest newspaper was born in"}, {"from": "assistant", "value": "China. The printed newspapers that appeared in the late Northern Song dynasty (11th–12th centuries AD) were the earliest printed newspapers not only in the history of Chinese journalism but also in the history of world journalism. The long history and rich content of Chinese journalism are difficult for any Western country to match.<e>Ancient Chinese newspapers arose during China's feudal period and reflected, through journalistic means, a feudal natural economy dominated by the landlord class and its political representatives. Throughout the long feudal era, ancient Chinese newspapers, whether the official dibao or the privately run xiaobao and jingbao, inevitably maintained ties with the feudal rulers of the day and were constrained by them. The official dibao was naturally the mouthpiece and propaganda tool of the feudal ruling class, while the privately run xiaobao and jingbao could only operate under the ruling class's control and could not overstep it. The feudal rulers never permitted newspapers that could freely report any news or freely voice any opinion. Ancient Chinese newspapers did play a certain role in providing readers of the time with court and social information, but they never escaped the grip of the ruling class. The history of ancient Chinese newspapers is, in essence, a history of the feudal ruling class holding the media, controlling the instruments of public opinion, and restricting freedom of speech and of the press.<e>The ancient Chinese dibao has a history of roughly 1,200 years, the xiaobao of nearly a thousand years, and the dibao and jingbao published by private newspaper houses of nearly 400 years. From birth to end each lasted a considerable time, yet they developed slowly and changed little in form or content."}] }
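A small hypothetical helper (the function name and checks below are illustrative, not part of finetune.py) can sanity-check that each record follows this single-turn schema before training:

```python
import json

# Hypothetical validator for the conversation format shown above;
# not part of the official fine-tuning code.
def is_valid_sample(sample: dict) -> bool:
    turns = sample.get("conversations", [])
    if "id" not in sample or not turns:
        return False
    # Every turn needs a speaker tag and non-empty text.
    return all(t.get("from") in ("user", "assistant") and t.get("value")
               for t in turns)

record = json.loads('{"id": "0", "conversations": ['
                    '{"from": "user", "value": "Q"}, '
                    '{"from": "assistant", "value": "A"}]}')
print(is_valid_sample(record))  # → True
```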
After only a few hundred steps of fine-tuning, the loss became very small, close to 0:
[screenshot: training loss curve]

Expected Behavior

I expected it to correctly reproduce the content of the fine-tuning data. I originally assumed that since I was using a public dataset, the official team might have used it during pre-training, which would explain the small loss. But after merging the weights and testing the fine-tuned model, the results were very poor, and as the steps increased the model's output became repetitive.
[screenshot: model output]

Steps To Reproduce

No response

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):

Anything else?

No response

estuday (Author) commented Apr 16, 2024

To add: the model's output after completing one full epoch of training:
[screenshot: model output]

jklj077 (Contributor) commented Apr 17, 2024

How did you conduct the inference?

estuday (Author) commented Apr 17, 2024

How did you conduct the inference?

response, history = model.chat(tokenizer, query, history=history)

jklj077 (Contributor) commented Apr 18, 2024

Please first try adjusting the repetition penalty (higher), the temperature (higher), and the top_p (higher) in the generation_config.json.
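The adjustment suggested here amounts to raising three sampling knobs in the checkpoint's generation_config.json. A minimal sketch, assuming illustrative baseline values (read your checkpoint's actual file rather than trusting these numbers):

```python
import json

# Sketch of raising the three sampling parameters in generation_config.json.
# The baseline values below are assumptions for illustration, not the
# checkpoint's real defaults.
cfg = {
    "repetition_penalty": 1.1,
    "temperature": 0.7,
    "top_p": 0.8,
}

# Raise all three, as suggested above, to discourage repetitive output.
cfg["repetition_penalty"] = 1.3
cfg["temperature"] = 0.9
cfg["top_p"] = 0.95

print(json.dumps(cfg, indent=2))
```

In practice you would load the JSON file from the checkpoint directory, apply the same updates, and write it back.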

estuday (Author) commented Apr 19, 2024

Please first try adjusting the repetition penalty (higher), the temperature (higher), and the top_p (higher) in the generation_config.json.

Hi,
I adjusted these parameters and the results seemed better, but the model didn't stop in time and kept generating, which makes it behave more like the base model than the chat model.
[screenshot: model output]

github-actions bot commented:

This issue has been automatically marked as inactive due to lack of recent activity. Should you believe it remains unresolved and warrants attention, kindly leave a comment on this thread.

jklj077 (Contributor) commented May 20, 2024

Hi!

It appears that something is wrong with the stopping criteria. Normally, model.chat handles that for you, but it may be worth double-checking. If you are using transformers directly, adjust generation_config.json and verify that the eos tokens are set properly (they should include <|im_end|>, ID 151645, and <|endoftext|>, ID 151643).

I would advise you to migrate to Qwen1.5, though, as Qwen1.0 and its code are not actively maintained.
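The eos check described here can be sketched as a small helper (the function name is assumed; the token IDs are the ones cited in this thread for Qwen-1.x):

```python
# Sketch: verifying that a generation config stops on both Qwen-1.x chat
# end tokens mentioned above: <|im_end|> = 151645 and <|endoftext|> = 151643.
REQUIRED_EOS = {151643, 151645}

def eos_ok(generation_config: dict) -> bool:
    eos = generation_config.get("eos_token_id")
    if eos is None:
        return False
    # eos_token_id may be a single int or a list of ints.
    ids = set(eos) if isinstance(eos, (list, tuple)) else {eos}
    return REQUIRED_EOS <= ids

print(eos_ok({"eos_token_id": [151643, 151645]}))  # → True
print(eos_ok({"eos_token_id": 151643}))            # → False
```

If only <|endoftext|> is configured, generation can run straight past the end of the assistant turn, which matches the "keeps reasoning like a base model" symptom reported earlier.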

@github-actions github-actions bot removed the inactive label May 20, 2024