
How exactly do I do DPO? #3395

Open
1 task done
XuanRen4470 opened this issue Apr 23, 2024 · 4 comments

Labels
pending This problem is yet to be addressed.

Comments

@XuanRen4470

XuanRen4470 commented Apr 23, 2024

Reminder

  • I have read the README and searched the existing issues.

Reproduction

I followed the DPO workflow from the examples exactly (with LoRA):

  1. Train the SFT model.
  2. Pass the SFT adapter path to adapter_name_or_path and the unmerged Mistral 7B path to model_name_or_path.
  3. At inference time, pass the unmerged Mistral 7B path to model_name_or_path, and pass both the SFT adapter path and the DPO adapter path to adapter_name_or_path.

(A rough sketch of the step-2 training command follows this list.)
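For concreteness, a sketch of what step 2 might look like with the repository's train_bash.py script is below. The paths, dataset name, template, and hyperparameters here are placeholders for illustration, not values taken from the original report:

MODEL_PATH=/your path/Mistral-7B-v0.1
SFT_ADAPTER=/your path/mistral-7b-sft-lora
DPO_OUTPUT=/your path/mistral-7b-dpo-lora

# Step 2: DPO training, loading the SFT adapter and creating a new DPO adapter on top of it
python ../src/train_bash.py \
    --stage dpo \
    --do_train \
    --model_name_or_path $MODEL_PATH \
    --adapter_name_or_path $SFT_ADAPTER \
    --create_new_adapter \
    --dataset comparison_gpt4_en \
    --template mistral \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --output_dir $DPO_OUTPUT \
    --overwrite_cache \
    --per_device_train_batch_size 1 \
    --learning_rate 5e-6 \
    --num_train_epochs 1.0 \
    --fp16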

I found that after DPO training the model never beats the SFT model (I tested on 9 datasets).

Also, from reading other people's issues, I noticed that many of them mention merging the LoRA model, yet the DPO README never mentions merging.

I have been using create_new_adapter and overwrite_cache the whole time, even though I don't really know what they mean. It feels like a more detailed README is needed.

Also, how exactly should inference be done after training? According to the README, at inference time I should pass both the SFT adapter path and the DPO adapter path to adapter_name_or_path. But if the weights were merged, in principle none of that should be needed (none of my earlier experiments merged anything, because the README never said to merge the LoRA). A rough sketch of the unmerged, stacked-adapter inference call is given right after this paragraph.
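A sketch of that unmerged setup, passing both adapter paths comma-separated to adapter_name_or_path (paths and template are placeholders):

MODEL_PATH=/your path/Mistral-7B-v0.1
SFT_ADAPTER=/your path/mistral-7b-sft-lora
DPO_ADAPTER=/your path/mistral-7b-dpo-lora

# Stack both adapters on the unmerged base model at inference time
python ../src/cli_demo.py \
    --model_name_or_path $MODEL_PATH \
    --adapter_name_or_path $SFT_ADAPTER,$DPO_ADAPTER \
    --template mistral \
    --finetuning_type lora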

Summary:
Question 1: How exactly should training be done, and do the LoRA weights need to be merged?
Question 2: How exactly should inference be done? (If the LoRA weights are merged, the README does not explain how to run inference in that case.)

Expected behavior

No response

System Info

No response

Others

No response

@hiyouga hiyouga added the pending This problem is yet to be addressed. label Apr 23, 2024
@hiyouga
Owner

hiyouga commented Apr 23, 2024

DPO is not meant for boosting dataset accuracy scores.

@XuanRen4470
Author

DPO is not meant for boosting dataset accuracy scores.

But I remember DPO can be used to improve a model's capabilities, right? And what exactly is the full DPO workflow? I have now added a step that merges the SFT LoRA, and the accuracy seems to have improved. But the DPO example in the README never mentions merging LoRA. My training and inference workflow is now completely different from the README, yet the accuracy seems a bit higher.
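For illustration, the merge-first variant mentioned here (merge the SFT LoRA into the base model, then run DPO from the merged checkpoint) might look roughly like the following, using the same export_model.py / train_bash.py interface as in the reply below; all paths and the dataset name are placeholders:

MODEL_PATH=/your path/Mistral-7B-v0.1
SFT_ADAPTER=/your path/mistral-7b-sft-lora
MERGED_SFT=/your path/mistral-7b-sft-merged

# Merge the SFT adapter into the base weights
python ../src/export_model.py \
    --model_name_or_path $MODEL_PATH \
    --adapter_name_or_path $SFT_ADAPTER \
    --template mistral \
    --finetuning_type lora \
    --export_dir $MERGED_SFT

# Run DPO with the merged model as the base (no adapter_name_or_path or create_new_adapter needed)
python ../src/train_bash.py \
    --stage dpo \
    --do_train \
    --model_name_or_path $MERGED_SFT \
    --dataset comparison_gpt4_en \
    --template mistral \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --output_dir /your path/mistral-7b-dpo-lora \
    --overwrite_cache \
    --fp16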

@Felixgithub2017

Where is the example?

@AlexYoung757

DPO is not meant for boosting dataset accuracy scores.

But I remember DPO can be used to improve a model's capabilities, right? And what exactly is the full DPO workflow? I have now added a step that merges the SFT LoRA, and the accuracy seems to have improved. But the DPO example in the README never mentions merging LoRA. My training and inference workflow is now completely different from the README, yet the accuracy seems a bit higher.

At inference time, you can first merge the DPO-trained weights and then run inference:
(1) Merge the DPO-trained model

MODEL_PATH=/your path/Qwen1.5-32B-Chat
OUTPUT_PATH=/your path/qwen-32b-dpo
EXPORT_PATH=/your path/qwen-32b-dpo-merge

python ../src/export_model.py \
    --model_name_or_path $MODEL_PATH  \
    --adapter_name_or_path $OUTPUT_PATH  \
    --template qwen \
    --finetuning_type lora \
    --export_dir $EXPORT_PATH  \
    --export_size 2 \
    --export_legacy_format False

(2) Run inference with the merged model

MODEL_PATH=/your path/qwen-32b-dpo-merge

python ../src/cli_demo.py \
    --model_name_or_path $MODEL_PATH  \
    --template qwen \
    --finetuning_type lora  \
    --pure_bf16  \
    --flash_attn
