DPO: newbie questions (data format and num_train_epochs) #2598
Hello! Question 1: it looks like I can reuse the SFT training dataset rather than the data format described at https://huggingface.co/docs/trl/main/en/dpo_trainer (with 'chosen' and 'rejected' fields). Is that correct? Question 2: should `num_train_epochs` be the same as the one used for SFT? Thanks in advance!
Replies: 1 comment 1 reply
DPO needs a preference dataset (like `comparison_gpt4_data_en.json`), while SFT uses a supervised dataset (like `alpaca_data_en_52k.json`), so you cannot reuse the SFT data as-is.
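To make the difference concrete, here is a minimal sketch of the two record shapes. The field names `prompt`/`chosen`/`rejected` come from the TRL DPO docs linked above; the example strings and the `instruction`/`output` keys (Alpaca-style) are illustrative assumptions, not taken from the actual files.

```python
# SFT record (Alpaca-style): one prompt, one reference answer.
sft_example = {
    "instruction": "Name the capital of France.",
    "input": "",
    "output": "Paris is the capital of France.",
}

# DPO record: one prompt plus a PAIR of answers, ranked by preference.
dpo_example = {
    "prompt": "Name the capital of France.",
    "chosen": "Paris is the capital of France.",
    "rejected": "The capital of France is Lyon.",
}

def sft_to_partial_dpo(example: dict) -> dict:
    """Hypothetical helper: an SFT record supplies 'prompt' and 'chosen',
    but DPO still needs a 'rejected' answer from somewhere else, which is
    why SFT data alone is not enough."""
    return {"prompt": example["instruction"], "chosen": example["output"]}
```

The key point: every DPO row carries two completions for the same prompt, so reusing SFT data would require generating or collecting a dispreferred answer for each example.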
You can adjust `num_train_epochs` according to the convergence of the loss curve; it does not need to match the SFT setting.