DPO: newbie questions (data format and num_train_epochs) #2598
Hello! Question 1: it looks like I can reuse the SFT training dataset rather than the data format described at https://huggingface.co/docs/trl/main/en/dpo_trainer (with 'chosen' and 'rejected' fields). Is that correct? Question 2: should `num_train_epochs` be the same as the one used for SFT? Thanks in advance!
Replies: 1 comment 1 reply
DPO needs a preference dataset (like `comparison_gpt4_data_en.json`), while SFT uses a supervised dataset (like `alpaca_data_en_52k.json`), so you cannot reuse the SFT data as-is.
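To make the difference concrete, here is a minimal sketch of the two record shapes. The field names `prompt`/`chosen`/`rejected` come from the TRL DPO docs linked above; the example strings and the `instruction`/`output` keys (Alpaca-style) are illustrative assumptions, not taken from the actual files.

```python
# SFT record (Alpaca-style): one prompt, one reference answer.
sft_example = {
    "instruction": "Name the capital of France.",
    "input": "",
    "output": "Paris is the capital of France.",
}

# DPO record: one prompt plus a PAIR of answers, ranked by preference.
dpo_example = {
    "prompt": "Name the capital of France.",
    "chosen": "Paris is the capital of France.",
    "rejected": "The capital of France is Lyon.",
}

def sft_to_partial_dpo(example: dict) -> dict:
    """Hypothetical helper: an SFT record supplies 'prompt' and 'chosen',
    but DPO still needs a 'rejected' answer from somewhere else, which is
    why SFT data alone is not enough."""
    return {"prompt": example["instruction"], "chosen": example["output"]}
```

The key point: every DPO row carries two completions for the same prompt, so reusing SFT data would require generating or collecting a dispreferred answer for each example.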
You can adjust `num_train_epochs` according to the convergence of the loss curve; it does not need to match the SFT setting.