
DPO : newbie questions (data format and num_train_epochs) #2598

Closed · Answered by hiyouga
alphaversedev asked this question in Q&A

  1. DPO needs a preference dataset (e.g. comparison_gpt4_data_en.json), while SFT uses a supervised dataset (e.g. alpaca_data_en_52k.json); a sketch of one preference record appears below this list.

  2. You can adjust `num_train_epochs` according to the convergence of the loss curve (see the rough convergence check after the data example).
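
For illustration, a preference record pairs one prompt with a chosen and a rejected response. A minimal sketch of writing such a record in the style of comparison_gpt4_data_en.json, assuming its convention that `output` is a two-element list with the preferred response first (field names here follow that file, not a guaranteed schema):

```python
import json

# A minimal sketch of one preference record, assuming the
# comparison_gpt4_data_en.json convention: "output" is a two-element
# list with the chosen (preferred) response first and the rejected
# (dispreferred) response second.
record = {
    "instruction": "Explain what DPO training needs as input.",
    "input": "",
    "output": [
        "DPO trains on pairs of responses where one is preferred.",  # chosen
        "DPO trains on single responses, like SFT.",                 # rejected
    ],
}

# The dataset file is a JSON list of such records.
with open("my_preference_data.json", "w", encoding="utf-8") as f:
    json.dump([record], f, ensure_ascii=False, indent=2)
```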

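To judge convergence before changing `num_train_epochs`, you can inspect the logged training loss over steps. A rough sketch, assuming the run's output directory contains a `trainer_log.jsonl` with one JSON object per logging step carrying a `loss` field (the path, file name, and fields are assumptions and may differ across versions):

```python
import json

# A rough convergence check, assuming the output dir contains a
# trainer_log.jsonl with one JSON object per logging step that has
# a "loss" field (path, file name, and fields are assumptions).
losses = []
with open("output/dpo/trainer_log.jsonl", encoding="utf-8") as f:
    for line in f:
        entry = json.loads(line)
        if "loss" in entry:
            losses.append(entry["loss"])

# Compare the mean loss of the last quarter of steps against the
# quarter before it: if the tail is barely lower, extra epochs are
# unlikely to help and num_train_epochs can stay as-is or shrink.
q = max(len(losses) // 4, 1)
prev, tail = losses[-2 * q:-q], losses[-q:]
if prev and tail:
    print(f"mean loss improvement: {sum(prev)/len(prev) - sum(tail)/len(tail):.4f}")
```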


Answer selected by alphaversedev
Category: Q&A
Labels: solved (this problem has already been solved)
2 participants