RM-model.md

Sections to train Reward Model (RM)

Trainer code based on Hugging Face. Compatible with DeepSpeed or Accelerate.

Dataset

For now, only the WebGPT and summary datasets from OpenAI are supported.

Model

You can add any new Hugging Face model you want.

Example 1:

Run the training procedure:

python trainer.py

Example 2:

Additional axis labeling: this outputs four summary quality evaluation metrics (scores are normalized to 0-1).

python summary_quality_trainer.py

The four summary metrics are:

  • overall
  • accuracy
  • coverage
  • coherence
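
As a rough illustration of what "normalized to 0-1" could mean for these four axes, here is a minimal sketch of min-max scaling. It assumes the raw ratings sit on a 1-7 scale; that range, the `AXES` tuple, and the `normalize_axis_scores` helper are hypothetical names chosen for this example and are not taken from `summary_quality_trainer.py`, which may normalize differently.

```python
from typing import Dict

# Axis names as listed above; the tuple itself is illustrative.
AXES = ("overall", "accuracy", "coverage", "coherence")


def normalize_axis_scores(
    raw: Dict[str, float], lo: float = 1.0, hi: float = 7.0
) -> Dict[str, float]:
    """Min-max scale each axis rating from [lo, hi] to [0, 1].

    The default 1-7 range is an assumption about the raw labels.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    scaled = {}
    for axis in AXES:
        # Clip first so out-of-range ratings cannot leave [0, 1].
        clipped = min(max(float(raw[axis]), lo), hi)
        scaled[axis] = (clipped - lo) / (hi - lo)
    return scaled


example = normalize_axis_scores(
    {"overall": 7, "accuracy": 1, "coverage": 4, "coherence": 5.5}
)
print(example)
```

With this scaling the lowest rating maps to 0 and the highest to 1, so all four targets share one range and can be regressed jointly.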