Sections to train Reward Model (RM)

The trainer code is based on Hugging Face and is compatible with DeepSpeed or Accelerate.

Dataset

For now we only support the WebGPT and summary datasets from OpenAI.

You can add new Hugging Face models as needed.

Run the training procedure:

python trainer.py
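
For orientation, reward models trained on comparison data such as WebGPT are commonly fit with a pairwise ranking loss that pushes the score of the preferred answer above the score of the rejected one. The sketch below illustrates that loss in plain Python. Whether trainer.py uses exactly this form is an assumption, and the function name `pairwise_ranking_loss` is hypothetical.

```python
import math


def pairwise_ranking_loss(chosen_score: float, rejected_score: float) -> float:
    """Return -log(sigmoid(chosen_score - rejected_score)).

    Illustrative only: this is a common reward-model objective,
    not necessarily the one implemented in trainer.py.
    """
    diff = chosen_score - rejected_score
    # -log(sigmoid(d)) equals log(1 + exp(-d)); branch on the sign of d
    # so that exp() never receives a large positive argument.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


if __name__ == "__main__":
    # The loss is small when the preferred answer scores higher, large otherwise.
    print(pairwise_ranking_loss(2.0, 0.0))
    print(pairwise_ranking_loss(0.0, 2.0))
```

The loss depends only on the score difference, so the reward model learns a relative ordering rather than an absolute scale.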

Additional axis labeling: this outputs four summary quality evaluation metrics (scores are normalized to 0-1).

python summary_quality_trainer.py
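
As a rough illustration of what normalizing scores to 0-1 can look like, the sketch below applies min-max scaling to a raw rating. The 1-7 rating range and the helper name `normalize_score` are assumptions made for this example and are not taken from summary_quality_trainer.py.

```python
def normalize_score(raw: float, low: float = 1.0, high: float = 7.0) -> float:
    """Map a raw rating in [low, high] onto [0, 1] with min-max scaling.

    The default 1-7 range is an assumption for illustration only.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    # Clip out-of-range ratings so the result always stays within [0, 1].
    clipped = min(max(raw, low), high)
    return (clipped - low) / (high - low)


if __name__ == "__main__":
    for rating in (1, 4, 7):
        print(rating, normalize_score(rating))
```

Clipping before scaling keeps stray ratings from producing targets outside the 0-1 range.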

The four summary metrics are: