During PPO, the reward model outputs scores between -1 and 15. Is this normal? #1733

haohuisss started this conversation in General

haohuisss
Dec 4, 2023

Do I need to use ppo_score_norm? In what situations should it be used?

Replies: 0 comments
