
Unified reward function/model architecture for a wide range of tasks #4

Open
James4Ever0 opened this issue Dec 24, 2022 · 2 comments

James4Ever0 commented Dec 24, 2022

I find the reward function to be the most important part of RLHF, because it is the part that mimics a human evaluator and provides instant feedback to the model.

However, given ChatGPT's wide range of language capabilities, it is hard to build such a reward function as a single model that is prompt-dependent, context-aware, and able to leverage the existing knowledge of pretrained models.

Most RLHF-related projects use toy-like reward functions such as counting word frequencies, checking output formats, or plain sentiment/fluency scores. These functions do not "think" like a human evaluator, who weighs every factor as a whole. RL4LMs proposes GRUE, in which the model performs general instructions, but it does not expose a simple unified interface that returns a score given a prompt and an answer.

RL4LMs contains a registry of reward functions, which I find complex and which does not leverage the current pretrained model (by "current" I mean the SFT model we are working on, in this case PaLM). I think the reward function should be an integrated part of the language model itself, able to attribute reward to fine-grained sections of the output, rather than being outsourced to other models with different architectures that require separate pre-training and fine-tuning.
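
To make the idea concrete, here is a minimal PyTorch sketch (all class and variable names are hypothetical, not taken from this repo): the reward model reuses the SFT model's frozen transformer trunk and adds only a small scalar head, so reward can be attributed to individual output tokens instead of coming from a separately pretrained rating model.

```python
import torch
import torch.nn as nn

class SharedTrunkRewardModel(nn.Module):
    """Reward model that shares the pretrained/SFT transformer trunk (frozen)."""
    def __init__(self, trunk: nn.Module, hidden_dim: int):
        super().__init__()
        self.trunk = trunk                           # the pretrained/SFT transformer
        for p in self.trunk.parameters():
            p.requires_grad = False                  # frozen, shared with the policy
        self.reward_head = nn.Linear(hidden_dim, 1)  # the only new, trainable part

    def forward(self, token_ids: torch.Tensor):
        # trunk is assumed to return hidden states of shape (batch, seq_len, hidden_dim)
        hidden = self.trunk(token_ids)
        per_token_reward = self.reward_head(hidden).squeeze(-1)  # fine-grained credit
        sequence_reward = per_token_reward[:, -1]                # scalar score per sample
        return per_token_reward, sequence_reward
```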

@James4Ever0 James4Ever0 changed the title Suggest unified reward function/model architecture for a wide range of tasks Unified reward function/model architecture for a wide range of tasks Dec 24, 2022
James4Ever0 commented Dec 24, 2022

RLHF requires creating multiple models: SFT, RM, and the PPO-tuned model. Could we improve storage and memory efficiency, and reduce computation, by freezing the large layers of the pretrained model and fine-tuning only certain layers to create the SFT, RM, and PPO models, using OpenDelta or other libraries/methods? I read that your repo is using LoRA, but I'm not sure it fulfills all of the goals described above. Common implementations like minRLHF require four separate models: three derived from the pretrained model (actor, critic, and reference), plus an external sentiment rating model.
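
A rough sketch of the storage/memory idea, assuming a LoRA-style approach (this is illustrative only, not how this repo or OpenDelta actually wires it up): one frozen base layer is shared, and each role (SFT, RM, PPO actor) stores only its own small low-rank delta.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen base linear layer plus a trainable low-rank delta."""
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False            # frozen, shared across all roles
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)     # start as a zero delta (identity behavior)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.lora_b(self.lora_a(x))

# One shared frozen projection, three cheap role-specific deltas:
shared = nn.Linear(1024, 1024)
sft_layer    = LoRALinear(shared)   # fine-tuned on supervised data
reward_layer = LoRALinear(shared)   # fine-tuned as the reward model
actor_layer  = LoRALinear(shared)   # fine-tuned with PPO
```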

James4Ever0 commented Dec 28, 2022

To take this proposal even further, I think a good reward function should be able to self-evolve and adapt to new environments (when the data source is no longer a fixed, static "archive" but a stream), making the model communicative, multipurpose, realtime, and perhaps even a step toward AGI. A good reward function lets the agent learn from almost anything: human feedback, the computer system (sensor data, terminal/GUI input/output, the internet, program threads, and more), and self-invented signals. WebGPT is a clear example of turning GPT-3 into an active agent, and there will be more to come.
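
As a toy sketch of the "learn from almost anything" idea, the reward could be a composite of heterogeneous, possibly streaming feedback channels. Every name, signal, and weight below is purely illustrative, not part of any existing library.

```python
from typing import Callable, Dict

RewardSource = Callable[[str, str], float]   # (prompt, response) -> score

def composite_reward(prompt: str, response: str,
                     sources: Dict[str, RewardSource],
                     weights: Dict[str, float]) -> float:
    """Weighted sum of independent feedback channels (human, tool output, self-critique, ...)."""
    return sum(weights[name] * fn(prompt, response) for name, fn in sources.items())

# Example usage with stand-in signal functions:
score = composite_reward(
    "How do I list files?", "Use `ls -la`.",
    sources={
        "human":    lambda p, r: 1.0,   # e.g. a thumbs up/down from a user
        "terminal": lambda p, r: 0.5,   # e.g. whether the suggested command ran cleanly
    },
    weights={"human": 0.7, "terminal": 0.3},
)
```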
