⚗️ distilabel is a framework for synthetic data and AI feedback for AI engineers who require high-quality outputs, full data ownership, and overall efficiency.
Updated May 10, 2024 - Python
Unify Efficient Fine-Tuning of 100+ LLMs
Xtreme1 is an all-in-one data labeling and annotation platform for multimodal data training and supports 3D LiDAR point cloud, image, and LLM.
A curated list of reinforcement learning from human feedback resources (continually updated)
RewardBench: the first evaluation tool for reward models.
Argilla is a collaboration platform for AI engineers and domain experts who require high-quality outputs, full data ownership, and overall efficiency.
Implementation of ChatGPT-style RLHF (Reinforcement Learning from Human Feedback) on any generation model in Hugging Face's transformers (bloomz-176B/BLOOM/GPT/BART/T5/MetaICL)
Robust recipes to align language models with human and AI preferences
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
Tracking instruction-tuned LLM openness. Paper: Liesenfeld, Andreas, Alianda Lopez, and Mark Dingemanse. 2023. “Opening up ChatGPT: Tracking Openness, Transparency, and Accountability in Instruction-Tuned Text Generators.” In Proceedings of the 5th International Conference on Conversational User Interfaces. doi:10.1145/3571884.3604316.
Projects and Models built in Python leveraging PyTorch, implementing Reinforcement Learning algorithms for reward-based tasks.
OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
This repository contains small projects and some theoretical material that I used to get into NLP and LLMs in a practical and efficient way.
Codebase and experiments for large language modeling (LLM)
Python client library for improving your LLM app accuracy
Train a tiny LLaMA model from scratch to repeat your words using Reinforcement Learning from Human Feedback (RLHF)