chowfi/FineTune-LLM-OnlineRL

This is a group project developed by a team of three individuals.

FineTune-LLM-OnlineRL

Game: Xiangqi

Main Idea - Fine-tuning an LLM Agent with Online RL (PPO & LoRA):

  1. A pre-trained LLM is used as the starting policy for the RL agent
  2. Observations from the environment are converted to text
  3. Each text observation triggers an action, which subsequently updates the RL agent’s policy
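
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the function names, the prompt wording, the FEN-style board string, and the ICCS-style move strings such as `h2e2` are all assumptions made for the example. The last function shows the standard per-sample PPO clipped surrogate, which is the kind of objective a PPO update would maximize; the LoRA adapters and the actual LLM call are left out.

```python
import math
from typing import List, Optional

# Standard Xiangqi starting position in FEN-style notation
# (uppercase = Red, lowercase = Black, digits = runs of empty points).
START_FEN = "rnbakabnr/9/1c5c1/p1p1p1p1p/9/9/P1P1P1P1P/1C5C1/9/RNBAKABNR"


def fen_to_rows(fen: str) -> List[str]:
    """Expand each rank so that every empty point becomes a '.' character."""
    rows = []
    for rank in fen.split("/"):
        rows.append("".join("." * int(ch) if ch.isdigit() else ch for ch in rank))
    return rows


def observation_to_prompt(fen: str, side: str, legal_moves: List[str]) -> str:
    """Step 2: turn an environment observation into a text prompt (hypothetical format)."""
    board = "\n".join(fen_to_rows(fen))
    return (
        f"You are playing Xiangqi as {side}.\n"
        f"Board (uppercase = Red, lowercase = Black, '.' = empty):\n{board}\n"
        f"Legal moves: {', '.join(legal_moves)}\n"
        "Reply with exactly one legal move."
    )


def parse_action(reply: str, legal_moves: List[str]) -> Optional[str]:
    """Step 3a: return the first legal move mentioned in the LLM reply, else None."""
    for token in reply.replace(",", " ").split():
        move = token.strip(".!?;:")
        if move in legal_moves:
            return move
    return None


def ppo_clipped_objective(logp_new: float, logp_old: float,
                          advantage: float, clip_eps: float = 0.2) -> float:
    """Step 3b: per-sample PPO clipped surrogate (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped_ratio * advantage)
```

In a full training loop, the prompt would be passed to the LLM policy, the parsed move applied to the environment, and the resulting rewards turned into advantages for the PPO update. The clipping keeps each update close to the policy that generated the data.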

Other Methods Implemented:

  1. Random
  2. Greedy
  3. DQN
  4. DDQN
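
For orientation, the four baselines listed above can be sketched in a few lines each. This is only an illustration under stated assumptions: the function names are hypothetical, the greedy policy assumes some caller-supplied scoring function for a move (for example, the value of a captured piece), and the Q-values are plain lists indexed by action. The DQN and DDQN functions show the textbook bootstrap targets, which differ only in how the next action is selected.

```python
import random
from typing import Callable, List, Sequence


def random_policy(legal_moves: Sequence[str]) -> str:
    """Random baseline: pick any legal move uniformly."""
    return random.choice(list(legal_moves))


def greedy_policy(legal_moves: Sequence[str], score: Callable[[str], float]) -> str:
    """Greedy baseline: pick the move with the best immediate score.

    `score` is a hypothetical caller-supplied function, not part of the repository.
    """
    return max(legal_moves, key=score)


def dqn_target(reward: float, next_q_target: List[float],
               done: bool, gamma: float = 0.99) -> float:
    """DQN: the target network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def ddqn_target(reward: float, next_q_online: List[float],
                next_q_target: List[float], done: bool,
                gamma: float = 0.99) -> float:
    """Double DQN: the online network selects, the target network evaluates."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]
```

Separating action selection from evaluation in `ddqn_target` is what reduces the overestimation bias of the plain DQN target.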

About

Fine-tuning LLM agents with online RL for Xiangqi (Chinese Chess)
