Deep Reinforcement Learning algorithms for Policy Value methods written from scratch.

QasimWani/policy-value-methods

policy-value-methods

My from-scratch implementations of a bunch of policy value methods.

Algorithms:

  1. Hill Climb
  2. Cross Entropy Method
  3. Policy Gradient Methods
    1. REINFORCE
    2. PPO (Proximal Policy Optimization) Video
    3. Actor Critic
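
To give a feel for the simplest method in the list, here is a minimal hill-climbing sketch. It is not the code from this repository: `evaluate` is a hypothetical toy objective standing in for an episode return, and `hill_climb`, `noise_scale`, and the two-element weight vector are illustrative assumptions. In a real setup the score would come from rolling out the policy in an environment.

```python
import random


def evaluate(weights):
    """Hypothetical stand-in for an episode return.

    The score is highest (0.0) when weights == [1.0, -2.0].
    """
    target = [1.0, -2.0]
    return -sum((w - t) ** 2 for w, t in zip(weights, target))


def hill_climb(n_iters=500, noise_scale=0.5, seed=0):
    """Keep the best weights seen so far; accept a noisy candidate only if it scores higher."""
    rng = random.Random(seed)
    best_w = [0.0, 0.0]
    best_score = evaluate(best_w)
    for _ in range(n_iters):
        # Perturb the current best weights with Gaussian noise.
        candidate = [w + noise_scale * rng.gauss(0.0, 1.0) for w in best_w]
        score = evaluate(candidate)
        if score > best_score:
            best_w, best_score = candidate, score
    return best_w, best_score


if __name__ == "__main__":
    weights, score = hill_climb()
    print("best weights:", weights)
    print("best score:", score)
```

The Cross Entropy Method follows the same evaluate-and-keep pattern, but samples a population of candidates per iteration and refits the sampling distribution to the top-scoring ones instead of tracking a single best point.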

Results:

LunarLander (REINFORCE) {Solved in 519 episodes}

BipedalWalker-v3 (TD3) {completion time ~14 seconds, achieved after 500 episodes}

[Plot: score and rolling score]