Using value iteration to find the optimum policy in a grid world environment.

mbodenham/gridworld-value-iteration

Grid World Value Iteration

This project creates a grid world environment and applies value iteration to find the optimal policy. The implementation follows the value iteration pseudocode in Reinforcement Learning (Sutton & Barto, 2018, p. 83).
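As a rough illustration of the algorithm, the sketch below performs in-place value iteration sweeps until the largest value change falls below a threshold, then reads off a greedy policy. It is not the project's code: the function name `value_iteration`, the `transitions(s, a)` callback, the discount `gamma=1.0`, the threshold `theta=1e-6`, and the 3-state corridor used to exercise it are all assumptions made for this example.

```python
def value_iteration(n_states, actions, transitions, terminal, gamma=1.0, theta=1e-6):
    """In-place value iteration (a sketch, not the project's implementation).

    transitions(s, a) must return a list of (probability, next_state, reward).
    gamma and theta are assumed values; the README does not state them.
    """
    V = [0.0] * n_states  # terminal states keep the value 0

    def q_value(s, a):
        # Expected return of taking action a in state s, then following V.
        return sum(p * (r + gamma * V[s2]) for p, s2, r in transitions(s, a))

    while True:
        delta = 0.0
        for s in range(n_states):
            if s in terminal:
                continue
            old = V[s]
            V[s] = max(q_value(s, a) for a in actions)
            delta = max(delta, abs(old - V[s]))
        if delta < theta:
            break

    # Greedy policy with respect to the converged values.
    policy = {s: max(actions, key=lambda a: q_value(s, a))
              for s in range(n_states) if s not in terminal}
    return V, policy


# Hypothetical 3-state corridor: state 2 is a terminal "water" cell.
def corridor(s, a):
    s2 = min(max(s + (1 if a == "right" else -1), 0), 2)
    r = -1 + (10 if s2 == 2 else 0)  # step cost plus water reward
    return [(1.0, s2, r)]


V, policy = value_iteration(3, ["left", "right"], corridor, terminal={2})
print(V, policy)
```

Passing the environment as a callback keeps the sweep independent of the grid layout, so the same function could be reused for both a deterministic and a stochastic transition model.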

The state space of the grid world was represented using an array of length 25 (N × M) with the index system as shown below.
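One possible way to map between a flat state index and grid coordinates is sketched here. It assumes a 5 × 5 grid with row-major ordering (index 0 in the top-left corner, increasing left to right); the project's own index scheme may differ, and the helper names `to_index` and `to_coords` are hypothetical.

```python
N_ROWS, N_COLS = 5, 5  # assumed grid shape, giving 25 states


def to_index(row, col):
    """Flatten (row, col) into a single state index, row-major."""
    return row * N_COLS + col


def to_coords(index):
    """Recover (row, col) from a flat state index."""
    return divmod(index, N_COLS)


print(to_index(2, 3), to_coords(13))
```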

The grid world can contain two kinds of item, fire and water. The reward for collecting fire was set to -10 and the reward for collecting water was set to 10. Both fire and water are terminal states. In addition, the agent receives a reward of -1 for each step it takes.
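The reward scheme could be expressed along the following lines. This is a minimal sketch under two assumptions that the text does not settle: the -1 step reward is also applied on the move that reaches an item, and the item positions in `items` are made up for illustration.

```python
FIRE_REWARD = -10
WATER_REWARD = 10
STEP_REWARD = -1

# Hypothetical item positions, keyed by state index.
items = {6: "fire", 24: "water"}


def reward(next_state, items):
    """Reward for a move that lands on next_state (step cost included)."""
    r = STEP_REWARD
    item = items.get(next_state)
    if item == "fire":
        r += FIRE_REWARD
    elif item == "water":
        r += WATER_REWARD
    return r


def is_terminal(state, items):
    """Fire and water cells end the episode."""
    return state in items
```

If the step reward should not apply on the final move, the item rewards would simply replace `STEP_REWARD` instead of being added to it.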

Deterministic Environment

A deterministic environment was implemented first, as it is easier to check whether the solution is correct. The two images below show the Values and Policy derived for this environment, with orange representing fire and blue representing water.
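A deterministic transition function for such a grid might look like the sketch below. The action set (up, down, left, right), the 5 × 5 row-major layout, and the rule that a move off the grid leaves the agent in place are assumptions for this example and are not stated in the text.

```python
N_ROWS, N_COLS = 5, 5  # assumed grid shape

# Assumed action set, as (row change, column change).
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Return the next state index; off-grid moves leave the state unchanged."""
    row, col = divmod(state, N_COLS)
    d_row, d_col = ACTIONS[action]
    new_row, new_col = row + d_row, col + d_col
    if 0 <= new_row < N_ROWS and 0 <= new_col < N_COLS:
        return new_row * N_COLS + new_col
    return state
```

Because every action has exactly one outcome, the resulting values can be checked by hand, for example by counting steps to the nearest water cell.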

Stochastic Environment

In the stochastic environment, the agent moves as intended with a probability of 0.7. With the remaining probability of 0.3, a move is selected at random. The Values and Policy derived for this stochastic environment are shown below.
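The text leaves open how the random move is drawn. The sketch below takes one reading: with probability 0.3 a move is drawn uniformly from all four actions, including the intended one, so the intended action is executed with probability 0.7 + 0.3/4. The action names and the function `action_distribution` are hypothetical.

```python
ACTIONS = ["up", "down", "left", "right"]  # assumed action set
P_INTENDED = 0.7


def action_distribution(intended):
    """Probability that each action is actually executed.

    Assumes the random move is uniform over all actions, intended one included.
    """
    p_random = (1.0 - P_INTENDED) / len(ACTIONS)
    return {a: p_random + (P_INTENDED if a == intended else 0.0) for a in ACTIONS}


dist = action_distribution("up")
print(dist)
```

If the random move were instead drawn only from the three other actions, each of them would receive probability 0.1 and the intended action exactly 0.7.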
