GardnerChessAi with Double Deep Q-Learning

Table of Contents

  1. Game Rules
  2. Description
    1. How does the AI evaluate positions?
    2. Training Process
    3. Evaluation
    4. Example Game Against Minimax
  3. Personal Experience
    1. Motivation & Goal
    2. Problems
    3. Solutions
  4. Installation and Usage
    1. How to install
    2. Packages with Versions
    3. How to use

Game Rules

Same as in normal chess, except:

  • no castling
  • no en passant
  • pawn promotion to queen only
  • king needs to be captured to win
  • draw after 20 moves without a pawn move or a capture

Description

This is an implementation of the Double Deep Q-Learning algorithm with a prioritized experience replay memory to train an agent to play the minichess variant Gardner Chess.

How does the AI evaluate positions?

The board is fed into a neural network in a one-hot-encoded format. The network then predicts the value of the position, its Q-value, which represents the expected future reward of the agent.
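
For intuition, a one-hot encoding of a Gardner chess position could be a 5x5x12 tensor with one plane per colour and piece type. The following is only a rough sketch under that assumption; the board API (piece_at()) and the exact plane layout are placeholders, not the repository's actual code.

    import numpy as np

    PIECE_TYPES = ["P", "N", "B", "R", "Q", "K"]

    def encode_one_hot(board):
        # 5x5 board, 12 planes: one per (colour, piece type) combination
        planes = np.zeros((5, 5, 12), dtype=np.float32)
        for row in range(5):
            for col in range(5):
                piece = board.piece_at(row, col)   # placeholder: e.g. ("white", "Q") or None
                if piece is not None:
                    colour, piece_type = piece
                    plane = PIECE_TYPES.index(piece_type) + (6 if colour == "black" else 0)
                    planes[row, col, plane] = 1.0
        return planes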

Using these Q-values, the agent selects its moves, for example greedily or by sampling with a temperature.
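
Building on the encoding sketch above, such a move selection could look roughly like this. It assumes the network scores the position reached by a move from the agent's point of view; legal_moves() and apply() are again placeholder names, not the repository's actual API.

    import numpy as np

    def select_move(board, model, temperature=0.1):
        moves = board.legal_moves()
        # score the position reached by every legal move in one batched call
        encoded = np.stack([encode_one_hot(board.apply(m)) for m in moves])
        q_values = model(encoded, training=False).numpy().flatten()

        if temperature == 0:
            # temperature 0: always play the move with the highest Q-value
            return moves[int(np.argmax(q_values))]
        # otherwise sample from a softmax; a smaller temperature plays greedier
        prefs = q_values / temperature
        prefs -= prefs.max()                        # numerical stability
        probs = np.exp(prefs) / np.exp(prefs).sum()
        return moves[np.random.choice(len(moves), p=probs)]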

Training Process

Evaluation

Every few epochs, the agent plays against some preprogrammed opponents (random, minimax, etc.) to evaluate its performance. The results are automatically plotted in a matplotlib graph and saved as a pdf under saves/modelName/gardnerChessAi_training_graph.
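
In rough pseudo-code, such an evaluation step could look like this; win_rate_against() and the opponent objects are placeholders, and the exact file name of the pdf is only illustrative.

    import matplotlib.pyplot as plt

    def evaluate_and_plot(agent, opponents, history, model_name):
        # pit the agent against each preprogrammed opponent for a few games
        for name, opponent in opponents.items():
            history.setdefault(name, []).append(win_rate_against(agent, opponent))

        # redraw the training graph and save it as a pdf
        plt.figure()
        for name, win_rates in history.items():
            plt.plot(win_rates, label=name)
        plt.xlabel("evaluation")
        plt.ylabel("win rate (draw = 0.5)")
        plt.legend()
        plt.savefig(f"saves/{model_name}/gardnerChessAi_training_graph.pdf")
        plt.close()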

Here is the training graph of the pretrained model:

In the training evaluation, a temperature of 0.1 was used. The strength of the AI is of course higher if it always plays the best move (temperature 0). It can be increased further by running a minimax search on top of the neural network evaluation (e.g. 'ai+minimax2' for a search of depth 2).

These three versions of the best model were manually pitted against minimax with search depths 2, 3 and 4. The win percentages of the AI are as follows (a draw counts as half a win):

Opponent    AI with temperature 0.1    AI with temperature 0   AI+Minimax2
Minimax 2   52%                        64%                     87%
Minimax 3   37%                        46%                     59%
Minimax 4   12%                        16%                     41%

As one can see, when it always plays the best move (temperature 0) without further search, the AI is almost as good as minimax with a search depth of 3.

When a minimax search of depth 2 is used on top of the neural network evaluation, the strength of the AI increases to a level between minimax with search depths 3 and 4.
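
The 'ai+minimax' idea can be sketched as a plain negamax of fixed depth whose leaves are scored by the network. Everything here is an assumption for illustration: evaluate_position() and the board helpers are placeholders, and the sign convention (scores from the point of view of the side to move) may differ from the actual implementation.

    def negamax(board, model, depth):
        # leaves (and finished games) are scored by the neural network evaluation
        if depth == 0 or board.is_game_over():
            return evaluate_position(board, model)
        # otherwise: best reply value, negated back to the current player's view
        return max(-negamax(board.apply(m), model, depth - 1)
                   for m in board.legal_moves())

    def select_move_with_search(board, model, depth=2):   # depth=2 ~ 'ai+minimax2'
        moves = board.legal_moves()
        values = [-negamax(board.apply(m), model, depth - 1) for m in moves]
        return max(zip(values, moves), key=lambda vm: vm[0])[1]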

Example Game Against Minimax

Here is an example game of the pretrained model (white) playing against minimax with a search depth of 3 (black).


Personal Experience

Motivation & Goal

This project was a hobby during my last year of school in 2022/2023. My goal was to train an AI through self-play that beats my family members in chess. Looking back, it was a lot of fun and I learned a lot about Q-Learning, including ways to improve the vanilla Q-Learning algorithm and the choice of hyperparameters.

Now, about a year later, I have migrated to the newest tensorflow version, retrained a model with a gpu (before, I had only used a cpu), and made this project public.

Problems

  • instability
  • exploding Q-Values
  • slow training
  • my time-consuming progress monitoring addiction

Solutions

  • instability
    • a long enough exploration phase helped
    • checkpoints
    • experience replay buffer is a must-have; I also implemented a prioritized experience replay buffer but it slowed down the training (for me, it wasn't worth it)
    • keeping Q-Values small
  • exploding Q-Values
    • Double Deep Q-Learning instead of vanilla Q-Learning (see the sketch after this list)
    • continuously updating the target model by a very small percentage (soft updates) instead of copying the weights every few epochs
    • don't set the discount factor unnecessarily high
    • patience: if the Q-Values don't explode too much, they often stabilise at some point
  • slow training
    • exponentially decaying learning rate
    • gpu training instead of cpu-only training
    • time different parts of the training process and optimize the most time-consuming parts. For me, these were:
      • directly calling model() instead of model.predict() to get the Q-Values, which sped up training and inference enormously (in the get_q_values() method in neural_network.py)
      • minimizing model() calls by batching inputs in the fit_on_memory() method in training.py
    • with these optimizations, I was able to decrease the epoch time from 2 minutes to 7 seconds while simultaneously increasing the batch size from 32 to 128 and the fitting frequency from 8 to 16
  • my time-consuming progress monitoring addiction
    • partially solved by a fully automated training and evaluation process, which includes saving, remembering, and reloading training settings; making checkpoints; pitting the agent against different opponents; and updating the training graph
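
To make the points about exploding Q-Values and speed more concrete, here is a condensed sketch of a Double DQN target computation with a softly updated target model and batched, direct model() calls. It uses the textbook per-action formulation for brevity (this project evaluates positions instead, but the idea is the same: one network selects, the other evaluates); all names and values are illustrative, not taken from the repository's code.

    import numpy as np

    TAU = 0.001    # soft-update rate: the target model trails the main model slowly
    GAMMA = 0.95   # illustrative discount factor, kept moderate so Q-Values stay small

    def soft_update(main_model, target_model, tau=TAU):
        # continuously nudge the target weights towards the main weights
        # instead of copying them every few epochs
        for main_w, target_w in zip(main_model.weights, target_model.weights):
            target_w.assign(tau * main_w + (1.0 - tau) * target_w)

    def double_dqn_targets(main_model, target_model, rewards, next_states, done, gamma=GAMMA):
        # batched, direct model() calls instead of model.predict() for speed
        q_main = main_model(next_states, training=False).numpy()
        q_target = target_model(next_states, training=False).numpy()
        # Double DQN: the main model chooses the action, the target model scores it,
        # which keeps the targets from being systematically overestimated
        best_actions = np.argmax(q_main, axis=1)
        best_q = q_target[np.arange(len(best_actions)), best_actions]
        # done is 1.0 for terminal transitions, so there is no bootstrapping past the end of a game
        return rewards + gamma * (1.0 - done) * best_q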

Installation and Usage

How to install

  • clone the repository
  • install the dependencies; if you have conda and want to use a gpu (only possible on wsl2/linux), you can create a conda environment with all the dependencies from the gardnerChessAi.yml file with the terminal command conda env create -f gardnerChessAi.yml
  • if an error arises during the loading of the pretrained model, it can be resolved by manually downloading and replacing the saves\pretrained\gardnerChessAi_model_main_checkpoint\keras_metadata.pb file. This issue is due to a known Git bug and is beyond my control.

Packages with Versions

  • python=3.11.5
  • tensorflow=2.15.0
  • numpy=1.26.2
  • matplotlib=3.8.3
  • pygame=2.5.2

How to use

  • run training.py to train a model (you can train your own model or continue training the pretrained model)
  • training evaluation can be followed in matplotlib plots under saves/modelName/gardnerChessAi_training_graph
  • run play.py to play against a model or watch two models play against each other
  • run spectate.py to see how the agent's play style improved over the epochs
  • the scripts contain more detailed explanations and options to choose from