
Update on work in progress #72

Open
brianprichardson opened this issue Jul 31, 2018 · 3 comments


brianprichardson commented Jul 31, 2018

I am currently working on larger NNs (256x20).
Still using supervised PGN file input.
Have disabled "testeval".

Trying 32 policy- and value-head filters, following Leela and the post here:
https://medium.com/oracledevs/lessons-from-alpha-zero-part-6-hyperparameter-tuning-b1cfcbe4ca9a
Last "best NN" had 128x10 with 8 policy and 4 value head filters.

Also trying a learning-rate finder:
https://github.com/surmenok/keras_lr_finder
LR = 0.015 looks good.
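The idea behind the range test in keras_lr_finder can be sketched in a few lines: sweep the learning rate exponentially over a short run, record the loss at each step, and pick the LR where the loss is falling fastest. This is a minimal standalone sketch of that logic, not the keras_lr_finder API; the function names are mine.

```python
def lr_sweep(start_lr, end_lr, num_steps):
    """Exponentially spaced learning rates for an LR range test."""
    ratio = (end_lr / start_lr) ** (1.0 / (num_steps - 1))
    return [start_lr * ratio ** i for i in range(num_steps)]

def suggest_lr(lrs, losses):
    """Return the LR at the steepest drop in the recorded losses."""
    best_i, best_slope = 0, 0.0
    for i in range(1, len(losses)):
        slope = losses[i] - losses[i - 1]
        if slope < best_slope:
            best_i, best_slope = i, slope
    return lrs[best_i]

# Example: losses fall, bottom out, then blow up as the LR grows too large.
lrs = lr_sweep(1e-5, 1.0, 6)            # [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0]
losses = [2.0, 1.9, 1.5, 1.2, 1.4, 3.0]
```

In practice the losses would come from a short training run with a per-batch callback, and smoothing the loss curve first makes the suggestion more stable.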

Also trying the 1cycle LR policy:
https://medium.com/@nachiket.tanksale/finding-good-learning-rate-and-the-one-cycle-policy-7159fe1db5d6
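As a sketch of what a simple linear 1cycle schedule does (assuming the max LR of 0.015 from the finder; the parameter names are mine, not from any particular library): ramp the LR linearly up to the max over the first part of training, then back down.

```python
def one_cycle_lr(step, total_steps, max_lr=0.015, div_factor=10.0, warmup_frac=0.5):
    """Learning rate at `step` under a linear 1cycle schedule:
    warm up from max_lr/div_factor to max_lr, then anneal back down."""
    base_lr = max_lr / div_factor
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        frac = step / warmup_steps
        return base_lr + frac * (max_lr - base_lr)
    frac = (step - warmup_steps) / (total_steps - warmup_steps)
    return max_lr - frac * (max_lr - base_lr)
```

Smith's full recipe also cycles momentum in the opposite direction and adds a final low-LR tail; this sketch keeps only the LR triangle.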

Will also try 2 epochs per batch.

Things take time to run.

In the future I will try skipping the first n moves of games, since I would run the engine with an opening book anyway.

Likewise, I would like to try training with tablebases.
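Skipping the first n moves could look like the helper below (a hypothetical sketch; a real pipeline would still need to replay the skipped moves to rebuild board state, and this naive tokenizer ignores PGN comments and variations):

```python
import re

def training_plies(movetext, skip_plies=8):
    """Split PGN movetext into SAN plies and drop the first skip_plies
    half-moves, so training starts where an opening book would end."""
    # Strip move numbers ("1.", "2.", ...) and drop the result token.
    tokens = re.sub(r"\d+\.", " ", movetext).split()
    plies = [t for t in tokens if t not in ("1-0", "0-1", "1/2-1/2", "*")]
    return plies[skip_plies:]

game = "1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 4. Ba4 Nf6 5. O-O Be7 1-0"
```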

@brianprichardson

32 policy and value head filters look fine.
LR finder and 1cycle also looking good.

Now trying 2 epochs per batch.
Takes about a week to tune and test for improvement.
I use cutechess-cli to test, since the "eval" command tends to produce repeated games.
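For reading the match results, the Elo difference implied by a score follows from the standard logistic rating model (cutechess-cli prints its own Elo estimate; the function below is just a sketch of the underlying arithmetic, not its code):

```python
import math

def elo_diff(wins, draws, losses):
    """Elo difference implied by a match score under the logistic model
    score = 1 / (1 + 10 ** (-diff / 400))."""
    score = (wins + 0.5 * draws) / (wins + draws + losses)
    if not 0.0 < score < 1.0:
        raise ValueError("Elo difference is unbounded for a 0% or 100% score")
    return -400.0 * math.log10(1.0 / score - 1.0)
```

A 64% score, for example, works out to roughly +100 Elo, which gives a feel for how large a "100 Elo worse" result is over a week of games.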

@brianprichardson

Larger 256x20 nets take considerably longer, so I am still crunching through training.
Also, the cutechess run to measure any improvement takes much longer to play the games.

@brianprichardson

Wrong, wrong, wrong.

After the better part of 3 weeks of training plus a week of testing, the new 256x20 net is clearly at least 100 Elo worse than the prior best net (at fixed playouts, never mind equal time). Moreover, I'm not even sure which net is the prior best (256x7 or 128x10?). Maybe the 32 policy and value head filters are not a good thing. Maybe testeval should be left on. I still think I like the LR finder and 1cycle. Policy and value weights: who knows. Time to stop "ready, fire, aim".

So, taking a deep breath and a big step back to start again.
Instead of trying to go back to the point of the current best net (which I'd have to find first and could probably never duplicate anyway), I'm considering starting small and simple: learning only 3-piece endgames, then 4, 5, and 6. These nets could be small and fast, and hopefully won't waste a whole month. Thanks to @dkappe for the idea in the Leela Chess Discord.
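Selecting the 3-piece (and later 4-, 5-, 6-piece) training positions can be sketched by counting pieces in a FEN; the helper names here are hypothetical, not from any existing tool.

```python
def piece_count(fen):
    """Number of pieces (kings included) on the board part of a FEN."""
    return sum(c.isalpha() for c in fen.split()[0])

def endgame_only(fens, max_pieces):
    """Keep positions with at most max_pieces pieces, e.g. 3 for KPK/KQK."""
    return [f for f in fens if piece_count(f) <= max_pieces]
```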

This is what can happen when watching the Lc0 project.
