Hi,
A method that has been tried out goes something like this:
Step 1: Collect positions:
- Let the computer play many games against itself (self-play).
- While playing, at each move, check whether the move selected at 2-ply is the same as the move selected at 0-ply.
- If move_0ply != move_2ply -> store both resulting positions, the one from the 0-ply move and the one from the 2-ply move, in some datastore (typically just a file).
- Continue self-play until you think you have collected enough positions. (What that criterion should be... boredom, maybe?)
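To make step 1 a bit more concrete, here is a rough sketch of the collection loop for a single self-play game. Everything passed in (roll_dice, successors, pick_0ply, pick_2ply, is_over) is a hypothetical placeholder for whatever the engine provides, not an existing GNU Backgammon API. I also assume that successors always returns at least one resulting position (a forced pass counts as a "move") and that the game continues with the 2-ply choice; the description above does not say which move is actually played.

```python
def collect_mismatches(start, roll_dice, successors, pick_0ply, pick_2ply,
                       is_over, max_moves=10_000):
    """Play one self-play game and return the positions where 0-ply and 2-ply disagree.

    All callables are hypothetical stand-ins for engine functions:
      roll_dice()            -> a dice roll
      successors(pos, dice)  -> list of positions reachable with that roll
      pick_0ply(candidates)  -> position chosen by the 0-ply evaluation
      pick_2ply(candidates)  -> position chosen by the 2-ply evaluation
      is_over(pos)           -> True when the game has ended
    """
    collected = []
    position = start
    for _ in range(max_moves):
        if is_over(position):
            break
        candidates = successors(position, roll_dice())
        after_0ply = pick_0ply(candidates)
        after_2ply = pick_2ply(candidates)
        if after_0ply != after_2ply:
            # Store both resulting positions, as described in step 1.
            collected.append(after_0ply)
            collected.append(after_2ply)
        # Assumption: continue the game with the deeper (2-ply) choice.
        position = after_2ply
    return collected
```

Calling this in a loop over many games and appending the returned positions to a file would give the datastore mentioned above.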
Step 2: Rollout.
- All positions collected in step 1 are then rolled out, so that we get the best evaluation we can for each of them.
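As a rough illustration of what a rollout in step 2 boils down to: play the same position out many times with different dice and average the results. In the sketch below, play_out is a hypothetical function that plays one game to the end from the given position and returns the outcome as a number; the trial count and the seed are arbitrary placeholders. Real rollouts (truncation, variance reduction and so on) are more involved than this.

```python
import random
import statistics

def rollout(position, play_out, trials=1296, seed=0):
    """Estimate the value of `position` by averaging many played-out games.

    play_out(position, rng) is a hypothetical engine hook that plays one
    game to completion and returns its outcome as a float.
    Returns (mean outcome, standard error of the mean).
    """
    rng = random.Random(seed)
    results = [play_out(position, rng) for _ in range(trials)]
    mean = statistics.fmean(results)
    if trials > 1:
        stderr = statistics.stdev(results) / trials ** 0.5
    else:
        stderr = float("inf")  # one trial gives no error estimate
    return mean, stderr
```

The standard error gives a hint about whether a position needs more trials before it is used as a training target.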
Step 3: Supervised training.
- All positions from the rollout data above are then used for supervised training.
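To show the shape of step 3 without depending on any particular package, here is a deliberately tiny stand-in: a single sigmoid unit trained with plain stochastic gradient descent on (features, rollout target) pairs. This is NOT the GNU Backgammon network and not how one would do it in Keras or PyTorch; the feature encoding, learning rate and epoch count are all assumptions chosen only to keep the example short.

```python
import math
import random

def predict(model, features):
    """Output of a single sigmoid unit, a toy stand-in for a real network."""
    weights, bias = model
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, epochs=200, lr=0.5, seed=0):
    """Fit the toy model to (features, target) pairs, targets in [0, 1].

    The targets play the role of rollout results from step 2.
    """
    rng = random.Random(seed)
    n_inputs = len(samples[0][0])
    weights = [rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
    bias = 0.0
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)
        for features, target in data:
            out = predict((weights, bias), features)
            # Gradient of 0.5 * (out - target)^2 with respect to the pre-activation.
            grad = (out - target) * out * (1.0 - out)
            weights = [w - lr * grad * x for w, x in zip(weights, features)]
            bias -= lr * grad
    return weights, bias
```

With a real package the loop is the same idea: inputs are encoded positions, targets are the rollout results, and the loss measures how far the net's evaluation is from the rollout.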
The newly trained neural network you now have is hopefully better than the one you had before you started this process. (However, you MUST verify that in some way, and it is best to have a verification method ready before you even start the training. Failing that, you can check whether the network has improved by having the new and the old network play against each other.)
And if you still think your neural network can be improved further, just start again from Step 1.
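For the "play the new net against the old one" check, a minimal sketch could look like the following. Here play_game is a hypothetical function that plays one game between the two networks and returns the points won by the new net (negative if it loses). The rule "mean is more than two standard errors above zero" is just one simple assumption for deciding that the difference is not noise.

```python
import math
import random
import statistics

def compare_networks(play_game, games=1000, seed=0):
    """Play `games` games (at least 2) of new net vs. old net and summarize.

    play_game(rng) is a hypothetical hook returning the points won by the
    new network in one game (negative when it loses).
    Returns (mean points per game, standard error, looks_better).
    """
    rng = random.Random(seed)
    results = [play_game(rng) for _ in range(games)]
    mean = statistics.fmean(results)
    stderr = statistics.stdev(results) / math.sqrt(games)
    # Assumption: call it an improvement only if the mean is clearly above zero.
    looks_better = mean > 2 * stderr
    return mean, stderr, looks_better
```

If looks_better comes out False, that does not prove the new net is worse; it may just mean more games are needed.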
OK. Some discussion:
The time-consuming steps here are actually steps 1 and 2. Step 3, supervised training, is pretty fast with modern methods and hardware. Packages like Keras and PyTorch (or Chainer, Caffe, CNTK, TensorFlow or whatever) that can utilize GPUs and TPUs can train neural networks in minutes instead of weeks. I already have tools to convert Keras and PyTorch neural nets to GNU Backgammon neural nets (and the other way around). So that is good news.
More good news: the first two steps are highly distributable. Say we make a simple toolchain and start up 10-20 computers (or maybe Ian has a lot of spare computers ;-). I guess modern self-play can find 2-3 0-ply/2-ply mismatches per second (I'm just guessing) when collecting positions as described in step 1. We (or anyone volunteering) can each start collection processes on the equipment we have. If the same volunteers can then roll out the positions with another tool in the same toolchain (step 2), I think we can get something going.
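Just to put the guess above into numbers, here is a back-of-the-envelope calculation. It assumes the 2-3 mismatches per second is per machine (the guess above does not say), that each mismatch yields two stored positions, and that the machines run around the clock. All of these are assumptions.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def positions_per_day(machines, mismatches_per_second, positions_per_mismatch=2):
    """Rough number of positions collected per day across all machines.

    Assumes the mismatch rate is per machine and that the machines run
    continuously; both are guesses, not measurements.
    """
    return int(machines * mismatches_per_second
               * SECONDS_PER_DAY * positions_per_mismatch)

# Example: 20 machines at 2.5 mismatches per second each.
print(positions_per_day(20, 2.5))
```

If these guesses are anywhere near right, collection fills up quickly, and the rollouts in step 2 are where the volunteered CPU time would mostly go.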
So, please join me in this discussion: can we organize such a collective effort? I can share some tools. Joseph, do you have some input? How many positions do you think we need? Will anyone join?
Thanks,
-Øystein