bug-gnubg

Re: The status of gnubg?


From: Joseph Heled
Subject: Re: The status of gnubg?
Date: Tue, 20 Oct 2020 18:21:48 +1300



On Tue, 20 Oct 2020 at 18:02, Isaac Keslassy <isaac@ee.technion.ac.il> wrote:
Hi,

It would be great to renew the effort on gnubg!

I have a question regarding the fundamental NN weight improvement
technique. If I understand correctly, to improve the NN weights you take
a supervised-learning approach: pick tough positions, determine the best
move using rollouts, and then gradually optimize the weights. However, as
Joseph mentioned, this may affect the NN's play in positions arising in
regular games.

There are, however, other techniques that have proved more effective in
games like chess. They avoid the long rollouts and work on positions from
regular games. For instance:

1. SPSA: This is an obvious approach. Let the NN play against a very
slightly modified version of itself, pick the winner, and, using a random
walk, gradually converge to better parameters; or:

This will require a lot of cycles: determining which of two closely related nets is better requires a large number of games. If you go that way, a good set of reference positions (obtained, as mentioned, from rollouts) would probably work better. Like all such approaches, this will need to be iterated (i.e. when you get a better player, you re-roll the reference positions and repeat).
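
For concreteness, one SPSA iteration of that scheme might look roughly like the Python sketch below. Here play_match, the gain constants and the game count are illustrative assumptions only, not anything from gnubg:

import numpy as np

# Minimal SPSA sketch.  play_match(w_a, w_b, n_games) is a hypothetical
# helper that plays the two weight vectors against each other and returns
# the fraction of games won by the first one.
def spsa_step(weights, k, play_match, n_games=1000, a=0.01, c=0.05):
    ak = a / (k + 1) ** 0.602          # standard SPSA gain schedules
    ck = c / (k + 1) ** 0.101
    delta = np.random.choice([-1.0, 1.0], size=weights.shape)
    # play the "+" perturbation against the "-" perturbation; as noted above,
    # n_games must be large to tell two closely related nets apart
    score = play_match(weights + ck * delta, weights - ck * delta, n_games)
    # estimate the gradient of the win rate from that single comparison
    g_hat = (score - 0.5) / (ck * delta)
    return weights + ak * g_hat        # ascend the estimated gradient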
  

2. Logistic regression: Instead of teaching the best move, teach the
position equity (as also mentioned by Aaron).

We are training the net to compute the equity. The discussion was an attempt to explain how positions are added to the training data.
I recently trained nets for two other games with a similar method, and this approach (of incrementally adding mis-played positions) was again the best way of progressively getting a better player. I also had to start fresh a couple of times, each time with a slightly stronger base player. The big difference from gnubg was that I trained on 1-ply, not 2-ply. This seemed to eliminate some of the ply effect we see in gnubg, and possibly in nets for other games.
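
In outline, that incremental scheme is something like the following rough sketch. The three callables are hypothetical stand-ins for the real gnubg machinery, and the counts are arbitrary:

def improve_net(net, self_play_game, rollout_equity, train,
                n_rounds=5, games_per_round=10_000):
    """Sketch only: play at 1-ply, roll out the positions the net mis-plays,
    add them to the training set, refit, and repeat."""
    training_data = []
    for _ in range(n_rounds):
        for _ in range(games_per_round):
            for position, chosen_move in self_play_game(net, ply=1):
                best_move, rolled_out_equity = rollout_equity(position)
                if chosen_move != best_move:
                    # the net mis-played here: keep the position together with
                    # its rolled-out equity as a supervised training target
                    training_data.append((position, rolled_out_equity))
        net = train(net, training_data)
        # when the player gets clearly stronger, the reference rollouts should
        # be redone with the new net (or training restarted from the stronger base)
    return net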

Specifically, we could try
to minimize the equity error associated with each position. Assume DMP for
simplicity. Run a million games through self-play, and associate every
position encountered with the final game result (-1 for a loss, +1 for a win).
Then tune all the NN weights through gradient descent to minimize the
difference between the position estimate and the final game result.

(see https://www.chessprogramming.org/Automated_Tuning, Texel's tuning,
SPSA etc. for more details)
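
In other words, with f_w(x) the net's equity estimate for position x and z in {-1, +1} the game result, minimize the mean of (f_w(x) - z)^2 over the self-play positions. A minimal sketch of that fit, assuming the positions have already been encoded as feature vectors; the architecture, optimizer and hyperparameters below are illustrative, not gnubg's actual net:

import torch
import torch.nn as nn

def fit_equity_net(features, results, hidden=128, epochs=20, lr=1e-3):
    """features: (N, D) float tensor of encoded positions;
    results: (N,) tensor of final DMP outcomes, -1.0 or +1.0."""
    net = nn.Sequential(
        nn.Linear(features.shape[1], hidden),
        nn.Sigmoid(),
        nn.Linear(hidden, 1),
        nn.Tanh(),                      # equity estimate in (-1, 1)
    )
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        pred = net(features).squeeze(1)
        loss = ((pred - results) ** 2).mean()   # squared equity error
        loss.backward()
        opt.step()
    return net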

Has anybody tried such alternative methods?

Thanks,
Isaac

