[gnugo-devel] MonteGNU

gnugo-devel
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[gnugo-devel] MonteGNU

From:	Gunnar Farneb�ck
Subject:	[gnugo-devel] MonteGNU
Date:	Mon, 10 Apr 2006 00:39:26 +0200
User-agent:	EMH/1.14.1 SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.3 Emacs/21.3 (sparc-sun-solaris2.9) MULE/5.0 (SAKAKI)
Today's KGS tournament was 9x9 with very long thinking time, 28
minutes per player. As it's difficult to get GNU Go to spend all that
time in a meaningful and robust way I decided to make an experiment.

Attached is a patch to GNU Go 3.7.9, used in the tournament, which
adds a Monte-Carlo evaluation on top of the ordinary evaluation. More
precisely it first makes an ordinary move generation, then takes the
top moves, defined as valued at least one third of the best move, and
reevaluates those by an alpha-beta Monte-Carlo search. (Exception,
fuseki database moves are played immediately to save time.)

For exact details, see the patch. It should be mentioned, however,
that this is a quick (and somewhat dirty) hack and most likely very
crude, if not outright naive. Also be warned that it's only intended
for 9x9.

I'm unsure whether the patch actually makes the engine stronger but it
does change quite a few moves. I don't have a nice list of examples
but the comments in
http://kgs.kiseido.com/games/2006/4/9/CrazyStone-GNU.sgf give some
indications. I'm fairly sure Monte-Carlo techniques could be a
valuable addition to GNU Go, if employed carefully.

So how did the tournament go? GNU came in second after Viking5, having
lost twice against it. There were some exciting games, including the
CrazyStone game which looked rather easy to win until the level fell
down to 0 and the number of Monte-Carlo games happened to be reduced
to 0 (oops). As a result it blindly played the top-left-most of the
considered moves, which weren't the wisest choices (D8, E8, E9).

Running the 9x9.tst suite is interesting. It takes more time than the
full regressions usually do but the result is significant, 25 PASS vs
3 FAIL:

9x9:10          FAIL D5 [E5|E6|E3]
9x9:20          FAIL D5 [!D5]
9x9:80          PASS F8 [F8]
9x9:100         PASS F5 [F5]
9x9:120         PASS F1 [F1]
9x9:140         PASS D7 [D7|C7|E4]
9x9:160         PASS E7 [E7]
9x9:170         PASS D3 [D3|G7]
9x9:190         PASS F5 [F5]
9x9:210         PASS F8 [F8|E9]
9x9:265         PASS G6 [G6]
9x9:270         PASS F7 [B5|F7|F8|F3]
9x9:290         PASS F5 [F5]
9x9:320         PASS E5 [E5|F4|D4|E4]
9x9:330         PASS F3 [D3|F3]
9x9:400         PASS A5 [A5]
9x9:420         PASS D5 [D5]
9x9:440         PASS B7 [C6|B7]
9x9:450         PASS B4 [B5|B4]
9x9:460         PASS B9 [B9]
9x9:470         FAIL F6 [H3|H4|G4|D3]
9x9:480         PASS E2 [E2]
9x9:510         PASS C4 [C4]
9x9:560         PASS B4 [B4]
9x9:570         PASS E4 [E4]
9x9:580         PASS E6 [E6]
9x9:590         PASS E5 [E5]
9x9:600         PASS F3 [F4|F3|G3]
9x9                                   6948.19  37662962  155340
148691
Total nodes: 37662962 155340 148691
Total time: 6948.19 (6947.57)
Total uncertainty: 1.00
25 PASS
3 FAIL

/Gunnar

Index: engine/genmove.c
===================================================================
--- engine/genmove.c    (revision 2340)
+++ engine/genmove.c    (working copy)
@@ -30,6 +30,322 @@
 #include "sgftree.h"
 #include "gg_utils.h"
 
+#include "random.h"
+#include <math.h>
+#define number_of_simulated_games (250 * get_level())
+
+/* Generate a random move. */
+static int
+generate_random_move(int color)
+{
+  int moves[MAX_BOARD * MAX_BOARD];
+  int num_moves = 0;
+  int pos;
+  int k;
+  int last_move;
+  int last_color;
+  int offset;
+  int offset2;
+
+  /* If we get this deep we are almost certainly playing triple ko
+   * without alternative options, so just give up and score as is.
+   */
+  if (stackp > 200)
+    return PASS_MOVE;
+
+  /* Play reactive moves, aiming to reduce the Monte-Carlo tendency to fear 
cuts. */
+  gg_assert(stackp >= 1);
+  get_move_from_stack(stackp - 1, &last_move, &last_color);
+  if (last_move != PASS_MOVE) {
+    /* If the opponent played self-atari, with high probability capture. */
+    if (countlib(last_move) == 1 && gg_drand() < 0.67 + 0.1 * 
countstones(last_move)) {
+      int move;
+      findlib(last_move, 1, &move);
+      return move;
+    }
+    
+    /* If the opponent played an atari, try to run away. */
+    offset = gg_urand() % 4;
+    for (k = 0; k < 4; k++) {
+      int neighbor = last_move + delta[(k + offset) % 4];
+      if (board[neighbor] == color
+         && countlib(neighbor) == 1
+         && gg_drand() < (0.3 + 0.25 * (countstones(neighbor) - 1))) {
+       int move;
+       findlib(neighbor, 1, &move);
+       if (!is_self_atari(move, color))
+         return move;
+      }
+    }
+    
+    /* If the opponent might cut, connect or at least try to connect. */
+    for (k = 0; k < 4; k++) {
+      int down = delta[(k + offset) % 4];
+      int right = delta[(k + 1 + offset) % 4];
+      if (board[last_move + down] == EMPTY) {
+       int a = 0;
+       int b = 0;
+       if ((board[last_move + down + right] == color
+            || !ON_BOARD(last_move + down + right)))
+         a = 1;
+       else if (board[last_move + right] == color
+                || !ON_BOARD(last_move + right))
+         a = 2;
+       if (board[last_move + down - right] == color
+           || !ON_BOARD(last_move + down - right))
+         b = 1;
+       else if (board[last_move - right] == color
+                || !ON_BOARD(last_move - right))
+         b = 2;
+
+       if (a > 0 && b > 0 && a + b < 4
+           && gg_drand() < 0.6
+           && !is_self_atari(last_move + down, color))
+         return last_move + down;
+      }
+    }
+  }
+
+  /* Capturing stones is always fun. */
+  offset2 = gg_urand() % (BOARDMAX - BOARDMIN);
+  for (k = 0; k < BOARDMAX - BOARDMIN; k++) {
+    pos = BOARDMIN + (k + offset2) % (BOARDMAX - BOARDMIN);
+    if (board[pos] == OTHER_COLOR(color)
+       && find_origin(pos) == pos
+       && countlib(pos) == 1
+       && gg_drand() < 0.05 + 0.1 * (countstones(pos) - 1)) {
+      int move;
+      findlib(pos, 1, &move);
+      return move;
+    }
+  }
+
+  /* Brown move generation follows below. */
+  for (pos = BOARDMIN; pos < BOARDMAX; pos++)
+    if (board[pos] == EMPTY)
+      moves[num_moves++] = pos;
+
+  while (num_moves > 0) {
+    int index = gg_rand() % num_moves;
+    pos = moves[index];
+    /* Consider moving at pos if it is legal and not suicide. */
+    if (is_legal(pos, color)
+       && !is_suicide(pos, color)) {
+      /* Further require the move not to be suicide for the opponent... */
+      if (!is_suicide(pos, OTHER_COLOR(color)))
+       return pos;
+      else {
+       /* ...however, if the move captures at least one stone,
+        * consider it anyway.
+        */
+       for (k = 0; k < 4; k++) {
+         int pos2 = pos + delta[k];
+         if (board[pos2] == OTHER_COLOR(color)) {
+           return pos;
+         }
+       }
+      }
+    }
+    moves[index] = moves[num_moves - 1];
+    num_moves--;
+  }
+
+  return PASS_MOVE;
+}
+
+static int play_random_game(void)
+{
+  int score = 0;
+  int move;
+  int pos;
+  int k;
+  int pass = 0;
+  int stackp_when_called = stackp;
+  int next_player = OTHER_COLOR(get_last_player());
+  if (next_player != WHITE && next_player != BLACK)
+    next_player = BLACK;
+
+  /* First finish the game, if it isn't already. */
+  while (pass < 3) {
+    move = generate_random_move(next_player);
+    if (is_pass(move))
+      pass++;
+    else
+      pass = 0;
+    trymove(move, next_player, NULL, NO_MOVE);
+    next_player = OTHER_COLOR(next_player);
+  }
+  
+  for (pos = BOARDMIN; pos < BOARDMAX; pos++)
+    if (ON_BOARD(pos)) {
+      int pos2 = pos;
+      if (board[pos] == EMPTY)
+       for (k = 0; k < 4; k++) {
+         pos2 = pos + delta[k];
+         if (IS_STONE(board[pos2]))
+           break;
+       }
+
+      score += 2 * ((countlib(pos2) > 1) ^ (board[pos2] == BLACK)) - 1;
+    }
+
+#if 0
+  simple_showboard(stderr);
+  gprintf("\nScore: %d\n\n", score);
+#endif
+  
+  while (stackp > stackp_when_called)
+    popgo();
+
+  return score;
+}
+
+static int current_score(int color)
+{
+  int score = 0;
+  int score2 = 0;
+  int k;
+  double mean = 0.0;
+  double std = 0.0;
+  
+  for (k = 0; k < number_of_simulated_games; k++) {
+    int this_score = play_random_game();
+    score += this_score;
+    score2 += this_score * this_score;
+  }
+  
+  if (color == BLACK)
+    score = -score;
+
+  mean = score / (double) number_of_simulated_games;
+  std = sqrt((score2 - score * mean) / (number_of_simulated_games - 1));
+
+  if (color == WHITE)
+    mean += komi;
+  else
+    mean -= komi;
+  
+  if (0) {
+    fprintf(stderr, "%d %7d %7.2f %7.2f %6d ", stackp, score, mean, std,
+           (int) (1000 * mean / (0.01 + std)));
+    dump_stack();
+  }
+
+  return (int) (1000 * mean / (0.01 + std));
+}
+
+static int
+alphabeta(int color, int *move, int alpha, int beta,
+         int *forbidden_move, int *allowed_moves, int depth_left)
+{
+  int result;
+  int pos;
+  if (move)
+    *move = PASS_MOVE;
+
+  if (depth_left == 0)
+    return current_score(color);
+  
+  if (0 && current_score(color) >= beta) {
+    if (0) {
+      dump_stack();
+      gprintf("Early beta cut: %d\n", beta);
+    }
+    return beta;
+  }
+
+  for (pos = BOARDMIN; pos < BOARDMAX; pos++)
+    /* Consider moving at pos if it is legal, not suicide, and not
+     * unconditional territory for either player.
+     */
+    if (board[pos] == EMPTY
+       && !forbidden_move[pos]
+       && (!allowed_moves || allowed_moves[pos])
+       && trymove(pos, color, NULL, NO_MOVE)) {
+      result = -alphabeta(OTHER_COLOR(color), NULL, -beta, -alpha,
+                         forbidden_move, NULL, depth_left - 1);
+      popgo();
+    
+      if (result >= beta) {
+       if (move)
+         *move = pos;
+       if (depth_left == 1) {
+         dump_stack();
+         gprintf("Beta cut: %d\n", beta);
+       }
+       return beta;
+      }
+    
+      if (result > alpha) {
+       if (move)
+         *move = pos;
+       alpha = result;
+      }
+    }
+
+  if (depth_left == 1) {
+    dump_stack();
+    gprintf("Full return: %d\n", alpha);
+  }
+  
+  return alpha;
+}
+
+static int
+monte_carlo_genmove(int color, int allowed_moves[BOARDMAX],
+                   float *value, int *resign)
+{
+  int pos;
+  int best_score = -board_size * board_size * number_of_simulated_games;
+  int best_move = PASS_MOVE;
+  int unconditional_territory_black[BOARDMAX];
+  int unconditional_territory_white[BOARDMAX];
+  int forbidden_move[BOARDMAX];
+  int black_secure = 0;
+  int white_secure = 0;
+  int uncertain = 0;
+
+  if (0) {
+    simple_showboard(stderr);
+    gprintf("\n");
+  }
+
+  if (resign)
+    *resign = 0;
+  
+  unconditional_life(unconditional_territory_black, BLACK);
+  unconditional_life(unconditional_territory_white, WHITE);
+
+  for (pos = BOARDMIN; pos < BOARDMAX; pos++)
+    if (!ON_BOARD(pos))
+      continue;
+    else if (unconditional_territory_black[pos]) {
+      black_secure++;
+      forbidden_move[pos] = 1;
+    }
+    else if (unconditional_territory_white[pos]) {
+      white_secure++;
+      forbidden_move[pos] = 1;
+    }
+    else {
+      uncertain++;
+      forbidden_move[pos] = 0;
+    }
+  
+  best_score = alphabeta(color, &best_move, -1000 * number_of_simulated_games, 
1000 * number_of_simulated_games,
+                        forbidden_move, allowed_moves, 2);
+  
+#if 0
+  gprintf("Best move %1m: %d (%d), (%d-%d-%d)\n", best_move, best_score,
+         best_score / number_of_simulated_games,
+         white_secure, black_secure, uncertain);
+#endif
+  *value = best_score / (float) number_of_simulated_games;
+  
+  return best_move;
+}
+
+
 /* Return one if x doesn't equal position_number and 0 otherwise.
  * After using this macro x will always have the value
  * position_number.
@@ -469,6 +785,18 @@
                NO_MOVE, 1.0);
   }
 
+  if (move != PASS_MOVE && (*value < 75.0 || *value > 75.01) && 
!doing_scoring) {
+    int allowed_moves2[BOARDMAX];
+    int pos;
+    for (pos = BOARDMIN; pos < BOARDMAX; pos++)
+      if (ON_BOARD(pos) && (!allowed_moves || allowed_moves[pos]) && 
potential_moves[pos] > *value / 3.0)
+       allowed_moves2[pos] = 1;
+      else
+       allowed_moves2[pos] = 0;
+    
+    move = monte_carlo_genmove(color, allowed_moves2, value, resign);
+  }
+  
   /* If still no move, fill a remaining liberty. This should pick up
    * all missing dame points.
    */
[Prev in Thread]
Current Thread
[Next in Thread]
[gnugo-devel] MonteGNU, Gunnar Farneb�ck <=
- Re: [gnugo-devel] MonteGNU, alain Baeckeroot, 2006/04/10
- Re: [gnugo-devel] MonteGNU, Pierce T. Wetter III, 2006/04/21
- Re: [gnugo-devel] MonteGNU, alain Baeckeroot, 2006/04/25
- Re: [gnugo-devel] MonteGNU. Stat questions, alain Baeckeroot, 2006/04/28
  - Re: [gnugo-devel] MonteGNU. Stat questions, Gunnar Farneb�ck, 2006/04/28
Prev by Date: [gnugo-devel] Two trivial configure/make bugs
Next by Date: Re: [gnugo-devel] suicide move
Previous by thread: [gnugo-devel] Two trivial configure/make bugs
Next by thread: Re: [gnugo-devel] MonteGNU
Index(es):
- Date
- Thread