[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[gnugo-devel] MonteGNU
From: |
Gunnar Farnebäck |
Subject: |
[gnugo-devel] MonteGNU |
Date: |
Mon, 10 Apr 2006 00:39:26 +0200 |
User-agent: |
EMH/1.14.1 SEMI/1.14.3 (Ushinoya) FLIM/1.14.2 (Yagi-Nishiguchi) APEL/10.3 Emacs/21.3 (sparc-sun-solaris2.9) MULE/5.0 (SAKAKI) |
Today's KGS tournament was 9x9 with very long thinking time, 28
minutes per player. As it's difficult to get GNU Go to spend all that
time in a meaningful and robust way I decided to make an experiment.
Attached is a patch to GNU Go 3.7.9, used in the tournament, which
adds a Monte-Carlo evaluation on top of the ordinary evaluation. More
precisely it first makes an ordinary move generation, then takes the
top moves, defined as valued at least one third of the best move, and
reevaluates those by an alpha-beta Monte-Carlo search. (Exception,
fuseki database moves are played immediately to save time.)
For exact details, see the patch. It should be mentioned, however,
that this is a quick (and somewhat dirty) hack and most likely very
crude, if not outright naive. Also be warned that it's only intended
for 9x9.
I'm unsure whether the patch actually makes the engine stronger but it
does change quite a few moves. I don't have a nice list of examples
but the comments in
http://kgs.kiseido.com/games/2006/4/9/CrazyStone-GNU.sgf give some
indications. I'm fairly sure Monte-Carlo techniques could be a
valuable addition to GNU Go, if employed carefully.
So how did the tournament go? GNU came in second after Viking5, having
lost twice against it. There were some exciting games, including the
CrazyStone game which looked rather easy to win until the level fell
down to 0 and the number of Monte-Carlo games happened to be reduced
to 0 (oops). As a result it blindly played the top-left-most of the
considered moves, which weren't the wisest choices (D8, E8, E9).
Running the 9x9.tst suite is interesting. It takes more time than the
full regressions usually do but the result is significant, 25 PASS vs
3 FAIL:
9x9:10 FAIL D5 [E5|E6|E3]
9x9:20 FAIL D5 [!D5]
9x9:80 PASS F8 [F8]
9x9:100 PASS F5 [F5]
9x9:120 PASS F1 [F1]
9x9:140 PASS D7 [D7|C7|E4]
9x9:160 PASS E7 [E7]
9x9:170 PASS D3 [D3|G7]
9x9:190 PASS F5 [F5]
9x9:210 PASS F8 [F8|E9]
9x9:265 PASS G6 [G6]
9x9:270 PASS F7 [B5|F7|F8|F3]
9x9:290 PASS F5 [F5]
9x9:320 PASS E5 [E5|F4|D4|E4]
9x9:330 PASS F3 [D3|F3]
9x9:400 PASS A5 [A5]
9x9:420 PASS D5 [D5]
9x9:440 PASS B7 [C6|B7]
9x9:450 PASS B4 [B5|B4]
9x9:460 PASS B9 [B9]
9x9:470 FAIL F6 [H3|H4|G4|D3]
9x9:480 PASS E2 [E2]
9x9:510 PASS C4 [C4]
9x9:560 PASS B4 [B4]
9x9:570 PASS E4 [E4]
9x9:580 PASS E6 [E6]
9x9:590 PASS E5 [E5]
9x9:600 PASS F3 [F4|F3|G3]
9x9 6948.19 37662962 155340
148691
Total nodes: 37662962 155340 148691
Total time: 6948.19 (6947.57)
Total uncertainty: 1.00
25 PASS
3 FAIL
/Gunnar
Index: engine/genmove.c
===================================================================
--- engine/genmove.c (revision 2340)
+++ engine/genmove.c (working copy)
@@ -30,6 +30,322 @@
#include "sgftree.h"
#include "gg_utils.h"
+#include "random.h"
+#include <math.h>
+#define number_of_simulated_games (250 * get_level())
+
+/* Generate a random move. */
+static int
+generate_random_move(int color)
+{
+ int moves[MAX_BOARD * MAX_BOARD];
+ int num_moves = 0;
+ int pos;
+ int k;
+ int last_move;
+ int last_color;
+ int offset;
+ int offset2;
+
+ /* If we get this deep we are almost certainly playing triple ko
+ * without alternative options, so just give up and score as is.
+ */
+ if (stackp > 200)
+ return PASS_MOVE;
+
+ /* Play reactive moves, aiming to reduce the Monte-Carlo tendency to fear
cuts. */
+ gg_assert(stackp >= 1);
+ get_move_from_stack(stackp - 1, &last_move, &last_color);
+ if (last_move != PASS_MOVE) {
+ /* If the opponent played self-atari, with high probability capture. */
+ if (countlib(last_move) == 1 && gg_drand() < 0.67 + 0.1 *
countstones(last_move)) {
+ int move;
+ findlib(last_move, 1, &move);
+ return move;
+ }
+
+ /* If the opponent played an atari, try to run away. */
+ offset = gg_urand() % 4;
+ for (k = 0; k < 4; k++) {
+ int neighbor = last_move + delta[(k + offset) % 4];
+ if (board[neighbor] == color
+ && countlib(neighbor) == 1
+ && gg_drand() < (0.3 + 0.25 * (countstones(neighbor) - 1))) {
+ int move;
+ findlib(neighbor, 1, &move);
+ if (!is_self_atari(move, color))
+ return move;
+ }
+ }
+
+ /* If the opponent might cut, connect or at least try to connect. */
+ for (k = 0; k < 4; k++) {
+ int down = delta[(k + offset) % 4];
+ int right = delta[(k + 1 + offset) % 4];
+ if (board[last_move + down] == EMPTY) {
+ int a = 0;
+ int b = 0;
+ if ((board[last_move + down + right] == color
+ || !ON_BOARD(last_move + down + right)))
+ a = 1;
+ else if (board[last_move + right] == color
+ || !ON_BOARD(last_move + right))
+ a = 2;
+ if (board[last_move + down - right] == color
+ || !ON_BOARD(last_move + down - right))
+ b = 1;
+ else if (board[last_move - right] == color
+ || !ON_BOARD(last_move - right))
+ b = 2;
+
+ if (a > 0 && b > 0 && a + b < 4
+ && gg_drand() < 0.6
+ && !is_self_atari(last_move + down, color))
+ return last_move + down;
+ }
+ }
+ }
+
+ /* Capturing stones is always fun. */
+ offset2 = gg_urand() % (BOARDMAX - BOARDMIN);
+ for (k = 0; k < BOARDMAX - BOARDMIN; k++) {
+ pos = BOARDMIN + (k + offset2) % (BOARDMAX - BOARDMIN);
+ if (board[pos] == OTHER_COLOR(color)
+ && find_origin(pos) == pos
+ && countlib(pos) == 1
+ && gg_drand() < 0.05 + 0.1 * (countstones(pos) - 1)) {
+ int move;
+ findlib(pos, 1, &move);
+ return move;
+ }
+ }
+
+ /* Brown move generation follows below. */
+ for (pos = BOARDMIN; pos < BOARDMAX; pos++)
+ if (board[pos] == EMPTY)
+ moves[num_moves++] = pos;
+
+ while (num_moves > 0) {
+ int index = gg_rand() % num_moves;
+ pos = moves[index];
+ /* Consider moving at pos if it is legal and not suicide. */
+ if (is_legal(pos, color)
+ && !is_suicide(pos, color)) {
+ /* Further require the move not to be suicide for the opponent... */
+ if (!is_suicide(pos, OTHER_COLOR(color)))
+ return pos;
+ else {
+ /* ...however, if the move captures at least one stone,
+ * consider it anyway.
+ */
+ for (k = 0; k < 4; k++) {
+ int pos2 = pos + delta[k];
+ if (board[pos2] == OTHER_COLOR(color)) {
+ return pos;
+ }
+ }
+ }
+ }
+ moves[index] = moves[num_moves - 1];
+ num_moves--;
+ }
+
+ return PASS_MOVE;
+}
+
+static int play_random_game(void)
+{
+ int score = 0;
+ int move;
+ int pos;
+ int k;
+ int pass = 0;
+ int stackp_when_called = stackp;
+ int next_player = OTHER_COLOR(get_last_player());
+ if (next_player != WHITE && next_player != BLACK)
+ next_player = BLACK;
+
+ /* First finish the game, if it isn't already. */
+ while (pass < 3) {
+ move = generate_random_move(next_player);
+ if (is_pass(move))
+ pass++;
+ else
+ pass = 0;
+ trymove(move, next_player, NULL, NO_MOVE);
+ next_player = OTHER_COLOR(next_player);
+ }
+
+ for (pos = BOARDMIN; pos < BOARDMAX; pos++)
+ if (ON_BOARD(pos)) {
+ int pos2 = pos;
+ if (board[pos] == EMPTY)
+ for (k = 0; k < 4; k++) {
+ pos2 = pos + delta[k];
+ if (IS_STONE(board[pos2]))
+ break;
+ }
+
+ score += 2 * ((countlib(pos2) > 1) ^ (board[pos2] == BLACK)) - 1;
+ }
+
+#if 0
+ simple_showboard(stderr);
+ gprintf("\nScore: %d\n\n", score);
+#endif
+
+ while (stackp > stackp_when_called)
+ popgo();
+
+ return score;
+}
+
+static int current_score(int color)
+{
+ int score = 0;
+ int score2 = 0;
+ int k;
+ double mean = 0.0;
+ double std = 0.0;
+
+ for (k = 0; k < number_of_simulated_games; k++) {
+ int this_score = play_random_game();
+ score += this_score;
+ score2 += this_score * this_score;
+ }
+
+ if (color == BLACK)
+ score = -score;
+
+ mean = score / (double) number_of_simulated_games;
+ std = sqrt((score2 - score * mean) / (number_of_simulated_games - 1));
+
+ if (color == WHITE)
+ mean += komi;
+ else
+ mean -= komi;
+
+ if (0) {
+ fprintf(stderr, "%d %7d %7.2f %7.2f %6d ", stackp, score, mean, std,
+ (int) (1000 * mean / (0.01 + std)));
+ dump_stack();
+ }
+
+ return (int) (1000 * mean / (0.01 + std));
+}
+
+static int
+alphabeta(int color, int *move, int alpha, int beta,
+ int *forbidden_move, int *allowed_moves, int depth_left)
+{
+ int result;
+ int pos;
+ if (move)
+ *move = PASS_MOVE;
+
+ if (depth_left == 0)
+ return current_score(color);
+
+ if (0 && current_score(color) >= beta) {
+ if (0) {
+ dump_stack();
+ gprintf("Early beta cut: %d\n", beta);
+ }
+ return beta;
+ }
+
+ for (pos = BOARDMIN; pos < BOARDMAX; pos++)
+ /* Consider moving at pos if it is legal, not suicide, and not
+ * unconditional territory for either player.
+ */
+ if (board[pos] == EMPTY
+ && !forbidden_move[pos]
+ && (!allowed_moves || allowed_moves[pos])
+ && trymove(pos, color, NULL, NO_MOVE)) {
+ result = -alphabeta(OTHER_COLOR(color), NULL, -beta, -alpha,
+ forbidden_move, NULL, depth_left - 1);
+ popgo();
+
+ if (result >= beta) {
+ if (move)
+ *move = pos;
+ if (depth_left == 1) {
+ dump_stack();
+ gprintf("Beta cut: %d\n", beta);
+ }
+ return beta;
+ }
+
+ if (result > alpha) {
+ if (move)
+ *move = pos;
+ alpha = result;
+ }
+ }
+
+ if (depth_left == 1) {
+ dump_stack();
+ gprintf("Full return: %d\n", alpha);
+ }
+
+ return alpha;
+}
+
+static int
+monte_carlo_genmove(int color, int allowed_moves[BOARDMAX],
+ float *value, int *resign)
+{
+ int pos;
+ int best_score = -board_size * board_size * number_of_simulated_games;
+ int best_move = PASS_MOVE;
+ int unconditional_territory_black[BOARDMAX];
+ int unconditional_territory_white[BOARDMAX];
+ int forbidden_move[BOARDMAX];
+ int black_secure = 0;
+ int white_secure = 0;
+ int uncertain = 0;
+
+ if (0) {
+ simple_showboard(stderr);
+ gprintf("\n");
+ }
+
+ if (resign)
+ *resign = 0;
+
+ unconditional_life(unconditional_territory_black, BLACK);
+ unconditional_life(unconditional_territory_white, WHITE);
+
+ for (pos = BOARDMIN; pos < BOARDMAX; pos++)
+ if (!ON_BOARD(pos))
+ continue;
+ else if (unconditional_territory_black[pos]) {
+ black_secure++;
+ forbidden_move[pos] = 1;
+ }
+ else if (unconditional_territory_white[pos]) {
+ white_secure++;
+ forbidden_move[pos] = 1;
+ }
+ else {
+ uncertain++;
+ forbidden_move[pos] = 0;
+ }
+
+ best_score = alphabeta(color, &best_move, -1000 * number_of_simulated_games,
1000 * number_of_simulated_games,
+ forbidden_move, allowed_moves, 2);
+
+#if 0
+ gprintf("Best move %1m: %d (%d), (%d-%d-%d)\n", best_move, best_score,
+ best_score / number_of_simulated_games,
+ white_secure, black_secure, uncertain);
+#endif
+ *value = best_score / (float) number_of_simulated_games;
+
+ return best_move;
+}
+
+
/* Return one if x doesn't equal position_number and 0 otherwise.
* After using this macro x will always have the value
* position_number.
@@ -469,6 +785,18 @@
NO_MOVE, 1.0);
}
+ if (move != PASS_MOVE && (*value < 75.0 || *value > 75.01) &&
!doing_scoring) {
+ int allowed_moves2[BOARDMAX];
+ int pos;
+ for (pos = BOARDMIN; pos < BOARDMAX; pos++)
+ if (ON_BOARD(pos) && (!allowed_moves || allowed_moves[pos]) &&
potential_moves[pos] > *value / 3.0)
+ allowed_moves2[pos] = 1;
+ else
+ allowed_moves2[pos] = 0;
+
+ move = monte_carlo_genmove(color, allowed_moves2, value, resign);
+ }
+
/* If still no move, fill a remaining liberty. This should pick up
* all missing dame points.
*/
- [gnugo-devel] MonteGNU,
Gunnar Farnebäck <=