[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[gnugo-devel] arend_1_28.1: probabilistic hashing to save memory
From: |
Arend Bayer |
Subject: |
[gnugo-devel] arend_1_28.1: probabilistic hashing to save memory |
Date: |
Wed, 6 Mar 2002 01:17:28 -0500 (EST) |
- hash nodes store 2nd hash value instead of full board position
Dan had suggested (in reply to a private e-mail) that I send a patch if
I think the proposed change of the hashing scheme should happen before 3.2.
So here it is. Instead of a hash value plus a complete bitmap representation
of the board, a hash node now consists of two hash values. If both hash
values of the actual and a stored position agree, it is almost guaraneed
that they are identical. This will result in less than one incorrect read
result cache table lookup in 1 million games.
This hashing scheme is about 4-5 times as memory efficient as the old one
(the factor 2 I had claimed in my last e-mail is non-sense, I had assumed
that the board representation needs 361 bits, but of course it needs
2*361).
As for testing, I did the following:
I replayed five NNGS-games and 9 of Dan's two stones series, both with
old hashing, and with the new hashing with different cache sizes. The
-t trace output did not give any differences. The statistics from -S
expectedly gave small differences. With a cache size of appr. 3.5 M (resulting
in a hash table that should be able to store the same number of nodes as the
old 16 M cache) the results were almost identical; the timings hardly
differed either (differences in the order of 0.1%).
With 8 M cache size/new hashing instead, I got a speed improvement of about
5% compared to 16M/old hashing (this appr. agreed with the number of saved
reading nodes). I have no reliable figures how much more speed improvement
one gets by enlarging the cache to 16 M.
Also, I had one game replayed with CHECK_HASHING turned on, nothing
unexpected happening.
The main disadvantage I see for this patch is that debugging hash tables will
get more difficult. It is no longer possible to dump the board belonging to
a stored hash node.
A potential source for bugs is that hashdata_set_ko and hashdata_remove_ko
now have to be replaced by a single function hashdata_invert_ko; so the
calling function has to remember itself whether (and where) a ko needs to
get removed from the current hash data. However, such bugs should immediately
get discoverd by turning CHECK_HASHING on.
If we want to postpone this patch until after 3.2, that should not be a
problem. I would expect that the patch could get reused without changes.
Arend
Index: engine/board.c
===================================================================
RCS file: /cvsroot/gnugo/gnugo/engine/board.c,v
retrieving revision 1.37
diff -u -r1.37 board.c
--- engine/board.c 4 Mar 2002 06:49:08 -0000 1.37
+++ engine/board.c 6 Mar 2002 05:00:46 -0000
@@ -445,25 +445,25 @@
if (str == 0) {
if (!komaster_is_empty(komaster, kom_pos))
gg_snprintf(buf, 100, "%s (variation %d, hash %lx, komaster %s:%s)",
- message, count_variations, hashdata.hashval,
+ message, count_variations, hashdata.hashval[0],
komaster_to_string(komaster),
location_to_string(kom_pos));
else
gg_snprintf(buf, 100, "%s (variation %d, hash %lx)",
- message, count_variations, hashdata.hashval);
+ message, count_variations, hashdata.hashval[0]);
}
else {
if (!komaster_is_empty(komaster, kom_pos))
gg_snprintf(buf, 100,
"%s at %s (variation %d, hash %lx, komaster %s:%s)",
message, location_to_string(str), count_variations,
- hashdata.hashval,
+ hashdata.hashval[0],
komaster_to_string(komaster),
location_to_string(kom_pos));
else
gg_snprintf(buf, 100, "%s at %s (variation %d, hash %lx)",
message, location_to_string(str), count_variations,
- hashdata.hashval);
+ hashdata.hashval[0]);
}
sgftreeAddPlayLast(sgf_dumptree, NULL, color, I(pos), J(pos));
sgftreeAddComment(sgf_dumptree, NULL, buf);
@@ -546,12 +546,12 @@
message = "UNKNOWN";
if (!komaster_is_empty(komaster, kom_pos))
gg_snprintf(buf, 100, "tryko: %s (variation %d, %lx, komaster %s:%s)",
- message, count_variations, hashdata.hashval,
+ message, count_variations, hashdata.hashval[0],
komaster_to_string(komaster),
location_to_string(kom_pos));
else
gg_snprintf(buf, 100, "tryko: %s (variation %d, %lx)",
- message, count_variations, hashdata.hashval);
+ message, count_variations, hashdata.hashval[0]);
if (0) {
/* tm - I don't find these pass moves helpful in the tree. */
sgftreeAddPlayLast(sgf_dumptree, NULL, color, -1, -1);
@@ -662,8 +662,9 @@
PUSH_VALUE(board_ko_pos);
memcpy(&hashdata_stack[stackp], &hashdata, sizeof(hashdata));
+ if (board_ko_pos)
+ hashdata_invert_ko(&hashdata, board_ko_pos);
board_ko_pos = 0;
- hashdata_remove_ko(&hashdata);
PUSH_VALUE(black_captured);
PUSH_VALUE(white_captured);
@@ -764,7 +765,7 @@
if (count_variations)
gprintf("%o (variation %d)", count_variations-1);
#else
- gprintf("%o (%d)", hashdata.hashval);
+ gprintf("%o (%d)", hashdata.hashval[0]);
#endif
gprintf("%o\n");
@@ -847,7 +848,7 @@
/* Check the hash table to see if it corresponds to the cumulative one. */
hashdata_recalc(&oldkey, board, board_ko_pos);
- gg_assert(hashdata_diff_dump(&oldkey, &hashdata) == 0);
+ gg_assert(hashdata_compare(&oldkey, &hashdata) == 0);
#endif
gg_assert(stackp == 0);
@@ -855,8 +856,9 @@
last_moves[1] = last_moves[0];
last_moves[0] = pos;
+ if (board_ko_pos)
+ hashdata_invert_ko(&hashdata, board_ko_pos);
board_ko_pos = 0;
- hashdata_remove_ko(&hashdata);
/* If the move is a pass, we can skip some steps. */
if (pos != 0) {
@@ -869,7 +871,7 @@
#if CHECK_HASHING
/* Check the hash table to see if it equals the previous one. */
hashdata_recalc(&oldkey, board, board_ko_pos);
- gg_assert(hashdata_diff_dump(&oldkey, &hashdata) == 0);
+ gg_assert(hashdata_compare(&oldkey, &hashdata) == 0);
#endif
}
@@ -3397,8 +3399,11 @@
if (string[s].liberties == 1
&& string[s].size == 1
&& captured_stones == 1) {
+ /* In the case of a double ko: have to clear the old ko position first. */
+ if (board_ko_pos)
+ hashdata_invert_ko(&hashdata, board_ko_pos);
board_ko_pos = string[s].libs[0];
- hashdata_set_ko(&hashdata, board_ko_pos);
+ hashdata_invert_ko(&hashdata, board_ko_pos);
}
}
Index: engine/cache.c
===================================================================
RCS file: /cvsroot/gnugo/gnugo/engine/cache.c,v
retrieving revision 1.11
diff -u -r1.11 cache.c
--- engine/cache.c 4 Mar 2002 06:49:08 -0000 1.11
+++ engine/cache.c 6 Mar 2002 05:00:50 -0000
@@ -93,7 +93,6 @@
/* Data about the node itself. */
fprintf(outfile, "Hash value: %lx\n", (unsigned long) node->key.hashval);
- hashposition_dump(&(node->key.hashpos), outfile);
for (result = node->results; result != NULL; result = result->next) {
read_result_dump(result, outfile);
@@ -367,7 +366,7 @@
*/
for (k = 0; k < table->num_nodes; k++) {
node = &(table->all_nodes[k]);
- bucket = node->key.hashval % table->hashtablesize;
+ bucket = node->key.hashval[0] % table->hashtablesize;
previous = NULL;
current = table->hashtable[bucket];
@@ -469,7 +468,7 @@
node->results = NULL;
/* ...and enter it into the table. */
- bucket = hd->hashval % table->hashtablesize;
+ bucket = hd->hashval[0] % table->hashtablesize;
node->next = table->hashtable[bucket];
table->hashtable[bucket] = node;
@@ -494,11 +493,13 @@
Hashnode *node;
int bucket;
- bucket = hd->hashval % table->hashtablesize;
+ bucket = hd->hashval[0] % table->hashtablesize;
for (node = table->hashtable[bucket]; node != NULL; node = node->next) {
- if (node->key.hashval != hd->hashval)
+ if (node->key.hashval[0] != hd->hashval[0])
continue;
- if (hashposition_compare(&hd->hashpos, &node->key.hashpos) == 0)
+ if (node->key.hashval[1] != hd->hashval[1])
+ stats.hash_collisions++;
+ else
break;
}
return node;
@@ -576,6 +577,7 @@
* allocate a single node or if the allocation fails, the caching is
* disabled.
*/
+
void
reading_cache_init(int bytes)
{
@@ -590,6 +592,8 @@
+ sizeof(Hashnode)
+ 1.4 * sizeof(Read_result)));
/* If we get a zero size hash table, disable hashing completely. */
+ if (0)
+ gprintf("Allocated memory for %d hash nodes.\n", (int) nodes);
if (nodes < 1.0)
hashflags = HASH_NOTHING;
movehash = hashtable_new((int) (1.5 * nodes), /* table size */
@@ -691,7 +695,7 @@
/* Check the hash table to see if we have had this position before. */
hashdata_recalc(&key, board, board_ko_pos);
- gg_assert(hashdata_diff_dump(&key, &hashdata) == 0);
+ gg_assert(hashdata_compare(&key, &hashdata) == 0);
/* Find this position in the table. If it wasn't found, enter it. */
hashnode = hashtable_search(movehash, &hashdata);
Index: engine/hash.c
===================================================================
RCS file: /cvsroot/gnugo/gnugo/engine/hash.c,v
retrieving revision 1.10
diff -u -r1.10 hash.c
--- engine/hash.c 4 Mar 2002 06:49:08 -0000 1.10
+++ engine/hash.c 6 Mar 2002 05:00:50 -0000
@@ -43,13 +43,9 @@
/* Random values for the hash function. For stones and ko position. */
-static Hashvalue white_hash[BOARDMAX];
-static Hashvalue black_hash[BOARDMAX];
-static Hashvalue ko_hash[BOARDMAX];
-
-
-static Compacttype white_patterns[4 * sizeof(Compacttype)];
-static Compacttype black_patterns[4 * sizeof(Compacttype)];
+static Hashvalue white_hash[BOARDMAX][2];
+static Hashvalue black_hash[BOARDMAX][2];
+static Hashvalue ko_hash[BOARDMAX][2];
/* Get a random Hashvalue, where all bits are used. */
@@ -74,7 +70,7 @@
hash_init(void)
{
int pos;
- int x;
+ int i;
struct gg_rand_state state;
if (is_initialized)
@@ -91,24 +87,16 @@
gg_srand(1);
#endif
- for (pos = BOARDMIN; pos < BOARDMAX; pos++)
- if (ON_BOARD(pos)) {
- black_hash[pos] = hash_rand();
- white_hash[pos] = hash_rand();
- ko_hash[pos] = hash_rand();
- }
-
+ for (i = 0; i < 2; i++)
+ for (pos = BOARDMIN; pos < BOARDMAX; pos++)
+ if (ON_BOARD(pos)) {
+ black_hash[pos][i] = hash_rand();
+ white_hash[pos][i] = hash_rand();
+ ko_hash[pos][i] = hash_rand();
+ }
+
gg_set_rand_state(&state);
- {
- Compacttype mask;
-
- for (x = 0, mask = 1; mask; x++, mask <<= 2) {
- white_patterns[x] = mask;
- black_patterns[x] = mask << 1;
- }
- }
-
is_initialized = 1;
}
@@ -116,59 +104,7 @@
/* ---------------------------------------------------------------- */
-/* Return 0 if *pos1 == *pos2, otherwise return 1.
- * This adheres (almost) to the standard compare function semantics
- * which are used e.g. by the comparison functions used in qsort().
- */
-
-int
-hashposition_compare(Hashposition *pos1, Hashposition *pos2)
-{
- int i;
-
- /* We need only compare to board_size. MAX_BOARD is not necessary. */
- for (i = 0; i < (int) (board_size * board_size / POINTSPERCOMPACT + 1); i++)
- if (pos1->board[i] != pos2->board[i]) {
- stats.hash_collisions++;
- return 1;
- }
-
- if (pos1->ko_pos != pos2->ko_pos) {
- stats.hash_collisions++;
- return 1;
- }
-
- return 0;
-}
-
-
-/*
- * Dump an ASCII representation of the contents of a Hashposition onto
- * the FILE outfile.
- */
-
-void
-hashposition_dump(Hashposition *pos, FILE *outfile)
-{
- int i;
-
- gfprintf(outfile, "Board: ");
- for (i = 0; i < (int) COMPACT_BOARD_SIZE; ++i)
- gfprintf(outfile, " %lx", (unsigned long) pos->board[i]);
-
- if (pos->ko_pos == 0)
- gfprintf(outfile, " No ko");
- else
- gfprintf(outfile, " Ko position: %1m", pos->ko_pos);
-}
-
-
-/* ---------------------------------------------------------------- */
-
-
-/* Calculate the compactboard and the hashvalue in one function.
- * They are always used together and it saves us a loop and a function
- * call.
+/* Calculate the two hashvalues.
*/
void
@@ -187,123 +123,49 @@
* SPARC ???
*/
-#define USE_SHIFTING 1
-#if USE_SHIFTING
- unsigned int index;
int pos;
- Compacttype bits;
- target->hashval = 0;
- bits = 1;
- index = 0;
- target->hashpos.board[index] = 0;
+ target->hashval[0] = 0;
+ target->hashval[1] = 0;
for (pos = BOARDMIN; pos < BOARDMAX; pos++) {
if (!ON_BOARD(pos))
continue;
switch (p[pos]) {
default:
case EMPTY:
- bits <<= 2;
break;
case WHITE:
- target->hashval ^= white_hash[pos];
- target->hashpos.board[index] |= bits;
- bits <<= 2;
+ target->hashval[0] ^= white_hash[pos][0];
+ target->hashval[1] ^= white_hash[pos][1];
break;
case BLACK:
- target->hashval ^= black_hash[pos];
- bits <<= 1;
- target->hashpos.board[index] |= bits;
- bits <<= 1;
+ target->hashval[0] ^= black_hash[pos][0];
+ target->hashval[1] ^= black_hash[pos][1];
break;
}
-
- if (!bits) {
- /* This means the bit fell off the left side. */
- bits = 1;
- index++;
- if (index < COMPACT_BOARD_SIZE)
- target->hashpos.board[index] = 0;
- }
}
- /* This cleans up garbage bits at the (unused) end of the array.
- * It probably should not really be necessary.
- */
- while (++index < COMPACT_BOARD_SIZE)
- target->hashpos.board[index] = 0;
-
-#else /* USE_SHIFTING */
-
- int index;
- int subindex;
- int pos;
- Compacttype bits;
-
- target->hashval = 0;
- index = 0;
- subindex = 0;
- bits = 0;
- for (pos = BOARDMIN; pos < BOARDMAX; pos++) {
- if (!ON_BOARD(pos))
- continue;
- switch (p[pos]) {
- default:
- case EMPTY:
- break;
- case WHITE:
- target->hashval ^= white_hash[pos];
- bits |= white_patterns[subindex];
- break;
- case BLACK:
- target->hashval ^= black_hash[pos];
- bits |= black_patterns[subindex];
- break;
- }
- if (++subindex == POINTSPERCOMPACT) {
- target->hashpos.board[index++] = bits;
- bits = 0;
- subindex = 0;
- }
+ if (ko_pos != 0) {
+ target->hashval[0] ^= ko_hash[ko_pos][0];
+ target->hashval[1] ^= ko_hash[ko_pos][1];
}
-
- if (subindex != 0)
- target->hashpos.board[index] = bits;
-#endif /* USE_SHIFTING */
-
- if (ko_pos != 0)
- target->hashval ^= ko_hash[ko_pos];
- target->hashpos.ko_pos = ko_pos;
}
/*
- * Set ko in the hash value and hash position.
+ * Set or remove ko in the hash values.
*/
void
-hashdata_set_ko(Hash_data *hd, int pos)
+hashdata_invert_ko(Hash_data *hd, int pos)
{
- hd->hashval ^= ko_hash[pos];
- hd->hashpos.ko_pos = pos;
+ hd->hashval[0] ^= ko_hash[pos][0];
+ hd->hashval[1] ^= ko_hash[pos][1];
}
-/*
- * Remove any ko from the hash value and hash position.
- */
-
-void
-hashdata_remove_ko(Hash_data *hd)
-{
- if (hd->hashpos.ko_pos != 0) {
- hd->hashval ^= ko_hash[hd->hashpos.ko_pos];
- hd->hashpos.ko_pos = 0;
- }
-}
-
/*
* Set or remove a stone of COLOR at pos in a Hash_data.
@@ -312,97 +174,26 @@
void
hashdata_invert_stone(Hash_data *hd, int pos, int color)
{
- int i = I(pos);
- int j = J(pos);
- int index = (i * board_size + j) / POINTSPERCOMPACT;
- int subindex = (i * board_size + j) % POINTSPERCOMPACT;
-
if (color == BLACK) {
- hd->hashval ^= black_hash[pos];
- hd->hashpos.board[index] ^= black_patterns[subindex];
+ hd->hashval[0] ^= black_hash[pos][0];
+ hd->hashval[1] ^= black_hash[pos][1];
}
else if (color == WHITE) {
- hd->hashval ^= white_hash[pos];
- hd->hashpos.board[index] ^= white_patterns[subindex];
+ hd->hashval[0] ^= white_hash[pos][0];
+ hd->hashval[1] ^= white_hash[pos][1];
}
}
-/*
- * Compare two Hash_data, if different: dump an ASCII representation
- * of the differences to stderr.
- * return is the same as for hashposition_compare()
- */
-
-int
-hashdata_diff_dump(Hash_data *hd1, Hash_data *hd2)
-{
- int retval;
- int pos, i;
- int count1[4], count2[4];
- static const char letter[] = "abcdefghjklmnopqrstuvwxyz";
- static const char *hashcolors[] = {"Empty", "White", "Black", "Grey!"};
-
- retval = hashdata_compare(hd1, hd2);
- if (retval == 0)
- return retval;
-
- for (i = 0; i < 4; i++) {
- count1[i] = 0;
- count2[i] = 0;
- }
-
- fprintf(stderr, "Differences: ");
- for (i = 0; i < COMPACT_BOARD_SIZE; i++) {
- if (hd1->hashpos.board[i] != hd2->hashpos.board[i])
- fprintf(stderr, "\nSlot %d: (%lx <==> %lx)" , i,
- (unsigned long) hd1->hashpos.board[i],
- (unsigned long) hd2->hashpos.board[i]);
-
- for (pos = 0; pos < POINTSPERCOMPACT; pos++) {
- unsigned int u1, u2;
- int xx, yy, zz;
-
- u1 = (hd1->hashpos.board[i] >> (2*pos)) & 3;
- u2 = (hd2->hashpos.board[i] >> (2*pos)) & 3;
- count1[u1]++;
- count2[u2]++;
- if (u1 == u2)
- continue;
-
- zz = (i * POINTSPERCOMPACT) + pos;
- xx = zz / MAX_BOARD;
- yy = zz % MAX_BOARD;
- fprintf(stderr, "\n#%2d: [%c%d] %s<==>%s", pos, letter[xx], yy,
- hashcolors[u1], hashcolors[u2]);
- }
- }
-
- if (hd1->hashpos.ko_pos == 0 && hd2->hashpos.ko_pos == 0)
- fprintf(stderr, "\nNo ko\n");
- else if (hd1->hashpos.ko_pos == hd2->hashpos.ko_pos)
- gfprintf(stderr, "\nEqual Ko position:[%1m]\n", hd1->hashpos.ko_pos);
- else
- gfprintf(stderr, "\nDifferent Ko position:[%1m] <==> [%1m]\n",
- hd1->hashpos.ko_pos, hd2->hashpos.ko_pos);
-
- fprintf(stderr, "Total [%d,%d,%d,%d]",
- count1[0], count1[1], count1[2], count1[3]);
- fprintf(stderr, " <==> [%d,%d,%d,%d]\n",
- count2[0], count2[1], count2[2], count2[3]);
-
- return retval;
-}
-
int
hashdata_compare(Hash_data *hd1, Hash_data *hd2)
{
int rc;
- rc = (hd1->hashval == hd2->hashval) ? 0 : 2;
+ rc = (hd1->hashval[0] == hd2->hashval[0]) ? 0 : 2;
if (rc == 0)
- rc = hashposition_compare(&hd1->hashpos, &hd2->hashpos);
+ rc = (hd1->hashval[1] == hd2->hashval[1]) ? 0 : 1;
return rc;
}
Index: engine/hash.h
===================================================================
RCS file: /cvsroot/gnugo/gnugo/engine/hash.h,v
retrieving revision 1.6
diff -u -r1.6 hash.h
--- engine/hash.h 4 Mar 2002 06:49:08 -0000 1.6
+++ engine/hash.h 6 Mar 2002 05:00:51 -0000
@@ -85,31 +85,22 @@
#define COMPACT_BOARD_SIZE ((MAX_BOARD) * (MAX_BOARD) / POINTSPERCOMPACT + 1)
-typedef struct hashposition_t {
- Compacttype board[COMPACT_BOARD_SIZE];
- int ko_pos;
-} Hashposition;
-
-
/*
* This struct is maintained by the machinery that updates the board
* to provide incremental hashing. Examples: trymove(), play_move(), ...
*/
-typedef struct {
- Hashvalue hashval;
- Hashposition hashpos;
+
+typedef struct
+{
+ Hashvalue hashval[2];
} Hash_data;
void hash_init(void);
-int hashposition_compare(Hashposition *pos1, Hashposition *pos2);
-void hashposition_dump(Hashposition *pos, FILE *outfile);
-
void hashdata_recalc(Hash_data *hd, Intersection *board, int ko_pos);
int hashdata_compare(Hash_data *hd1, Hash_data *hd2);
-void hashdata_set_ko(Hash_data *hd, int pos);
-void hashdata_remove_ko(Hash_data *hd);
+void hashdata_invert_ko(Hash_data *hd, int pos);
void hashdata_invert_stone(Hash_data *hd, int pos, int color);
void hashdata_set_tomove(Hash_data *hd, int to_move);
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- [gnugo-devel] arend_1_28.1: probabilistic hashing to save memory,
Arend Bayer <=