[gnugo-devel] Floating point arithmetics


From: Portela Fernando
Subject: [gnugo-devel] Floating point arithmetics
Date: Thu, 16 Sep 2004 12:44:49 +0200

Hi,

A couple of weeks ago I mentioned a platform-dependency problem. Since
I finally got around to getting a Linux box running, I tried to
investigate it myself. The problem statement, in short:

Current CVS, regressing century2002:150.

Linux:
century2002                              3.63    561561     681     4589

Win32 (VC++ build):
century2002:150 FAIL A18 [B18]
century2002                              3.64    565550     681     4617

A quick analysis showed differences in move valuations, due to
territory being erased by the break-in code.

Then I noticed large differences in the node counters. For the first
batch:

Linux:
reading                                  3.91     86443       0        0
owl                                     85.11  14817938   18479    89309
owl_rot                                  1.36    246077      96     1983
ld_owl                                  33.67   4605715   23230     5810
optics                                   1.56    207448       0      937
filllib                                  9.82   1001313    1213     7738
atari_atari                             15.51   2820607    1968    13834
connection                              16.41   3301900       0    38031
break_in                                 1.87    409148     451     3740
blunder                                 20.71   3356064    3368    20102
unconditional                            0.80     38816       0        0
trevora                                 75.64  16084241   59073   128449
nngs1                                  235.76  50957389   68605   391863
strategy                               197.62  38937784   75831   311809

Win32 (VC++ build):
reading                                  3.92     86443       0        0
owl                                     77.39  14982712   18479    92516
owl_rot                                  1.36    244683      96     1942
ld_owl                                  31.75   4606698   23230     5804
optics                                   1.63    207400       0      934
filllib                                  9.23   1001164    1213     7739
atari_atari                             15.23   2819317    1968    13812
connection                              16.23   3301860       0    37888
break_in                                 1.84    399540     451     3584
blunder                                 19.94   3321666    3368    20015
unconditional                            1.16     38816       0        0
trevora                                 73.89  16124199   59073   129095
nngs1                                  228.83  51516879   68605   398756
strategy                               189.34  39136720   75831   314446

I quickly concluded that there must be an underlying problem in the
connection code, and I strongly suspected floating point arithmetic.

After some debugging, I was able to spot a location where things could
(and actually do) go wrong: the ENQUEUE() macro. The first comparison
involves values which haven't been normalized, with the consequence that
the delta, vulnerable1 and vulnerable2 fields might (or might not) get
overwritten, leading to possible variations in the further processing
of the queue.
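
For illustration, here is a tiny standalone sketch (not the actual
ENQUEUE() code, just the kind of thing that can happen): on x87 builds
the compiler may keep an expression in 80-bit registers while the value
stored in a float variable has been rounded to 32 bits, so the same
comparison can take different branches with different compilers or
flags.

  #include <stdio.h>

  int
  main(void)
  {
    volatile float a = 0.1f, b = 0.2f;
    float stored = a + b;   /* usually rounded to 32 bits when stored */

    /* The sum below may be evaluated in extended precision and thus
       compare as different from the stored 32-bit value on some
       platforms/compilers, and as equal on others. */
    if (a + b != stored)
      printf("sum differs from stored value (extended precision)\n");
    else
      printf("sum compares equal to stored value\n");
    return 0;
  }

That kind of divergence would explain the different node counts between
the Linux and VC++ builds.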

As a possible solution, I rejected the idea of spreading lots of
gg_normalize_float() calls throughout the code. It seemed much simpler
and more efficient to transform the floating point arithmetic into a
fixed point one (well, sort of). So I wrote a simple patch, just
replacing the float declarations with int ones and scaling all the
constants by 10000 (derived from the smallest constant found in the
ENQUEUE_STONE() macro).
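
To show the shape of the change, here is a minimal sketch of the fixed
point idea (names and values are made up for illustration, not the
actual connection code): distances are kept as ints scaled by 10000, so
every comparison becomes exact on every platform.

  #include <stdio.h>

  /* Scale a floating point constant into the fixed point domain.
     Only ever applied to compile-time constants. */
  #define FIXED(x)  ((int) ((x) * 10000 + 0.5))

  int
  main(void)
  {
    int distance = FIXED(1.0);                    /* 10000 */
    int new_distance = distance - FIXED(0.0001);  /*  9999 */

    /* Integer comparison, identical result on all platforms. */
    if (new_distance < distance)
      printf("%d < %d\n", new_distance, distance);
    return 0;
  }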

Testing the patch resulted in:

* Positive

- Node counts are almost identical on Linux and Win32 (there are still
  a couple of deltas in trevora, nngs and nngs3, which means there are
  problems elsewhere)
- Regression breakage is identical on both platforms (the century2002:150
  failure on Win32 has disappeared)
- Compared to CVS, the regression results are apparently a net gain, with
  1 FAIL and 3 PASSes (not analyzed yet, but at first glance the PASSes
  all look good)

  break_in:100    FAIL 1 D9 [0]
  nngs:1280       PASS D13 [D13]
  connect:70      PASS 0 [0]
  global:1        PASS B3 [B3]


* Negative

- Performance impact is heavy: +2% or so in reading nodes, +5.7% in
  connection nodes, and around +5% in timing (imprecise)

  My guess is that with CVS and the above-mentioned problem in
  ENQUEUE(), there are quite a number of cases where vulnerabilities
  get overwritten, which globally results in fewer checks and less
  reading.

- A possible issue for us developers: tuning the constants will be
  less natural than with floating point values.



Questions:

1. Are we interested in this patch, even at the mentioned performance
   cost?

2. If I submit a patch, should I make the change reversible? In other
   words, should it provide typedefs and #define's so as to be able to
   switch (a rough sketch follows below)? To be honest, I don't see any
   good reason we'd ever want to go back to floating point, but maybe
   someone on the list has better ideas on the topic.
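
For reference, a rough, hypothetical sketch of what such a reversible
setup could look like (the names are made up, nothing of this exists in
the tree yet):

  #ifdef FIXED_POINT_CONNECTIONS
  typedef int connection_value;
  #define CONN_CONST(x)  ((int) ((x) * 10000 + 0.5))
  #else
  typedef float connection_value;
  #define CONN_CONST(x)  (x)
  #endif

All declarations would then use connection_value and all constants
would be wrapped in CONN_CONST(), so switching back would amount to
flipping a single #define.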


-- nando
