Re: [Bug-gnubg] Re: The importance of METs


From: Douglas Zare
Subject: Re: [Bug-gnubg] Re: The importance of METs
Date: Tue, 9 Sep 2003 16:00:59 -0400
User-agent: Internet Messaging Program (IMP) 3.2.2

Quoting Joseph Heled <address@hidden>:

> Douglas Zare wrote:
> > 
> > Ok. I'm not sure that I see enough accuracy to say 0.12% rather than
> > 0.0-0.2%, but I'll trust that someone has gone through that carefully.
> > However, the Woolsey-Heinrich MET is a straw man. Woolsey says he doesn't
> > use it (for extreme scores), and there are scores which seem to be quite
> > wrong, such as for 3-away 4-away. If you have a new MET that is supposed
> > to be an improvement over what is out there, why not test it against METs
> > people believe, or at least better ones?
> 
> I think perhaps you are missing the history of this discussion.

I missed some of it.

> It turns out that even toppling this straw man is not easy. Past
> experience (and other runs I did, such as Snowie/mec26 and
> Trice/mec26) convinced me once more that differences are small, and any
> reasonable table will do.

Yes, I recall e-mail correspondence with you on that subject before I started
reading this mailing list. So, I was surprised to see a new MET hyped as
"significantly better" and "1.2% better." Both are highly misleading
statements, even if you put the decimal point back. 

> My guess is that NN errors (or noise if you will) are much bigger than
> differences between METs, so a better MET will become important only if
> the NN becomes much better, which is unlikely before bots start
> playing using rollouts in real time (perhaps 5 years from now, assuming
> Moore's law holds?).

I don't see the logical connection. If the NN were currently rated 1500, and a
better MET could raise that to 1550, it would be important even though there
are other improvements that would make larger differences. However, KvdD's
experiments (which in this case look better) suggest that using Woolsey's MET
loses about 1 rating point. The confidence interval is relatively wide, though.
I would not be surprised if the correct value were 4 Elo points.
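
To illustrate why an interval around such a small effect ends up wide, here is a minimal sketch using the normal approximation to a binomial win rate. It assumes independent matches between roughly equal players; the function names and the 95% level (z = 1.96) are illustrative choices, not anything taken from KvdD's experiments.

```python
import math

def win_rate_half_width(n_matches, p=0.5, z=1.96):
    """Half-width of a normal-approximation confidence interval
    for a match win rate estimated from n_matches independent matches."""
    return z * math.sqrt(p * (1.0 - p) / n_matches)

def matches_needed(half_width, p=0.5, z=1.96):
    """Approximate number of matches needed so that the interval
    half-width is no larger than half_width."""
    return math.ceil((z / half_width) ** 2 * p * (1.0 - p))

if __name__ == "__main__":
    # Hypothetical trial sizes, chosen only for illustration.
    for n in (1_000, 10_000, 100_000):
        print(n, "matches -> +/-", round(100 * win_rate_half_width(n), 2), "% win rate")
    print("matches for +/-0.1%:", matches_needed(0.001))
```

Under these assumptions, pinning a win rate down to within a tenth of a percent takes on the order of a million matches, so an effect of a fraction of a percent is hard to separate from noise.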

One Elo point. That is an interesting figure. How far will it be spread?

A gain of a single Elo point is not worth advertising, and this is why I
suggested that as a reality check, the improvements be expressed in terms of
Elo. However, if you focus on some match scores, you will find larger, more
meaningful improvements, even though those match scores are not hit in every
match.
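
As a rough sketch of that reality check, a small edge in match-winning chances can be converted to rating points. This assumes the FIBS-style rating formula, in which a player rated D points higher wins an N-point match with probability 1 / (1 + 10^(-D * sqrt(N) / 2000)). The thread does not say which rating system is meant, and the 0.12% edge and 7-point match length below are illustrative inputs only.

```python
import math

def win_prob(diff, match_length):
    """Predicted win probability for the player rated `diff` points higher,
    under the assumed FIBS-style formula."""
    return 1.0 / (1.0 + 10.0 ** (-diff * math.sqrt(match_length) / 2000.0))

def rating_diff(p_win, match_length):
    """Inverse of win_prob: the rating difference that predicts p_win."""
    return 2000.0 / math.sqrt(match_length) * math.log10(p_win / (1.0 - p_win))

if __name__ == "__main__":
    # Assumed inputs: a 0.12% edge in match-winning chances, 7-point matches.
    edge = 0.0012
    print(round(rating_diff(0.5 + edge, 7), 2), "rating points")
```

With these assumed inputs the edge comes out between one and two rating points, the same order of magnitude as the figure discussed above.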

Douglas Zare




