[Bug-gnubg] Training neural nets: How does size matter?
From: Douglas Zare
Subject: [Bug-gnubg] Training neural nets: How does size matter?
Date: Wed, 28 Aug 2002 21:34:07 -0400
User-agent: Internet Messaging Program (IMP) 3.1

I'm training some neural nets other than gnubg's, and would love to exchange
ideas on training, architecture, etc. with the gnubg developers, among others.
I have a few questions that I hope some on this mailing list have the
experience to answer. Some were prompted when a test network with 250K
parameters that I was training surpassed, to my surprise, a network with
1000K parameters (on some benchmarks, though perhaps not in playing strength).

First, roughly what level of improvement do you expect between mature networks
with different numbers of hidden nodes? The quality of a neural net is hard to
quantify abstractly, so one could pin it down to, say, the accuracy of
absolute evaluations in non-contact positions for the racing net, or Elo, or
cubeless ppg (points per game) against a decent standard.

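The cubeless ppg benchmark is just the mean signed game outcome over a
session, from the tested net's point of view. A trivial sketch (the helper
name is hypothetical, not gnubg code):

```python
def cubeless_ppg(results):
    # results: one signed outcome per game from the tested net's side,
    # +/-1 single game, +/-2 gammon, +/-3 backgammon (cube ignored)
    return sum(results) / len(results)

# e.g. two single wins and a gammon loss over three games average to 0.0:
# cubeless_ppg([1, 1, -2])
```
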
I don't think Snowie 3's nets were mature, but if they and Snowie 4's nets
are, how much of an improvement should one expect to see if Snowie 4's nets
have twice as many hidden nodes?

Second, how many fewer hidden nodes can you use for the same quality if you
release the net from predicting positions that are covered by the racing
database?

Third, Tesauro mentions that a neural network seems to learn a linear
regression first. Are there other describable qualitative phases that one
encounters? For example, does a network with 50 hidden nodes first imitate
the linear regression, then a typical mature 5-node network, then a 10-node
one?

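One way to see why the linear-regression phase comes first: with the usual
small random initialization, every tanh hidden unit sits in its linear region
(tanh(z) ≈ z for small z), so the whole one-hidden-layer net is very nearly
an affine function of its inputs until training grows the weights. A small
sketch (toy sizes, not gnubg's architecture):

```python
import numpy as np

rng = np.random.default_rng(1)

# freshly initialized one-hidden-layer net with small weights
n_in, n_hidden = 10, 40
W1 = rng.normal(0, 0.01, (n_in, n_hidden))
W2 = rng.normal(0, 0.01, (n_hidden, 1))

def net(x):
    return np.tanh(x @ W1) @ W2

def linearized(x):
    # tanh(z) ~= z for small z, so the net collapses to x @ (W1 @ W2),
    # i.e. a single linear map of the inputs
    return x @ (W1 @ W2)

xs = rng.normal(0, 1, (1000, n_in))
gap = float(np.max(np.abs(net(xs) - linearized(xs))))
# gap stays tiny until training pushes units out of the linear region
```
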
It might be wishful thinking, but if that is the case, it might be possible
to retain most of the information by training a smaller network to imitate
the larger network's evaluations. The smaller network might be faster to
train, and one could then pass the information back.

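The imitation idea above can be sketched as supervised training of a small
"student" net on a large "teacher" net's evaluations. This is a minimal
toy (tanh hidden layer, sigmoid output, plain squared-error gradient steps);
the sizes and names are illustrative, not anyone's actual training setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def init(n_in, n_hidden, n_out=1):
    # one-hidden-layer net with small random weights
    return {"W1": rng.normal(0, 0.1, (n_in, n_hidden)),
            "b1": np.zeros(n_hidden),
            "W2": rng.normal(0, 0.1, (n_hidden, n_out)),
            "b2": np.zeros(n_out)}

def forward(net, x):
    h = np.tanh(x @ net["W1"] + net["b1"])
    return h, 1.0 / (1.0 + np.exp(-(h @ net["W2"] + net["b2"])))

def distill_step(student, x, target, lr=0.1):
    # one gradient step pulling the student's output toward the
    # teacher's evaluation (squared-error imitation loss)
    h, y = forward(student, x)
    delta = (y - target) * y * (1.0 - y)            # dL/d(pre-sigmoid)
    dh = (student["W2"] @ delta) * (1.0 - h ** 2)   # backprop through tanh
    student["W2"] -= lr * np.outer(h, delta)
    student["b2"] -= lr * delta
    student["W1"] -= lr * np.outer(x, dh)
    student["b1"] -= lr * dh

# a larger teacher labels random inputs; a smaller student imitates it
teacher, student = init(8, 16), init(8, 4)
xs = rng.normal(0, 1, (200, 8))
targets = [forward(teacher, x)[1] for x in xs]
err = lambda: float(np.mean([(forward(student, x)[1] - t) ** 2
                             for x, t in zip(xs, targets)]))
before = err()
for _ in range(20):
    for x, t in zip(xs, targets):
        distill_step(student, x, t)
after = err()   # imitation error shrinks as the student trains
```

Whether a student trained this way keeps "most of the information" of a
serious backgammon net is exactly the open question in the text; the sketch
only shows the mechanics of the imitation step.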
Are there thresholds for the number of hidden nodes necessary, with one
hidden layer, before particular backgammon concepts begin to be understood?
In chess, people say that with enough lookahead, strategy becomes tactics;
but how many nodes do you need before the timing issues of a high-anchor
holding game are understood by static evaluation? How many for a deep-anchor
holding game?

Douglas Zare