[Gneuralnetwork] Another paper - this one almost amusing.

From: Ray Dillinger
Subject: [Gneuralnetwork] Another paper - this one almost amusing.
Date: Mon, 2 Jan 2017 19:46:20 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Icedove/45.4.0

MIT, Berkeley, and Google Brain have rediscovered what MIT and Berkeley,
at least, already knew 20 years ago: back propagation overfits and
memorizes any set of training data that's too small.  Even with NO
patterns to recognize.  Even with labels deliberately unconnected to the
inputs.

It's a hell of a demo though: swap the labels on a bunch of images
around randomly, and the network can still learn them.  Assign labels to
images composed entirely of random noise, and the network can still
learn them.
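The demo is easy to reproduce in miniature.  Here's a sketch of the idea
(everything below - sizes, learning rate, step count - is my own
invention for illustration, not taken from the paper): a tiny
fully-connected net trained with plain backprop on pure-noise inputs
with randomly assigned labels.  There is no pattern to find, yet it
memorizes the training set anyway.

```python
import numpy as np

rng = np.random.default_rng(0)

# 32 "images" of pure noise, with labels assigned at random.
# Sizes are hypothetical; the point is capacity >> data.
X = rng.normal(size=(32, 16))        # 32 samples, 16 features of noise
y = rng.integers(0, 2, size=32)      # random binary labels

# A small 2-layer MLP trained with ordinary backprop.
W1 = rng.normal(scale=0.5, size=(16, 64))
b1 = np.zeros(64)
W2 = rng.normal(scale=0.5, size=(64, 1))
b2 = np.zeros(1)

def forward(X):
    h = np.tanh(X @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))   # sigmoid output
    return h, p.ravel()

lr = 0.5
for step in range(5000):
    h, p = forward(X)
    # Gradient of mean cross-entropy w.r.t. the output logit is (p - y)/n.
    g = (p - y) / len(y)
    grad_W2 = h.T @ g[:, None]
    grad_b2 = g.sum()
    gh = (g[:, None] @ W2.T) * (1 - h ** 2)    # backprop through tanh
    grad_W1 = X.T @ gh
    grad_b1 = gh.sum(axis=0)
    W2 -= lr * grad_W2; b2 -= lr * grad_b2
    W1 -= lr * grad_W1; b1 -= lr * grad_b1

_, p = forward(X)
acc = ((p > 0.5) == y).mean()
print(f"training accuracy on random labels: {acc:.2f}")
```

The net fits labels that were random to begin with - exactly the
memorization the paper is demonstrating at scale.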

To be fair, this paper does quantify "too small" exactly, and it's a lot
bigger than I thought.  It takes more training data than I realized for
a network to even start generalizing instead of memorizing.

This is one of the ways in which backpropagation is fundamentally flawed
for non-toy problems.  Give it enough training on too few examples, and
it will just memorize them instead of finding real patterns.

This is why Hinton's breakthrough of Deep Belief training (feedforward
learning from the data up instead of feedback from the outputs down) was
such an important discovery.  It gave us an alternative to
backpropagation that made much larger and much deeper networks feasible
to train without overfitting.
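For flavor, here's what one layer of that data-up, unsupervised
pretraining can look like: a restricted Boltzmann machine trained with
one step of contrastive divergence (CD-1).  This is a minimal sketch;
the sizes, toy data, and learning rate are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy binary data with genuine structure: each row lights up either the
# first half of the visible units or the second half.
n_vis, n_hid = 16, 8
half = np.concatenate([np.ones(8), np.zeros(8)])
X = np.array([half if rng.random() < 0.5 else 1 - half for _ in range(64)])

W = rng.normal(scale=0.1, size=(n_vis, n_hid))
a = np.zeros(n_vis)   # visible biases
b = np.zeros(n_hid)   # hidden biases

def recon_error(V):
    """Mean squared error of a one-step mean-field reconstruction."""
    ph = sigmoid(V @ W + b)
    pv = sigmoid(ph @ W.T + a)
    return ((V - pv) ** 2).mean()

err_before = recon_error(X)

lr = 0.1
for epoch in range(200):
    ph0 = sigmoid(X @ W + b)                     # hidden probabilities
    h0 = (rng.random(ph0.shape) < ph0) * 1.0     # sample hidden states
    pv1 = sigmoid(h0 @ W.T + a)                  # reconstruct visibles
    ph1 = sigmoid(pv1 @ W + b)
    # CD-1 update: data-phase correlations minus reconstruction-phase ones.
    W += lr * (X.T @ ph0 - pv1.T @ ph1) / len(X)
    a += lr * (X - pv1).mean(axis=0)
    b += lr * (ph0 - ph1).mean(axis=0)

err_after = recon_error(X)
print(f"reconstruction MSE: {err_before:.3f} -> {err_after:.3f}")
```

No labels anywhere - the layer learns to model the inputs themselves,
and stacking such layers is the greedy pretraining idea.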

However, the wisdom we've known for over 20 years remains true: if you
have a thousand weights and biases in your network and you're training
with backprop, you'd better have more than 10,000 training cases.
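That rule of thumb is just arithmetic over the parameter count.  A quick
sketch (the layer sizes here are hypothetical, chosen only to show the
counting):

```python
# The 10x rule of thumb applied to a dense feedforward net.
def n_params(layer_sizes):
    """Count weights plus biases in a fully connected network."""
    return sum(i * o + o for i, o in zip(layer_sizes, layer_sizes[1:]))

layers = [16, 32, 2]         # hypothetical architecture
p = n_params(layers)         # (16*32 + 32) + (32*2 + 2) = 610
needed = 10 * p              # ten training cases per parameter
print(f"{p} parameters -> want more than {needed} training cases")
```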

