On 10/09/2016 05:41 AM, Neil Simmonds wrote:
Hear, hear!
I was interested in gneuralnetwork when thinking of connections to gnu
radio - far too much for me as a developer but scripts and gnu radio ui
sockets I can manage. I'm an old-time, slightly unpractised lisp guy.
Oddly enough, I'm an old time lisp guy too. I liked the idea I read
earlier in the mailing list about making gneuralnetwork scriptable via
guile or some other accessible FFI.
But we've got an important thing on the table. In our configuration
file, we're creating a mini programming language. It needs to be simple
enough that users can learn this and use it, without being so simple it
becomes impossible to specify large networks. It needs to be complex
enough to handle the things we need to handle, but not so complex that
it becomes too abstract and difficult to learn.
I've got what I think is a good way to represent network topology
(nodes, connections) and a good way to specify it in configuration
files. It's nice and general, and the simple cases are easy to specify
in a few lines of configuration. (The more complicated cases.... ergh.
Many of them have high Kolmogorov complexity. There's no way to specify
them briefly.)
I'd like feedback about what else a configuration/save file has to
cover, and what syntax would be appropriate for covering it. Right now
I'm dividing the config into sections. I've got a syntax I'm happy with
for the topology section, but everything else is woolly ideas at the
present and discussion would really help hammer some of them out.
An error function, or fitness function, obviously, is part of the
network like the topology. I'm inclined to make it part of the topology
section, because I think every node should at least potentially have a
feedback function. Boltzmann training and L1/L2 regularization penalties
apply to hidden nodes. Reinforcement training applies to input nodes.
Steepest-gradient-descent and a bunch of other things apply to output
nodes. I think we don't need to allow multiple feedback functions per
node, but there's no particular reason why that wouldn't make sense,
so... maybe? What syntax should it have?
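
To make the "zero, one, or several feedback functions per node" question concrete, here is a minimal Python sketch. It is not gneuralnetwork code and not a proposal for the config syntax; the names `Node`, `squared_error`, and `l1_penalty` are hypothetical, and treating an L1 penalty as a function of the node's activation is an assumption made only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


def squared_error(activation, target):
    """Error-style feedback: half the squared distance to the target."""
    return 0.5 * (activation - target) ** 2


def l1_penalty(activation, target, weight=0.01):
    """Regularization-style feedback: ignores the target entirely."""
    return weight * abs(activation)


@dataclass
class Node:
    name: str
    kind: str  # "input", "hidden", or "output"
    # An empty list means the node has no feedback function at all.
    feedback: List[Callable[[float, float], float]] = field(default_factory=list)

    def total_feedback(self, activation, target):
        """Sum every feedback function attached to this node."""
        return sum(fn(activation, target) for fn in self.feedback)
```

Storing a list costs almost nothing in the single-function case, which is one argument for allowing several per node.
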
Obviously, we need to specify a writeback file. IE, what filename to
save to when finished training. Probably also a spec of how often to
save during the training, so you don't lose the benefit of training
that's been done if your computer crashes.
A training regime. IE, what training needs to be done and how to do it.
At present I'm thinking that the periodic saves to the writeback file
should specify the training not yet done. EG, if the initial schedule
is to do 40 epochs of SGD training, then the savefile made after epoch
10 would say to do 30 epochs of SGD training. The idea being that you
can just load up the last savefile and it will finish the schedule.
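
The remaining-schedule idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the JSON savefile layout and the names `train_with_checkpoints`, `load_schedule`, `save_every`, and `run_epoch` are made up here, and a real savefile would also carry the network itself.

```python
import json


def train_with_checkpoints(schedule, savefile, save_every, run_epoch):
    """Run the schedule, periodically saving the training NOT yet done.

    schedule  : list of {"method": str, "epochs": int} steps (hypothetical layout)
    run_epoch : callback that performs one epoch of the named method
    """
    # Work on a copy so the caller's schedule is left untouched.
    remaining = [dict(step) for step in schedule]
    done = 0
    while remaining:
        step = remaining[0]
        run_epoch(step["method"])
        step["epochs"] -= 1
        done += 1
        if step["epochs"] == 0:
            remaining.pop(0)
        # Save every `save_every` epochs, and once more when finished.
        if done % save_every == 0 or not remaining:
            with open(savefile, "w") as f:
                json.dump({"schedule": remaining}, f)


def load_schedule(savefile):
    """Read back the schedule that still has to be run."""
    with open(savefile) as f:
        return json.load(f)["schedule"]
```

If the run dies partway through, calling `train_with_checkpoints(load_schedule(path), path, ...)` picks up from the last save; a finished run leaves an empty schedule behind.
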
Training regimes are a sticky topic. Even giving the error (and
regularization) functions in terms of individual nodes, there's still
the issue of how much training to do and what general metaparameters to
do it with, and most types of training regimes have parameters that
don't apply to any other case. What's the best way to handle that?
Data. We need to keep the ability to specify training and testing cases
directly in the config file, but we also need the ability to specify
external sources. EG, "here are pipes to read training cases, test
cases, validation cases, and online cases from. Here's another pipe to
send online output to." And those pipes can be files, stdin/stdout,
etc. I was thinking about having gneuralnetwork itself run external
programs to get its inputs, but then thought about people cracking the
system by having it run programs maliciously and decided to stick to
pipes in and out. Leave invoking filter programs to scripts. But no
matter what, we're specifying pipes or filenames. What syntax should
that have? Or should it be part of the command line options? If it's
part of the command line options, it can still go into the script, as
options on the #! line at the start of the file.
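
For the command-line variant, a minimal Python sketch of what pipe options could look like follows. Every option name here (`--train`, `--test`, `--online-in`, `--online-out`) is hypothetical, as is the convention that `-` means stdin or stdout; none of this is an existing gneuralnetwork interface.

```python
import argparse
import sys


def open_source(name, mode):
    """Open a filename, or use stdin/stdout when the name is '-'."""
    if name == "-":
        return sys.stdin if "r" in mode else sys.stdout
    return open(name, mode)


def parse_args(argv):
    # All option names below are assumptions made for this sketch.
    p = argparse.ArgumentParser(description="sketch of pipe options")
    p.add_argument("--train", default=None, help="where to read training cases")
    p.add_argument("--test", default=None, help="where to read test cases")
    p.add_argument("--online-in", default="-", help="where to read online cases")
    p.add_argument("--online-out", default="-", help="where to write online output")
    return p.parse_args(argv)
```

One thing worth checking before relying on the #! line: on Linux, everything after the interpreter path on that line is handed over as a single argument, so several options written there may not be split the way they would be on a normal command line.
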
Most of the time data will need to be transformed in some way to become
input in a form that gneuralnetwork can read, so there will
have to be "front-ends." IE, a suite of 'filter' type programs that
transform particular types of input (foo.jpg) or (foo.txt) into cases (a
set of numeric or binary values that correspond to inputs for
gneuralnetwork's input nodes). Do we need to provide a way to designate
a filter function, at all, in the config file, given that we oughtn't
invoke it directly?
Most of the time output will need to be transformed in some way to
become useful data. IE, gneuralnetwork outputs a sequence of text or
binary values corresponding to its output nodes, and something else
transforms them into keystrokes and sends them to a game or into an
index mapping image files to classification categories that goes into a
database, or whatever. We'll need to provide a set of such 'back-ends'
to pipe output through but, again, gneuralnetwork shouldn't invoke them
directly. Should they be mentioned in the config file?
Hm. A proper 'ToDo' item would be providing a few such filter programs
(which can be sed scripts or whatever) to distribute along with
gneuralnetwork. And that brings up a separate consideration, which DOES
belong in the config file: What is the form of input and output? We
need at least to provide standard formats for binary and text I/O and
specify in the config which format a particular network uses.
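
As a starting point for discussion, here is a minimal Python sketch of one possible text format and one possible binary format for a single case. Both layouts are assumptions invented for this example (one case per line with space-separated values; a little-endian 4-byte count followed by 8-byte doubles), not formats gneuralnetwork currently defines.

```python
import struct


def encode_text(values):
    """One case per line, values separated by single spaces."""
    return " ".join(repr(float(v)) for v in values) + "\n"


def decode_text(line):
    """Parse one text-format line back into a list of floats."""
    return [float(tok) for tok in line.split()]


def encode_binary(values):
    """Little-endian: a 4-byte count followed by that many 8-byte doubles."""
    return struct.pack("<I%dd" % len(values), len(values), *values)


def decode_binary(blob):
    """Read the count, then unpack that many doubles starting at byte 4."""
    (count,) = struct.unpack_from("<I", blob, 0)
    return list(struct.unpack_from("<%dd" % count, blob, 4))
```

The text form is easy to produce from sed or awk filters; the binary form is compact and has a fixed byte order, which matters if savefiles or pipes cross machines.
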
Feedback? Please? Or shall I just guess at how to do these things?