Re: [Gneuralnetwork] Eventual Role for command-line gneuralnetwork.

That's really good perspective. From my direction I'm approaching this based on some work I've done in computer vision research where the architectures are complicated enough to really require digging into detailed code. I think some perspective on trying to make an interface as easy to access as possible to add little bits of machine learning where they might be helpful. At the same time, neural networks probably shouldn't be the first thing you take a look at if you want to add a machine learning algorithm to improve some predictive piece of code -- are we after a neural network package or a general learning package (e.g. what WEKA and other packages are trying to do)?

I'd like to take a look at the changes you've made to the configuration files - could you point me to a link?

I'm a bit confused by what you mean with network topologies having high Kolmogorov complexity. Could you give a brief example of a type of network you mean? I'm having trouble thinking of a network that can't be described a lot more simply than by listing its parameters and connections. For example, the VGG architecture that's significantly easier to express than by listing out its 139 million parameters. I could be misunderstanding your point here.

I'd love to chat more about the configuration file layout. I'll write some more when I get the chance.

Thanks!

On October 9, 2016 1:35:31 PM CDT, Ray Dillinger <address@hidden> wrote:



On 10/09/2016 05:41 AM, Neil Simmonds wrote:
 Here, here !
 
 I was interested in gneuralnetwork when thinking of connections to gnu
 radio - far too much for me as a developer but scripts and gnu radio ui
 sockets I can manage.  I'm an old time slightly unpractised time lisp guy.
 
 Neil
 

Oddly enough, I'm an old time lisp guy too.  I liked the idea I read
earlier in the mailing list about making gneuralnetwork scriptable via
guile or some other accessible FFI.

But we've got an important thing on the table.  In our configuration
file, we're creating a mini programming language.  It needs to be simple
enough that users can learn this and use it, without being so simple it
becomes impossible to specify large networks.  It needs to be
complex
enough to handle the things we need to handle, but not so complex that
it becomes too abstract and difficult to learn.

I've got what I think is a good way to represent network topology
(nodes, connections) and a good way to specify it in configuration
files. It's nice and general, and the simple cases are easy to specify
in a few lines of configuration. (the more complicated cases....  ergh.
Many of them have high kolmogorov complexity.  There's no way to specify
those compactly).

I'd like feedback about what else a configuration/save file has to
cover, and what syntax would be appropriate for covering it.  Right now
I'm dividing the config into sections.  I've got a syntax I'm happy with
for the topology section, but everything else is woolly ideas at the
present and discussion would really help hammer some of them out.

An error function, or fitness function, obviously, is part of the
network like the topology.  I'm inclined to make it part of the topology
section, because I think every node should at least potentially have a
feedback function. Boltzmann training and L1/L2 regularization penalties
apply to hidden nodes. Reinforcement training applies to input nodes.
Steepest-gradient-descent and a bunch of other things apply to output
nodes.  I think we don't need to allow multiple feedback functions per
node, but there's no particular reason why that wouldn't make sense,
so...  maybe?   What syntax should it have?

Obviously, we need to specify a writeback file.  IE, what filename to
save to when finished training.  Probably also a spec of how often to
save during the training, so you don't lose the benefit of training
that's been done if your computer crashes.

A training regime.  IE, what training needs to be done and how to do it.
 At present I'm thinking that the periodic saves to
the writeback file
should specify the training not yet done.  EG, if the initial schedule
is to do 40 epochs of SGI training, then the savefile made after epoch
10 would say to do 30 epochs of SGI training.  The idea being that you
can just load up the last savefile and it will finish the schedule.

Training regimes are a sticky topic. Even giving the error (and
regularization) functions in terms of individual nodes, there's still
the issue of how much training to do and what general metaparameters to
do it with, and most types of training regimes have parameters that
don't apply to any other case. What's the best way to handle that?

Data.  We need to keep the ability to specify training and testing cases
directly in the config file, but we also need the ability to specify
external sources.  EG, "here are pipes to read training cases, test
cases, validation cases, and online cases from.  Here's another pipe
to
send online output to."  And those pipes can be files, stdin/stdout,
etc.  I was thinking about having gneuralnetwork itself run external
programs to get its inputs, but then thought about people cracking the
system by having it run programs maliciously and decided to stick to
pipes in and out.  Leave invoking filter programs to scripts.  But no
matter what, we're specifying pipes or filenames.  What syntax should
that have?  Or should it be part of the command line options?  If it's
part of the command line options, it can still go into the script, as
options on the #! line at the start of the file.

Most of the time data will need to be transformed in some way to become
input that is in a form where gneuralnetwork can read it, so there will
have to be "front-ends."  IE, a suite of 'filter' type programs that
transform particular types of input (foo.jpg) or (foo.txt) into cases (a
set of numeric or binary
values that correspond to inputs for
gneuralnetwork's input nodes).  Do we need to provide a way to designate
a filter function, at all, in the config file, given that we oughtn't
invoke it directly?

Most of the time output will need to be transformed in some way to
become useful data.  IE, gneuralnetwork outputs a sequence of text or
binary values corresponding to its output nodes, and something else
transforms them into keystrokes and sends them to a game or into an
index mapping image files to classification categories that goes into a
database, or whatever.  We'll need to provide a set of such 'back-ends'
to pipe output through but, again, gneuralnetwork shouldn't invoke them
directly.  Should they be mentioned in the config file?

Hm.  A proper 'ToDo' item would be providing a few such filter programs
(which can be sed scripts or whatever) to distribute along with
gneuralnetwork. And that brings up
a separate consideration, which DOES
belong in the config file: What is the form of input and output?  We
need at least to provide standard formats for binary and text I/O and
specify in the config which format a particular network uses.

Feedback?  Please?  Or shall I just guess at how to do these things?



    Bear

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

From:	Tom Acunzo
Subject:	Re: [Gneuralnetwork] Eventual Role for command-line gneuralnetwork.
Date:	Sun, 09 Oct 2016 20:16:17 -0400