Re: [Neurostat-develop] API proposal


From: Fabrice Rossi
Subject: Re: [Neurostat-develop] API proposal
Date: Fri, 29 Mar 2002 19:27:27 +0100

Joseph wrote:
> 
> Ok, maybe we can adopt the second method (Joseph's way), not because it
> is my way but because I think that it is a more classical way, more
> documented in books dealing with neural networks.

Right. But we still have a problem with derivatives with respect to the inputs,
which can be computed more efficiently with my method than with the
traditional one. To be honest, what we really need for penalization methods,
for instance, is second order derivatives with respect to the inputs, and I
guess we should first focus on first order derivatives. So I agree to use your
method!
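
(For concreteness only: the kind of penalization I have in mind adds to the
error a smoothness term built from the derivatives of the model F_w with
respect to its input x, for instance something like

  \Omega(w) = \sum_k || \partial F_w / \partial x (x_k) ||^2

at first order, or the analogous term with \partial^2 F_w / \partial x^2 at
second order. This is just an example of why the input derivatives matter,
not a commitment to one particular penalty.)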

> The API for back-propagate then becomes:
> 
>  int weight_derivatives(MLP *arch, Activation *act, double *jacobian,
>  double *w_jacobian)
> 
>  int inputs_derivatives(MLP *arch, Activation *act, double *jacobian,
>  double *in_jacobian)

Yeap.
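
Just to be sure we mean the same thing, here is the calling sequence I have
in mind, written as a rough C sketch. The quadratic error, the sizes and the
error_gradient wrapper are only illustrations, and I assume propagate has
already filled *act and *output, so do not read too much into the details:

#include <stdlib.h>

/* Opaque placeholders for the real structures. */
typedef struct MLP MLP;
typedef struct Activation Activation;

int weight_derivatives(MLP *arch, Activation *act, double *jacobian,
                       double *w_jacobian);
int inputs_derivatives(MLP *arch, Activation *act, double *jacobian,
                       double *in_jacobian);

/* One gradient evaluation for a quadratic error, assuming propagate has
   already filled *act and *output. */
void error_gradient(MLP *arch, Activation *act, double *output,
                    double *target, int n_in, int n_out, int n_weights)
{
    double *jacobian    = malloc(n_out * sizeof *jacobian);       /* dE/dy */
    double *w_jacobian  = malloc(n_weights * sizeof *w_jacobian); /* dE/dw */
    double *in_jacobian = malloc(n_in * sizeof *in_jacobian);     /* dE/dx */
    int i;

    /* derivative of E = 1/2 sum_i (y_i - t_i)^2 with respect to the outputs */
    for (i = 0; i < n_out; i++)
        jacobian[i] = output[i] - target[i];

    /* back-propagation proper: derivatives with respect to the weights... */
    weight_derivatives(arch, act, jacobian, w_jacobian);
    /* ...and, when a penalization term needs them, with respect to the inputs */
    inputs_derivatives(arch, act, jacobian, in_jacobian);

    /* (w_jacobian and in_jacobian would be used here, e.g. for a gradient
       step, before being freed) */
    free(jacobian); free(w_jacobian); free(in_jacobian);
}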
 
> Next, I propose that the MLP be an aggregate of a simpler structure:
> the layer.
> 
> This structure is basically an MLP restricted to one layer, so the
> previous MLP API mainly has to be replicated for the layers, which
> yields the following API:
> 
> int layer_propagate(Layer *lay, double *weights, double *input,
> double *output, activation *act, activation *dact)
> 
> int layer_back_propagate(Layer *lay, double *weights,
> double *local_deriv_out, double *local_deriv_in, activation *dact)
> 
> where:
> 
> - lay is a pointer to the layer structure
> - weights is a pointer to the parameters of the layer
> - local_deriv_out is the derivative with respect to the preoutput of
> the layer
> - local_deriv_in is the derivative with respect to the preinput of the
> layer
> - dact is a pointer to the derivative of the input of the layer
> 
> Note that "local_deriv_out" and "local_deriv_in" are pointers to
> addresses belonging to the Jacobian of the error, and "weights" is a
> pointer to an address belonging to the parameter vector of the MLP.
> 
> The calculation of the derivatives with respect to the network weights
> is done by:
> 
> int weight_layer_derivative(Layer *lay, Activation *act, double
> *local_deriv_out, double *w_deriv)
> 
> "w_deriv" is an address belonging to the w_jacobian vector.
> 
> A layer-level function for the derivatives with respect to the layer
> inputs is not needed for the inputs_derivatives function.
> 
> I know I am going into the details early, but Dimitri needs this API to
> begin coding the layers and then the MLP.

I don't think it's premature optimization, so it can't be evil! Anyway, I
still have remarks about this layer thing. 

Basically, I don't really understand what can be simplified thanks to these
layer functions. At first I thought we could hide the details inside the
layers, but in fact it might be difficult because of one of our main
assumptions: we allow a layer to receive inputs from any preceding layer.
Therefore, to calculate the output of one layer, we must in general use
parts of *act.
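
To make the concern concrete, here is roughly the kind of bookkeeping I
expect each layer to carry (every field name below is an invention of mine,
not a proposal):

/* Hypothetical layer description, only to show where the dependency on
 * the global activation storage creeps in.  None of these fields has
 * been agreed on. */
typedef struct Layer {
    int n_units;        /* number of neurons in this layer             */
    int n_pred;         /* number of preceding layers feeding this one */
    int *pred;          /* indices of those layers in the MLP          */
    int *pred_offset;   /* offsets of their outputs inside the global
                         * activation buffer (*act)                    */
    int weight_offset;  /* offset of this layer's weights in the MLP
                         * parameter vector                            */
} Layer;

With this kind of connectivity, layer_propagate cannot work from a private
input buffer: it has to walk pred/pred_offset and read the matching slices
of *act, which is precisely the part I hoped to keep at the MLP level.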

My point of view is that we are going to turn propagate (for instance) into
this kind of code:

loop on layers
  layer_propagate

and all the complexity will be moved to layer_propagate. I don't see much
appeal in this move, but maybe I'm wrong.
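
Concretely, reusing the toy Layer structure from my sketch above (still with
invented field names, glossing over the exact act/dact types, and writing
propagate's prototype from memory), I would expect propagate to shrink to
roughly this:

/* Hypothetical MLP aggregate, reusing the toy Layer sketched above. */
typedef struct MLP {
    int n_layers;
    Layer *layers;
} MLP;

int layer_propagate(Layer *lay, double *weights, double *input,
                    double *output, Activation *act, Activation *dact);

/* propagate becomes a thin loop; the offset arithmetic it used to do is
   simply pushed down into layer_propagate. */
int propagate(MLP *arch, double *weights, double *input, double *output,
              Activation *act, Activation *dact)
{
    int l, status = 0;

    for (l = 0; l < arch->n_layers && status == 0; l++) {
        Layer *lay = &arch->layers[l];
        /* each layer has to locate its predecessors' outputs in *act and
           its own slice of the parameter vector by itself */
        status = layer_propagate(lay, weights + lay->weight_offset,
                                 input, output, act, dact);
    }
    return status;
}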

Fabrice


