
The Back-Propagation Family

Minimizing the error E with gradient descent is the least sophisticated method, but it is in many cases sufficient. It amounts to updating the weights according to the back-propagation (BP) learning rule [5]

\[ \omega_{t+1} = \omega_t + \Delta\omega_t \]
where

\[ \Delta\omega_t = -\eta\,\frac{\partial E}{\partial \omega} \]

Here $\omega$ refers to the whole vector of weights and thresholds used in the network.
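As an illustration, a minimal sketch of one such update step in Python; the names omega, grad_E and eta are illustrative, and the gradient of E is assumed to have already been computed by back-propagating the errors through the network:

import numpy as np

def bp_step(omega, grad_E, eta=0.001):
    # Plain gradient descent: Delta(omega) = -eta * dE/domega.
    #   omega  : flat vector of all weights and thresholds
    #   grad_E : gradient of the error E with respect to omega
    #   eta    : learning rate (illustrative value)
    delta = -eta * grad_E
    return omega + delta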

A momentum term is often added as well to stabilize the learning,

\[ \Delta\omega_t = -\eta\,\frac{\partial E}{\partial \omega} + \alpha\,\Delta\omega_{t-1} \]

where $0 \leq \alpha < 1$.
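A corresponding sketch with the momentum term included; delta_prev holds the previous weight change and alpha is the assumed momentum parameter:

def bp_momentum_step(omega, grad_E, delta_prev, eta=0.001, alpha=0.9):
    # Gradient step plus momentum:
    #   Delta(omega)_t = -eta * dE/domega + alpha * Delta(omega)_{t-1}
    delta = -eta * grad_E + alpha * delta_prev
    # Return the new weights and the step, which is fed back in as
    # delta_prev on the next call.
    return omega + delta, delta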

Initial ``flat-spot'' problems and local minima can to a large extent be avoided by introducing noise into the gradient-descent updating rule above. This is conveniently done by adding a properly normalized Gaussian noise term $\sigma\nu_t$, with $\nu_t$ of zero mean and unit variance [6],

\[ \Delta\omega_t = -\eta\,\frac{\partial E}{\partial \omega} + \sigma\,\nu_t \]

which we refer to as Langevin updating, or by using the cruder, non-strict gradient descent procedure provided by the Manhattan [17] updating rule

\[ \Delta\omega_t = -\eta\,\mathrm{sign}\!\left(\frac{\partial E}{\partial \omega}\right) \]
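Sketches of the two noisy variants under the same assumptions; sigma sets the width of the Gaussian noise in the Langevin rule (in practice it would typically be reduced as training proceeds), while the Manhattan rule keeps only the sign of each gradient component:

import numpy as np

def langevin_step(omega, grad_E, eta=0.001, sigma=0.01, rng=None):
    # Gradient step plus zero-mean, unit-variance Gaussian noise scaled
    # by sigma (Langevin updating).
    rng = np.random.default_rng() if rng is None else rng
    noise = sigma * rng.standard_normal(omega.shape)
    return omega + (-eta * grad_E + noise)

def manhattan_step(omega, grad_E, eta=0.001):
    # Fixed-size step against the sign of each gradient component
    # (Manhattan updating).
    return omega - eta * np.sign(grad_E)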

