
Heuristic Methods

One well-known method to approximate the curvature information is the Quickprop (QP) algorithm [9], whose basic idea is to estimate the weight changes by assuming a parabolic shape for the error surface. The weight changes are then modified by heuristic rules to ensure downhill motion at all times. Furthermore, a small constant is added to the derivative of the activation function to escape flat spots on the error surface. In short, the update for each weight reads

\Delta\omega_{ij}(t) = \frac{g_{ij}(t)}{g_{ij}(t-1) - g_{ij}(t)}\,\Delta\omega_{ij}(t-1)
                     - \eta\,\theta\!\left(-\Delta\omega_{ij}(t-1)\,g_{ij}(t)\right) g_{ij}(t)

where \theta is the Heaviside step function and g_{ij} = \partial E / \partial\omega_{ij} is the derivative of E with respect to the actual weight. This updating corresponds to a ``switched'' gradient descent with a parabolic estimate for the momentum term. To prevent the weights from growing too large, which indicates that QP is going wrong, a maximum scale is set on the weight update and it is recommended to use a weight decay term (see below). The algorithm is also restarted if the weights grow too large [20].
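As an illustration, the update above can be sketched in Python/NumPy as follows. This is a minimal sketch, not the reference implementation: the step size eta, the growth cap max_scale, the denominator guard, and the treatment of the very first step are assumed choices for the example.

```python
import numpy as np

def quickprop_step(w, dw_prev, g, g_prev, eta=0.1, max_scale=1.75):
    """One Quickprop-style update for a weight array w.

    g, g_prev : current and previous gradients dE/dw
    dw_prev   : previous weight change
    """
    # Parabolic estimate for the momentum factor, g(t) / (g(t-1) - g(t)),
    # guarded against a vanishing denominator.
    denom = g_prev - g
    safe = np.where(np.abs(denom) < 1e-12, 1.0, denom)
    mu = np.where(np.abs(denom) < 1e-12, 0.0, g / safe)
    # Cap the growth factor so the weight update cannot blow up.
    mu = np.clip(mu, -max_scale, max_scale)
    dw = mu * dw_prev
    # "Switched" gradient-descent term: step downhill while the previous
    # move and the current gradient still point the same (downhill) way,
    # and on the very first step (dw_prev == 0, an assumed convention).
    use_gd = (dw_prev * g < 0) | (dw_prev == 0)
    dw = dw - eta * g * use_gd
    return w + dw, dw
```

On a one-dimensional quadratic error E = w^2/2 (so g = w) the parabolic estimate is exact, and the iteration homes in on the minimum in a handful of steps.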

Another heuristic method, suggested by several authors [10,21,22], is the use of individual learning rates for each weight that are adjusted according to how ``well'' the actual weight is doing. Ideally, these individual learning rates adjust to the curvature of the error surface and reflect the inverse of the Hessian. In our view, the most promising of these schemes is Rprop [10]. Rprop combines the use of individual learning rates with the Manhattan updating rule, adjusting the learning step for each weight according to

\Delta_{ij}(t) = \begin{cases}
  \eta^{+}\,\Delta_{ij}(t-1) & \text{if}\; g_{ij}(t)\,g_{ij}(t-1) > 0 \\
  \eta^{-}\,\Delta_{ij}(t-1) & \text{if}\; g_{ij}(t)\,g_{ij}(t-1) < 0 \\
  \Delta_{ij}(t-1)           & \text{otherwise}
\end{cases}

where 0 < \eta^{-} < 1 < \eta^{+}.
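The adaptation rule can be sketched in Python/NumPy as below. This is a simplified sketch: the factors eta_plus = 1.2 and eta_minus = 0.5 and the step bounds are the commonly quoted Rprop defaults, and the backtracking refinement that the full algorithm applies after a sign change is omitted here.

```python
import numpy as np

def rprop_step(w, step, g, g_prev, eta_plus=1.2, eta_minus=0.5,
               step_min=1e-6, step_max=50.0):
    """One simplified Rprop update: adapt each weight's individual
    step size from the sign of g(t) * g(t-1), then move by sign(g)."""
    prod = g * g_prev
    # Grow the step while the gradient keeps its sign, shrink it when
    # the sign flips (a minimum was overstepped), keep it otherwise.
    step = np.where(prod > 0, step * eta_plus,
                    np.where(prod < 0, step * eta_minus, step))
    step = np.clip(step, step_min, step_max)
    # Manhattan updating rule: only the sign of the gradient is used.
    w = w - np.sign(g) * step
    return w, step
```

Because only the sign of the gradient enters the weight update, the effective learning rate of each weight is carried entirely by its individual step size, which is the point of the scheme.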
