LEARNING BY ON-LINE GRADIENT DESCENT
Michael Biehl and Holm Schwarze
Abstract: We study on-line gradient descent learning in multilayer
networks analytically and numerically. The training is based on randomly
drawn inputs and their corresponding outputs as defined by a target rule.
In the thermodynamic limit we derive deterministic differential
equations for the order parameters of the problem, which allow an
exact calculation of the evolution of the generalization error. First we
consider a single-layer perceptron with sigmoidal activation function
learning a target rule defined by a network of the same architecture. For
this model the generalization error decays exponentially with the number
of training examples if the learning rate is sufficiently small. However, if
the learning rate is increased above a critical value, perfect learning is no
longer possible. For architectures with hidden layers and fixed
hidden-to-input weights, such as the parity and the committee machine,
we find additional effects related to the existence of symmetries in these
problems.
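The on-line scenario described above can be illustrated with a minimal numerical sketch (not the authors' code): a "student" perceptron with sigmoidal activation is trained by on-line gradient descent on randomly drawn inputs labeled by a "teacher" of the same architecture. The erf activation, the input dimension, the learning rate, and all variable names are illustrative assumptions; the order parameters R = J·B/N and Q = J·J/N are the quantities whose deterministic evolution is analyzed in the thermodynamic limit.

```python
# Minimal teacher-student sketch of on-line gradient descent.
# All concrete choices (erf activation, N, eta, step count) are assumptions
# for illustration, not the paper's exact setup.
import numpy as np
from math import erf, sqrt, exp, pi

rng = np.random.default_rng(0)

N = 100        # input dimension (the analysis takes N -> infinity)
eta = 0.5      # learning rate, small enough here for the error to decay
steps = 2000   # number of randomly drawn training examples

def g(x):
    # sigmoidal activation
    return erf(x / sqrt(2.0))

def g_prime(x):
    # derivative of g
    return sqrt(2.0 / pi) * exp(-x * x / 2.0)

# teacher weights B (fixed target rule) and student weights J
B = rng.normal(size=N)
B *= sqrt(N) / np.linalg.norm(B)   # normalize so that B.B/N = 1
J = 0.1 * rng.normal(size=N)       # small random initialization

for _ in range(steps):
    xi = rng.normal(size=N)        # randomly drawn input
    h = J @ xi / sqrt(N)           # student's internal field
    t = B @ xi / sqrt(N)           # teacher's internal field
    # gradient step on the single-example quadratic error (g(h) - g(t))^2 / 2
    J += (eta / sqrt(N)) * (g(t) - g(h)) * g_prime(h) * xi

# order parameters: overlap with the teacher and student norm
R = J @ B / N
Q = J @ J / N
print(R, Q)
```

As the student aligns with the teacher, R and Q both approach 1 and the generalization error decays; raising `eta` well above the critical value in this sketch lets one observe the breakdown of perfect learning numerically.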
LU TP 94-10