Back-propagation is the most widely used learning algorithm: it is simple to implement and, in spite of that simplicity, it often outperforms other methods. We emphasize, though, that this statement refers to the ``on-line'' variant of BP, where the weights are updated after the presentation of only a small subset of the training data. On-line BP is much faster than batch-mode BP.
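The distinction can be illustrated with a minimal NumPy sketch on a toy linear least-squares problem (the data, learning rate, and epoch count are illustrative, not taken from the text): on-line updating applies a gradient step after every single pattern, while batch updating applies one step per full pass over the training set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data standing in for a training set (hypothetical).
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

def grad(w, Xb, yb):
    # Gradient of the mean squared error on the (mini-)batch (Xb, yb).
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)

# On-line BP: update after each presented pattern (subset of size 1 here).
w_online = np.zeros(3)
for epoch in range(20):
    for i in rng.permutation(len(y)):
        w_online -= 0.05 * grad(w_online, X[i:i+1], y[i:i+1])

# Batch BP: a single update per full pass over the training set.
w_batch = np.zeros(3)
for epoch in range(20):
    w_batch -= 0.05 * grad(w_batch, X, y)

# With the same number of epochs, the on-line weights end up far closer
# to w_true, since they receive many more (noisy) updates per epoch.
```

The speed advantage comes from the number of updates: in the same twenty epochs, the on-line run performs 2000 weight updates against the batch run's 20.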
For networks with more than one hidden layer it is beneficial to use the Langevin updating variant (eq. ()), in which noise is added to the BP weight updates. The reason is that the Hessian matrix easily becomes ill-conditioned, with a flat subspace in which the random search performed by Langevin updating is very efficient compared to the alternatives.
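A stylized one-dimensional sketch of why the noise helps, assuming the standard Langevin form of the update (gradient step plus additive Gaussian noise); the loss function, noise amplitude, and annealing schedule below are constructed for illustration and are not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical loss with a flat plateau on [0, 1) -- a stand-in for the
# flat Hessian subspace -- and a quadratic valley with its minimum at w = 2.
def grad(w):
    if w < 0.0:
        return 2.0 * w            # wall keeping the search near the plateau
    if w < 1.0:
        return 0.0                # flat subspace: the gradient gives no signal
    return 2.0 * (w - 2.0)        # valley with minimum at w = 2

eta = 0.1

# Plain gradient descent stays stuck: the gradient is zero on the plateau.
w_gd = 0.2
for t in range(5000):
    w_gd -= eta * grad(w_gd)

# Langevin updating: the same gradient step plus Gaussian noise.  The noise
# performs a random search across the flat region; it is switched off near
# the end so the ordinary gradient step can converge to the minimum.
w_lv = 0.2
for t in range(5000):
    sigma = 0.15 if t < 4000 else 0.0
    w_lv += -eta * grad(w_lv) + sigma * rng.standard_normal()
```

The noiseless run never leaves its starting point, while the noisy iterate diffuses off the plateau and is then pulled into the valley, which is the mechanism the flat-subspace argument above appeals to.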