ANN versus other methods


There are many methods for multivariate statistical analysis, function fitting and prediction tasks, and ANNs represent only a small subset of these. From a statistical modeling point of view, ANN models belong to the general class of non-parametric methods, which make no assumption about the parametric form of the function they model. In this sense they are more powerful than parametric methods, which try to fit reality into a specific parametric form. However, non-parametric methods like ANN contain more free parameters and hence require more training data than parametric ones in order to achieve good generalization performance [25].

Fortunately, for most HEP problems one has access to large data samples, which makes it possible to exploit the capabilities of non-parametric models like ANN. Tests of ANN versus standard methods on HEP pattern recognition problems are therefore in favour of ANN models [26,27,28,29,30,31]. Unbiased comparisons of ANN and non-ANN methods on prediction tasks are also in favour of ANN [32].

Inevitably, the choice of method depends on many problem-dependent factors. Is the problem complex enough to call for a non-parametric method like ANN? Is data easily available? Does the application require real-time execution? Hence, it is impossible to give a general rule on what strategy to follow (see e.g. ref. [33] for a discussion of the subject). However, ANN methods have a number of features that make them particularly attractive:

The output nodes are analytic functions of the input arguments if the activation function g is analytic. Derivatives with respect to the inputs can therefore be computed, which simplifies error estimation.
As discussed above, the output nodes approximate the Bayes a posteriori probabilities [24], which are useful for making final decisions that minimize the overall risk [34].
Sigmoid units are not "orthogonal", and two hidden units may well perform identical tasks, which to some extent avoids overfitting. This property can also be very practical, and even desirable, if the goal is to produce a distributed system that is robust to weight losses. By a "smart" addition of noise in the training process, the network can be forced to choose a solution where the information is maximally distributed among the weights [35].
An ANN is not a linear function of all its weights. This implies a very beneficial scaling property: for some functions and networks the learning curves are independent of the number of inputs [36].
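
As an illustration of the first feature above, the following minimal sketch computes the derivatives of the outputs with respect to the inputs for a small network with one hidden layer, and uses them for first-order error propagation. The architecture (tanh hidden units, sigmoid outputs), the weight values, the input uncertainties and all names (forward, input_jacobian, numeric_jacobian, propagate_error) are assumptions chosen for illustration and are not taken from the text. The error propagation additionally assumes small, uncorrelated input uncertainties.

```python
import math


def forward(x, W1, b1, W2, b2):
    """Hypothetical one-hidden-layer network: tanh hidden units, sigmoid outputs."""
    h = [math.tanh(sum(w * xk for w, xk in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    a = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(W2, b2)]
    y = [1.0 / (1.0 + math.exp(-ai)) for ai in a]
    return h, y


def input_jacobian(x, W1, b1, W2, b2):
    """Analytic derivatives d(output i)/d(input k), obtained with the chain rule."""
    h, y = forward(x, W1, b1, W2, b2)
    jac = []
    for i, yi in enumerate(y):
        dsigmoid = yi * (1.0 - yi)  # derivative of the sigmoid output unit
        row = []
        for k in range(len(x)):
            # (1 - h_j^2) is the derivative of tanh at hidden unit j
            s = sum(W2[i][j] * (1.0 - h[j] ** 2) * W1[j][k]
                    for j in range(len(h)))
            row.append(dsigmoid * s)
        jac.append(row)
    return jac


def numeric_jacobian(x, W1, b1, W2, b2, eps=1e-6):
    """Central finite differences, used only to cross-check the analytic result."""
    jac = [[0.0] * len(x) for _ in range(len(W2))]
    for k in range(len(x)):
        xp, xm = list(x), list(x)
        xp[k] += eps
        xm[k] -= eps
        _, yp = forward(xp, W1, b1, W2, b2)
        _, ym = forward(xm, W1, b1, W2, b2)
        for i in range(len(W2)):
            jac[i][k] = (yp[i] - ym[i]) / (2.0 * eps)
    return jac


def propagate_error(jac, sigma_x):
    """First-order error propagation, assuming uncorrelated input uncertainties."""
    return [math.sqrt(sum((d * s) ** 2 for d, s in zip(row, sigma_x)))
            for row in jac]


# Arbitrary illustrative weights: 2 inputs, 3 hidden units, 1 output.
W1 = [[0.5, -1.2], [1.5, 0.3], [-0.7, 0.8]]
b1 = [0.1, -0.2, 0.05]
W2 = [[1.0, -0.5, 0.75]]
b2 = [0.2]
x = [0.3, -0.7]

analytic = input_jacobian(x, W1, b1, W2, b2)
numeric = numeric_jacobian(x, W1, b1, W2, b2)
max_diff = max(abs(a - n)
               for ra, rn in zip(analytic, numeric)
               for a, n in zip(ra, rn))
sigma_y = propagate_error(analytic, sigma_x=[0.01, 0.02])

print("analytic Jacobian:", analytic)
print("largest deviation from finite differences:", max_diff)
print("propagated output uncertainty:", sigma_y)
```

The finite-difference routine is there only as a sanity check. In practice the analytic expression is the cheaper option, since it reuses the hidden-unit activations from a single forward pass.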

Due to their generality, ANN methods also have some drawbacks, the most prominent being long training times. In general, other statistical methods learn much more quickly. For instance, models with "orthogonal" units (e.g. polynomial ones) may need just one matrix inversion in order to be trained.
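
To make the contrast concrete, the sketch below "trains" a polynomial model in a single linear-algebra step by solving the least-squares normal equations. It is only an illustration under stated assumptions: the monomial basis, the noise-free cubic test data and the helper names (design_matrix, solve_linear_system, fit_polynomial) are all hypothetical. The normal equations are solved by Gaussian elimination rather than by forming an explicit matrix inverse, which is mathematically equivalent here.

```python
def design_matrix(xs, degree):
    """One row per data point, one column per monomial x**0 ... x**degree."""
    return [[x ** p for p in range(degree + 1)] for x in xs]


def solve_linear_system(A, b):
    """Solve A c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    c_sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * c_sol[c] for c in range(r + 1, n))
        c_sol[r] = (M[r][n] - s) / M[r][r]
    return c_sol


def fit_polynomial(xs, ys, degree):
    """Least-squares fit via the normal equations (A^T A) c = A^T y."""
    A = design_matrix(xs, degree)
    n = degree + 1
    AtA = [[sum(row[i] * row[j] for row in A) for j in range(n)]
           for i in range(n)]
    Aty = [sum(row[i] * y for row, y in zip(A, ys)) for i in range(n)]
    return solve_linear_system(AtA, Aty)


# Noise-free samples of an arbitrary cubic, for illustration only.
true_coeffs = [1.0, -2.0, 0.5, 3.0]
xs = [-1.0 + 0.1 * i for i in range(21)]
ys = [sum(c * x ** p for p, c in enumerate(true_coeffs)) for x in xs]

fitted = fit_polynomial(xs, ys, degree=3)
print("fitted coefficients:", fitted)
```

No iterative optimisation is needed because the model is linear in its coefficients. An ANN is not linear in its weights, so its training has to be iterative.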

It is sometimes argued that statistical non-parametric methods, such as decision trees, are preferable to ANN models since the former are easier to interpret. We disagree with this view: with the aid of a self-organizing network it is quite easy to interpret an ANN model [4].

Fri Feb 24 11:28:59 MET 1995