Preprocessing the data is important for many reasons.
For function-mapping problems the most powerful method, to our knowledge,
for extracting functional dependencies between inputs and outputs
is the so-called $\delta$-test [61]. This test assumes only
that the function is continuous and uses conditional
probabilities to select the significant input variables.
Normalization of the inputs is done to prevent ``stiffness'', i.e. a
situation where different weights need to be updated with very different
learning rates. Two simple normalization options are: either scale the
inputs to a fixed interval, or translate them to zero mean and rescale to
unit variance. The former method is useful if the data is more or less
evenly distributed over a limited range, whereas the latter is useful when
the data contains outliers. In some cases such normalizations reduce the
learning time for the network by an order of magnitude.
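Both options can be sketched in a few lines (a minimal illustration in Python with NumPy; the function names and the example data are our own, not from [61] or [26]):

```python
import numpy as np

def scale_to_interval(x, lo=0.0, hi=1.0):
    """Linearly map each input column onto the interval [lo, hi]."""
    xmin, xmax = x.min(axis=0), x.max(axis=0)
    return lo + (hi - lo) * (x - xmin) / (xmax - xmin)

def standardize(x):
    """Translate each column to zero mean and rescale to unit variance."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

# Two input variables on very different scales.
x = np.array([[1.0, 1000.0],
              [2.0, 3000.0],
              [3.0, 2000.0]])
print(scale_to_interval(x))  # each column now spans [0, 1]
print(standardize(x))        # each column has mean 0 and unit variance
```

Note that `standardize` is the more robust choice here: a single outlier would compress all the remaining points of `scale_to_interval` into a narrow sub-interval, whereas it only mildly inflates the variance estimate.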
A method suggested in [26] is to let the network itself handle the normalization by adding an extra layer of units. This is useful when the data is not available beforehand for computing the relevant scales.
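One way such a self-normalizing input stage can work is to maintain running estimates of the mean and variance, updated as each pattern arrives. The sketch below uses Welford's online algorithm for this; the class and its update rule are our illustration of the idea, not the specific construction of [26]:

```python
import numpy as np

class OnlineNormalizer:
    """Input stage that normalizes on the fly from running statistics,
    so no scales need to be known before training starts."""

    def __init__(self, n_inputs):
        self.n = 0
        self.mean = np.zeros(n_inputs)
        self.m2 = np.zeros(n_inputs)  # running sum of squared deviations

    def __call__(self, x):
        # Welford update of the running mean and variance.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        # Fall back to unit variance until at least two patterns are seen.
        var = self.m2 / self.n if self.n > 1 else np.ones_like(self.m2)
        return (x - self.mean) / np.sqrt(var + 1e-12)

norm = OnlineNormalizer(2)
for pattern in [np.array([1.0, 1000.0]), np.array([2.0, 3000.0])]:
    y = norm(pattern)  # normalized input fed to the rest of the network
```

The estimates are poor for the first few patterns, so in practice one would either discard the earliest outputs or pre-feed a small warm-up batch before training begins.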