Preprocessing the data is important for several reasons: it can reveal which input variables are actually relevant, and it can condition the inputs so that training proceeds efficiently.
For function-mapping problems the most powerful method, to our knowledge, for extracting functional dependencies between inputs and outputs is the so-called -test. This test assumes only that the function is continuous, and uses conditional probabilities to select the significant input variables.
Normalization of the input is done to prevent ``stiffness'', i.e. a situation where different weights need to be updated with very different learning rates. Two simple normalization options are: either rescale the inputs to a fixed range, or subtract from each input its mean value and rescale to unit variance. The former method is useful if the data is more or less evenly distributed over a limited range, whereas the latter is preferable when the data contains outliers. In some cases, such normalizations reduce the learning time for the network by an order of magnitude.
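The two normalization options can be sketched as follows; this is an illustrative example with made-up data, and the target range $[0,1]$ for the first option is an assumption, since the original text does not fix a particular range.

```python
import numpy as np

# Hypothetical example data; the second input has a much larger scale
# than the first, which would cause "stiffness" during training.
X = np.array([[0.1, 100.0],
              [0.2, 250.0],
              [0.4, 175.0],
              [0.3, 300.0]])

# Option 1: rescale each input to a fixed range (here [0, 1] is assumed).
# Suitable when the data is fairly evenly spread over a limited range.
x_min, x_max = X.min(axis=0), X.max(axis=0)
X_range = (X - x_min) / (x_max - x_min)

# Option 2: subtract the mean and rescale to unit variance.
# More robust when the data contains outliers.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
```

After either transformation every input variable lives on a comparable scale, so a single global learning rate serves all weights reasonably well.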
A method suggested in  is to let the network itself handle the normalization by adding an extra layer of units. This is useful when the data is not available beforehand, so the relevant scales cannot be computed in advance.
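One way such an extra normalizing layer could work is sketched below. The source does not specify the mechanism, so this is only an assumption: a layer that maintains running estimates of each input's mean and variance (updated with an exponential moving average, via a hypothetical `momentum` parameter) and emits the normalized input, so that no separate preprocessing pass over the data set is required.

```python
import numpy as np

class NormalizingLayer:
    """Hypothetical extra layer that learns the input scales online,
    so the data need not be available beforehand."""

    def __init__(self, n_inputs, momentum=0.99):
        self.mean = np.zeros(n_inputs)   # running estimate of the mean
        self.var = np.ones(n_inputs)     # running estimate of the variance
        self.momentum = momentum         # assumed EMA decay factor

    def forward(self, x):
        # Update the running estimates from the incoming example,
        # then emit the input shifted to zero mean and unit variance.
        self.mean = self.momentum * self.mean + (1 - self.momentum) * x
        self.var = self.momentum * self.var + (1 - self.momentum) * (x - self.mean) ** 2
        return (x - self.mean) / np.sqrt(self.var + 1e-8)
```

After the running estimates have converged, the downstream layers see inputs on a unit scale, just as if the data had been normalized offline.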