
Initial weight values

It is of utmost importance to ensure that the units are ``active learners'' and not saturated at their extreme values. The derivative of the activation function (eq. (gif)) vanishes for saturated units and thus inhibits learning. This can be avoided by proper weight initialization. If the input is normalized to unit size, one simply scales the weights according to the number of units feeding a unit (the ``fan-in''). A suitable normalization for this is


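(The equation itself was lost in conversion. A common normalization of this kind, given here as an assumption and not necessarily JETNET's exact formula, draws each weight uniformly with a width scaled by the inverse square root of the fan-in:)

```latex
\omega_{ij} \sim U\!\left[-\frac{w}{\sqrt{F_j}},\; +\frac{w}{\sqrt{F_j}}\right],
```

where $F_j$ is the fan-in of unit $j$ and $w$ is an overall width parameter. With unit-size inputs this keeps the argument of each unit of order one, so the activation function stays away from its saturated regions.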
Another method, suggested in [59], is to set the width PARJN(4) to any value, process the training data through the network once, and adjust the thresholds so that the average argument of each unit is zero. Other suggestions are found in refs. [67,68]. None of these methods are automated in JETNET 3.0.
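The threshold-adjustment pass above can be sketched numerically. This is an illustrative stand-in, not JETNET code: the array names and the use of NumPy are my own assumptions, and `width` plays the role of PARJN(4).

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_hidden, n_patterns = 8, 4, 100
width = 0.5  # stand-in for PARJN(4), the initial weight width

# Initialize weights and thresholds uniformly in [-width, +width]
weights = rng.uniform(-width, width, size=(n_hidden, n_in))
thresholds = rng.uniform(-width, width, size=n_hidden)

# Stand-in training patterns (the real ones would come from the problem)
x = rng.normal(size=(n_patterns, n_in))

# One pass through the training data: average argument of each unit
mean_arg = (x @ weights.T + thresholds).mean(axis=0)

# Shift each threshold so the average argument becomes zero,
# keeping the units away from the saturated ends of the activation
thresholds -= mean_arg

print(np.allclose((x @ weights.T + thresholds).mean(axis=0), 0.0))  # True
```

Because the shift is a constant per unit, the mean argument after the adjustment is exactly the old mean minus itself, i.e. zero, independent of the value chosen for the width.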
