JETNET 3.0 implements the weight decay and the pruning method of
eqs. () and (). The Lagrange multipliers $\lambda$ correspond to the
parameters PARJN(5) and PARJN(14), respectively.
Hessian-based pruning can be done by computing the Hessian matrix
together with its eigenvalues and eigenvectors: small weights that lie
inside a flat subspace of the error surface can be omitted.
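As an illustration of this idea (a minimal numpy sketch, not JETNET's
Fortran routines), one can diagonalize the Hessian and flag weights that
are both small and lie mostly within the span of the near-zero-eigenvalue
eigenvectors; the thresholds used below are arbitrary assumptions.
\begin{verbatim}
import numpy as np

def prunable_weights(hessian, weights, eig_tol=1e-4, w_tol=1e-2):
    """Flag weights that are small and lie (mostly) inside the flat
    subspace of the error surface, i.e. the span of the Hessian
    eigenvectors whose eigenvalues are close to zero.
    All three thresholds are illustrative, not JETNET defaults."""
    eigvals, eigvecs = np.linalg.eigh(hessian)      # Hessian is symmetric
    flat = eigvecs[:, np.abs(eigvals) < eig_tol]    # columns = flat directions
    # ||P e_i|| for each weight axis e_i, where P projects onto the flat
    # subspace; this equals the norm of row i of the eigenvector block.
    in_flat = np.linalg.norm(flat, axis=1)
    return (np.abs(weights) < w_tol) & (in_flat > 0.9)
\end{verbatim}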
Following [63], we tune $\lambda$ in eq. () according to
\[
\lambda_{n+1} =
\begin{cases}
\lambda_n + \Delta\lambda & \text{if } E_n < D \text{ or } E_n < E_{n-1}, \\
\lambda_n - \Delta\lambda & \text{if } E_n \ge D,\ E_n \ge E_{n-1} \text{ and } E_n < A_n, \\
c\,\lambda_n & \text{otherwise},
\end{cases}
\]
where $E_n$ is the current training error and the other quantities are
defined as follows (the full schedule is sketched in code after the list):
- The weighted average error: $A_n = \gamma A_{n-1} + (1-\gamma)\,E_n$,
  with $0 \le \gamma \le 1$.
- The desired error $D$, which acts as a threshold for the procedure.
  Solutions with error above $D$ are not pruned unless the training error
  is decreasing. In ref. [63] it is advised for hard problems that $D$ be
  set to random performance, which in practice means that pruning is
  always on.
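To make the schedule concrete, here is a minimal Python sketch of one
epoch of this $\lambda$ update; the function name and the default values
for $\Delta\lambda$, $\gamma$ and $c$ are illustrative assumptions, not
JETNET defaults.
\begin{verbatim}
def update_lambda(lam, E_n, E_prev, A_prev, D,
                  dlam=1e-6, gamma=0.9, c=0.9):
    """One epoch of the pruning-strength schedule; returns the new
    lambda and the new weighted average error A_n."""
    A_n = gamma * A_prev + (1.0 - gamma) * E_n  # weighted average error
    if E_n < D or E_n < E_prev:
        lam += dlam   # error acceptable, or still decreasing: prune more
    elif E_n < A_n:
        lam -= dlam   # error rising but below its average: back off gently
    else:
        lam *= c      # error rising above its average: cut lambda hard
    return lam, A_n
\end{verbatim}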
Although it is quite tricky to get this procedure to work properly, we
have used it successfully on both toy and real-world problems.
The recommended value for $w_0$ in [63] is of order unity if the
activation functions of the units are of order unity. This agrees with
our experience, with the modification that $w_0$ should scale with the
number of inputs. However, our tests were performed on problems where
the number of inputs ranged between two and ten, whereas the largest networks
in [63] have up to forty inputs. Also, on toy problems we find that
the parameter $\Delta\lambda$ can be increased considerably (by orders
of magnitude) above the suggested default value.
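Assuming the pruning cost term of eq. () is the weight-elimination form
of ref. [63], the following sketch shows where $w_0$ enters: the penalty
is roughly quadratic for weights well below $w_0$ and saturates for
weights well above it.
\begin{verbatim}
import numpy as np

def elimination_penalty(weights, lam, w0=1.0):
    """Weight-elimination cost  lam * sum_i (w_i/w0)^2 / (1 + (w_i/w0)^2).
    For |w| << w0 the term behaves like plain weight decay (quadratic);
    for |w| >> w0 it saturates near lam, so large, useful weights are
    not driven to zero.  w0 = 1.0 is the order-unity default from the
    text; both arguments are illustrative."""
    r2 = (weights / w0) ** 2
    return lam * np.sum(r2 / (1.0 + r2))
\end{verbatim}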
In JETNET 3.0 these pruning parameters correspond to: PARJN(14) for
$\lambda$, PARJN(15) for $\Delta\lambda$, PARJN(16) for $\gamma$,
PARJN(17) for $c$, PARJN(18) for $w_0$, and PARJN(19) for the desired
error $D$. Of these, PARJN(15), PARJN(18) and PARJN(19) are the
crucial ones.