In ref.  the CG method outperformed BP on the parity problem. However, our experience with CG on HEP problems is the opposite; it is often unable to find the true global minimum. The same conclusion was reached in an extensive benchmark test of different ANN learning algorithms . Consequently, we see no reason to recommend using CG, although it learns toy problems very fast.
The strength of CG lies in the rare cases where the path to the minimum follows a few long, narrow valleys. It breaks down, however, whenever the error surface is more or less flat, because the CG line search then tries to locate a minimum along a direction in which the error barely changes. As previously stated, flat surfaces often occur for networks with many hidden layers.
If one insists on using CG in such cases, it pays to precede the CG learning with a couple of BP sweeps in order to get out of the flat region. A coarse line search is recommended, since searching for a very exact minimum position along each conjugate direction is a waste of resources. Also, the SCG algorithm is usually faster than standard CG since it avoids the line search.
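The recipe above (a few BP sweeps, then CG with a coarse line search) can be sketched as follows. This is only an illustration under stated assumptions, not the implementation used in the work discussed here: the error surface is a toy ill-conditioned quadratic standing in for a network error, "BP" is plain gradient descent, the CG variant is Polak-Ribiere with restarts, and the coarse search is a simple backtracking rule. All function names and parameter values (`train`, `coarse_line_search`, `bp_sweeps`, `eta`, and so on) are hypothetical.

```python
import numpy as np

# Toy stand-in for a network error surface (an assumption, not from the text):
# an ill-conditioned quadratic, i.e. a long narrow valley along w[0].
H = np.array([1.0, 100.0])


def error(w):
    return 0.5 * np.sum(H * w**2)


def gradient(w):
    return H * w


def coarse_line_search(w, d, g, step=1.0, shrink=0.5, c=1e-4, max_tries=30):
    """Backtracking search: accept the first step giving sufficient decrease.

    No attempt is made to locate the exact minimum along d.
    """
    e0, slope = error(w), g @ d
    for _ in range(max_tries):
        if error(w + step * d) <= e0 + c * step * slope:
            return step
        step *= shrink
    return 0.0  # no acceptable step found


def train(w0, bp_sweeps=3, eta=0.005, cg_iters=200, tol=1e-8):
    w = np.asarray(w0, dtype=float)
    history = [error(w)]

    # Warm-up: a few plain gradient-descent ("BP") sweeps.
    for _ in range(bp_sweeps):
        w = w - eta * gradient(w)
        history.append(error(w))

    # Polak-Ribiere CG using the coarse line search.
    g = gradient(w)
    d = -g
    for _ in range(cg_iters):
        if np.linalg.norm(g) < tol:
            break
        if g @ d >= 0:  # not a descent direction: restart along -g
            d = -g
        step = coarse_line_search(w, d, g)
        if step == 0.0:
            break
        w = w + step * d
        history.append(error(w))
        g_new = gradient(w)
        beta = max(0.0, g_new @ (g_new - g) / (g @ g))
        d = -g_new + beta * d
        g = g_new
    return w, history


w_final, history = train([10.0, 1.0])
print(f"error: {history[0]:.3g} -> {history[-1]:.3g} in {len(history) - 1} updates")
```

The backtracking rule accepts the first trial step that lowers the error sufficiently, so each conjugate direction costs only a handful of error evaluations. A more exact search would spend extra evaluations per direction, which is the waste of resources noted above.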