The optimal model is not necessarily a single model. Instead, it may be profitable to divide the problem into smaller subtasks, such as separating ``location'' from ``form'', and to use different models for the different subtasks. Such modular systems are often more efficient, and easier to train, than systems based on a single architecture. One example is presented in , where an MLP with a superficial LVQ network is shown to be more efficient than the MLP alone for classifying hadronic events: the superficial LVQ layer is able to resolve non-linearities that remain even after the final hidden layer of the MLP. Another example is the n-class classification task, where it may be wise to train n networks, each recognizing one class, and then combine them into a larger network. This avoids the problem of interference, which occurs when learning to recognize one class degrades the recognition of another due to the non-locality of the MLP's division of the input space.
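The n-class decomposition can be sketched as follows. This is a minimal illustration only, using simple logistic-regression units in place of full MLPs; all function names, the toy data, and the training hyperparameters are assumptions for the sketch, not part of the original text:

```python
import numpy as np

def train_binary(X, y, epochs=300, lr=0.5):
    """Train one binary 'network' to recognize a single class (logistic regression)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid output
        grad = p - y                            # gradient of cross-entropy loss
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def train_one_vs_rest(X, labels, n_classes):
    """Train n independent models, one per class, so the classes cannot interfere."""
    return [train_binary(X, (labels == c).astype(float)) for c in range(n_classes)]

def predict(models, X):
    """Combine the n models: assign each point to the class whose model responds most."""
    scores = np.stack([X @ w + b for (w, b) in models], axis=1)
    return scores.argmax(axis=1)

# Toy 3-class problem: three well-separated Gaussian clusters.
rng = np.random.default_rng(0)
means = [(0.0, 0.0), (6.0, 0.0), (0.0, 6.0)]
X = np.concatenate([rng.normal(loc=m, scale=0.5, size=(30, 2)) for m in means])
y = np.repeat(np.arange(3), 30)

models = train_one_vs_rest(X, y, 3)
accuracy = (predict(models, X) == y).mean()
print(accuracy)
```

Because each binary model sees only a "this class versus the rest" task, updating one model never disturbs the weights of another, which is precisely the interference that a single monolithic network can suffer from.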