
A Statistical Approach to Learning and Generalization in Layered Neural Networks

01 October 1990


The problem of learning a general input-output relation using a layered neural network is discussed in a statistical framework. By imposing the consistency condition that error minimization be equivalent to likelihood maximization for training the network, we arrive at a Gibbs distribution on a canonical ensemble of networks with the same architecture. This statistical description enables us to evaluate the probability of correctly predicting an independent example after training the network on a given training set. The prediction probability is highly correlated with the generalization ability of the network, as measured outside the training set. This suggests a general and practical criterion for training layered networks by minimizing prediction errors. We demonstrate the utility of this criterion for selecting the optimal architecture in the contiguity problem. As a theoretical application of the statistical formalism, we discuss learning curves and estimate, in a simple example, the training-set size sufficient for correct generalization.
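To make the statistical setup concrete, the following is a minimal sketch of the Gibbs ensemble referred to above, with notation chosen here for illustration rather than taken from the paper: $W$ denotes the weights of a network with the given architecture, $E(W)$ its error on the training set, $\beta$ an inverse-temperature parameter, and $P(y \mid x, W)$ the network's probability of output $y$ given input $x$.

```latex
% Gibbs distribution over networks of a fixed architecture, obtained by
% requiring that error minimization coincide with likelihood maximization
% (notation assumed for illustration).
P(W \mid \text{training set}) \;=\; \frac{e^{-\beta E(W)}}{Z(\beta)},
\qquad
Z(\beta) \;=\; \int \! dW \; e^{-\beta E(W)} .

% Sketch of the prediction probability for an independent example (x, y):
% the network's probability of the correct output, averaged over the ensemble.
P(\text{correct}) \;=\; \int \! dW \; P(W \mid \text{training set}) \, P(y \mid x, W) .
```

Under this reading, maximizing the prediction probability (equivalently, minimizing prediction errors on examples outside the training set) is the training and architecture-selection criterion the abstract describes.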