Title:

Statistical mechanics, generalisation and regularisation of neural network models

There has been much recent interest in obtaining analytic results for rule learning in neural networks. In this thesis the performance of a simple neural network model learning a rule from noisy examples is calculated using methods of statistical mechanics. The free energy of the model is defined, and order parameters that capture the statistical behaviour of the system are evaluated analytically. A weight decay term is used to regularise against the noise added to the examples. The network's performance is measured by its ability to generalise to examples from outside the data set, and is studied for a linear network learning both linear and nonlinear rules. The analysis shows that a linear network learning a nonlinear rule is equivalent to a linear network learning a linear rule, with effective noise added to the training data and an effective gain on the linear rule. Examination of the dependence of the performance measures on the number of examples, the noise added to the data and the weight decay parameter shows that the generalisation error can be optimised by setting the weight decay parameter proportional to the noise level on the data. Hence weight decay is not only useful for reducing the effect of noisy data, but can also improve the performance of a linear network learning a nonlinear rule. A generalisation of the standard weight decay term is also considered: a general quadratic penalty term, or regulariser, which is equivalent to a general Gaussian prior on the network's weight vector. In this case an average over a distribution of rule weight vectors is included in the calculation to remove any dependence on the exact realisation of the rule.
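As a numerical illustration of the linear case summarised above, the following is a minimal sketch, not the analytic calculation of the thesis. It assumes a particular set of conventions that the abstract does not specify: Gaussian inputs scaled by 1/sqrt(N), a rule ("teacher") weight vector with unit-variance Gaussian components, additive Gaussian output noise, and a squared-error training cost with penalty `lam * |w|^2`. The function name `generalisation_error` and all parameter names are hypothetical. Under these assumptions the penalised least-squares solution coincides with the posterior mean for a Gaussian prior when `lam` equals the noise variance, so the averaged generalisation error should be smallest near that value.

```python
import numpy as np


def generalisation_error(lam, n_inputs=50, n_examples=100,
                         noise_std=0.5, n_trials=200, seed=0):
    """Average generalisation error of a linear student trained with weight decay.

    Illustrative conventions (assumptions, not taken from the thesis):
    inputs x ~ N(0, I) scaled by 1/sqrt(N), teacher components ~ N(0, 1),
    targets y = teacher . x / sqrt(N) + Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_trials):
        teacher = rng.standard_normal(n_inputs)
        # Rows of X are training inputs, already scaled by 1/sqrt(N).
        X = rng.standard_normal((n_examples, n_inputs)) / np.sqrt(n_inputs)
        y = X @ teacher + noise_std * rng.standard_normal(n_examples)
        # Minimiser of sum_mu (y_mu - w . x_mu)^2 + lam * |w|^2.
        A = X.T @ X + lam * np.eye(n_inputs)
        student = np.linalg.solve(A, X.T @ y)
        # For fresh Gaussian inputs, half the mean squared output difference
        # equals |student - teacher|^2 / (2 N).
        errors.append(0.5 * np.mean((student - teacher) ** 2))
    return float(np.mean(errors))


if __name__ == "__main__":
    noise_variance = 0.5 ** 2
    for lam in (1e-3, 0.05, noise_variance, 1.0, 10.0):
        print(f"lam = {lam:7.3f}  generalisation error = {generalisation_error(lam):.4f}")
```

Because the same seed is used for every value of `lam`, each setting is evaluated on identical training sets, which makes the comparison between weight decay values less noisy. Changing `noise_std` and rescanning `lam` is one way to see the optimum shift with the noise level in this toy setting.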
