Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.641288
Title: Finite size effects in neural network algorithms
Author: Barber, David
Awarding Body: University of Edinburgh
Current Institution: University of Edinburgh
Date of Award: 1996
Availability of Full Text:
Access through EThOS: Full text unavailable from EThOS. Please try the institutional link.
Access through Institution:
Abstract:
One approach to the study of learning in neural networks within the physics community has been to use statistical mechanics to calculate the expected error that a network will make on a typical novel example, termed the generalisation error. Such average-case analyses have mainly been carried out with recourse to the thermodynamic limit, in which the size of the network is taken to infinity. For a finite-sized network, however, the error is not self-averaging; that is, it remains dependent upon the actual set of examples used to train and test the network. The error estimated on a specific test set realisation, termed the test error, forms a finite sample approximation to the generalisation error. In this thesis we present a systematic examination of test error variances in finite-sized networks trained by stochastic learning algorithms. Beginning with simple single-layer systems, in particular the linear perceptron, we calculate the test error variance arising from randomness in both the training examples and the stochastic Gibbs learning algorithm. This quantity enables us to examine the performance of networks in a limited data scenario, including the optimal partitioning of a data set into a training set and a test set so as to minimise the average error that the network makes, whilst remaining confident that the average test error is representative. A detailed study of the variance of cross-validation errors is carried out, and a comparison is made between different cross-validation schemes. We also examine the test error variance of the binary perceptron, comparing the results with the linear case. Employing the results for the variance of errors, we calculate how likely the worst-case errors derived from PAC theory are, finding that the probability of such worst-case occurrences is extremely small. In addition, we study the effect of a finite system size on the on-line training of multi-layer networks, tracking the dynamic evolution of the error variance under the stochastic gradient descent algorithm used to train the network on an increasing amount of data. We find that the hidden unit symmetries of the multi-layer network give rise to relatively large finite size effects around the point at which the symmetries are broken.
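For illustration, the relationship between the generalisation error and its finite test set estimate can be sketched in standard notation; the symbols used here (the error measure e, weights w, test set size M, and the assumption of independently drawn test examples) are our own shorthand and not necessarily the thesis's notation:

\[
\epsilon_g(\mathbf{w}) = \mathbb{E}_{\mathbf{x}}\!\left[ e(\mathbf{w};\mathbf{x}) \right],
\qquad
\epsilon_t(\mathbf{w}) = \frac{1}{M}\sum_{\mu=1}^{M} e(\mathbf{w};\mathbf{x}^{\mu}),
\]
\[
\mathbb{E}\!\left[\epsilon_t\right] = \epsilon_g,
\qquad
\operatorname{Var}\!\left[\epsilon_t\right] = \frac{1}{M}\operatorname{Var}_{\mathbf{x}}\!\left[ e(\mathbf{w};\mathbf{x}) \right].
\]

The variance of the test error vanishes only in the limit of an infinitely large test set; for finite M (and a finite network), the fluctuations of the test error about the generalisation error, further averaged over training sets and the stochastic learning algorithm, are the quantities examined in the thesis.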
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.641288
DOI: Not available