Use this URL to cite or link to this record in EThOS:
Title: The statistical mechanics of Bayesian model selection
Author: Marion, Glenn
Awarding Body: University of Edinburgh
Current Institution: University of Edinburgh
Date of Award: 1996
Availability of Full Text: Full text unavailable from EThOS.
In this thesis we examine the question of model selection in systems which learn input-output mappings from a data set of examples. The models we consider are inspired by feed-forward architectures used within the artificial neural networks community. The approach taken here is to elucidate the properties of various model selection criteria by calculating relevant quantities derived in a Bayesian framework. These calculations assume that examples are generated from some underlying rule, or teacher, by randomly sampling the input space, and they are performed using techniques borrowed from statistical mechanics. This allows the different criteria to be compared on the basis of the resultant ability of the system to generalize to novel examples. Broadly stated, the model selection problem is the following: given only a limited set of examples, which model, or student, should one choose from a set of candidates in order to achieve the highest level of generalization? We consider four model selection criteria: a penalty-based method utilising a quantity derived from Bayesian statistics termed the evidence; two methods based on estimates of the generalization performance, namely the test error and the cross-validation error; and a fourth, less widely used method based on the noise sensitivity of the models.
In a simple scenario we demonstrate that model selection based on the evidence is susceptible to misspecification of the student. Our analysis is conducted in the thermodynamic limit, where the system size is taken to be arbitrarily large. In particular, we examine the evidence procedure assignments of the hyperparameters which control the learning algorithm. We find that, where the student is not sufficiently powerful to fully model the teacher, this procedure is sub-optimal but nevertheless remarkably robust to such misspecification.
In a scenario in which the student is more than able to represent the teacher, we find that the evidence procedure is optimal.
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:  DOI: Not available