Title: Role of biases in neural network models
Author: West, Ansgar Heinrich Ludolf
Awarding Body: University of Edinburgh
Current Institution: University of Edinburgh
Date of Award: 1997
The capacity problem for multi-layer networks has proven especially elusive. Our calculation of the capacity of multi-layer networks built by constructive algorithms relies heavily on the existence of biases in the basic building block, the binary perceptron. This is the first time the capacity has been evaluated explicitly for large networks and finite stability. One finds that the constructive algorithms studied, a tiling-like algorithm and variants of the upstart algorithm, do not saturate the known Mitchison-Durbin bound.

In supervised learning, a student network is presented with training examples in the form of input-output pairs, where the output is generated by a teacher network. The central question is the relation between the number of examples presented and the typical performance of the student in approximating the teacher rule, usually termed generalisation. The influence of biases in such a student-teacher scenario has been assessed for the two-layer soft-committee architecture, which is a universal approximator and already resembles applicable multi-layer network models, within the on-line learning paradigm, where training examples are presented serially. One finds that adjustable biases dramatically alter the learning behaviour: the suboptimal symmetric phase, which can easily dominate training for fixed biases, vanishes almost entirely for non-degenerate teacher biases. Furthermore, the extended model exhibits much richer dynamical behaviour, exemplified especially by a multitude of (attractive) suboptimal fixed points even for realizable cases, which cause training to fail or to be severely slowed down.

In addition, in order to study possible improvements over gradient descent training, an adaptive back-propagation algorithm parameterised by a "temperature" is introduced, which enhances the ability of the student to distinguish between teacher nodes.
This algorithm, which has been studied in the various learning stages, provides more effective symmetry breaking between hidden units and faster convergence to optimal generalisation.
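The on-line student-teacher scenario summarised above can be sketched in code. The following is a minimal illustration, not taken from the thesis: a soft-committee machine (an unweighted sum of erf hidden units, here with adjustable biases) is trained by on-line gradient descent on the squared error, with labels supplied by a fixed teacher of the same architecture. All names and parameter values (`n_in`, `n_hidden`, the learning rate `eta`, the number of steps) are assumptions chosen for illustration.

```python
import numpy as np
from math import erf

def g(h):
    """erf activation, applied element-wise to the hidden fields."""
    return np.array([erf(v / np.sqrt(2.0)) for v in h])

def g_prime(h):
    """Derivative of g."""
    return np.sqrt(2.0 / np.pi) * np.exp(-h ** 2 / 2.0)

class SoftCommittee:
    """Soft-committee machine: unweighted sum of erf hidden units with biases."""
    def __init__(self, n_in, n_hidden, rng):
        self.w = rng.normal(size=(n_hidden, n_in)) / np.sqrt(n_in)
        self.b = rng.normal(size=n_hidden)       # adjustable biases

    def fields(self, x):
        return self.w @ x + self.b               # hidden-unit pre-activations

    def __call__(self, x):
        return g(self.fields(x)).sum()           # unweighted sum over hidden units

def online_step(student, teacher, x, eta):
    """One on-line gradient-descent update on the squared error."""
    h = student.fields(x)
    err = student(x) - teacher(x)                # teacher supplies the label
    delta = err * g_prime(h)                     # per-hidden-unit error signal
    student.w -= eta * np.outer(delta, x)
    student.b -= eta * delta                     # biases are trained as well

def mse(student, teacher, xs):
    """Estimated generalisation error on a held-out test set."""
    return np.mean([(student(x) - teacher(x)) ** 2 for x in xs])

rng = np.random.default_rng(0)
n_in, n_hidden = 50, 2                           # illustrative sizes
teacher = SoftCommittee(n_in, n_hidden, rng)     # non-degenerate random biases
student = SoftCommittee(n_in, n_hidden, rng)

test_xs = rng.normal(size=(500, n_in))
err_before = mse(student, teacher, test_xs)

for _ in range(20000):                           # examples presented serially
    x = rng.normal(size=n_in)                    # a fresh random input each step
    online_step(student, teacher, x, eta=0.5 / n_in)

err_after = mse(student, teacher, test_xs)       # error falls during training
```

In this realizable setting (student and teacher share the same architecture) the estimated generalisation error decreases during training; the symmetric-phase and fixed-point phenomena analysed in the thesis concern how quickly, and whether, such dynamics reach optimal generalisation.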
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral