Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.638359
Title: Neural models for speaker recognition
Author: Oglesby, J.
Awarding Body: University College of Swansea
Current Institution: Swansea University
Date of Award: 1991
Abstract:
In recent years there has been a resurgence of interest in neural modelling. This thesis examines one class of neural models applied to the task of speaker recognition, making direct comparisons with a contemporary approach based on vector quantisation (VQ). Speaker recognition systems in general, including feature representations and distance measures, are reviewed, and the VQ approach used for comparison throughout the experimental work is described in detail. Currently popular neural architectures are also reviewed, and their associated gradient-based training procedures are examined. The performance of a VQ speaker identification system is determined experimentally for a range of popular speech features, using codebooks of varying sizes; perceptually based cepstral features are found to outperform both standard LPC and filterbank representations. New approaches to speaker recognition based on multilayer perceptrons (MLP), and on a variant using radial basis functions (RBF), are proposed and examined. To meet the computational demands of the research, a novel parallel training algorithm is proposed that dynamically schedules the computational load amongst the available processors; it is shown to give close to linear speed-up on typical training tasks for up to fifty transputers. A transputer-based processing module with appropriate speech capture and synthesis facilities is also developed. For the identification task, the MLP approach gives approximately the same performance as VQ codebooks of equivalent size: it is slightly better for smaller models, whereas for larger models the VQ approach gives marginally superior results. MLP and RBF models are also investigated for speaker verification. Both techniques significantly outperform the VQ approach, giving 29.5% (MLP) and 21.5% (RBF) true-talker rejections at a fixed 2% impostor acceptance rate, compared with 34.5% for the VQ approach. These figures relate to single-digit test utterances.
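As a rough illustration of the VQ baseline described above, the sketch below trains one codebook per enrolled speaker and identifies a test utterance by the codebook that gives the lowest average quantisation distortion. It assumes plain k-means training with squared Euclidean distance and uses synthetic 2-D points in place of real cepstral frames; the function names (`train_codebook`, `avg_distortion`, `identify`) are hypothetical, and the thesis's actual codebook design procedure and distance measure may differ.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(frames, size, iters=20, seed=0):
    """Build a `size`-entry codebook with plain k-means (an assumption;
    binary-splitting LBG training is a common alternative)."""
    rng = random.Random(seed)
    codebook = [list(f) for f in rng.sample(frames, size)]  # random initial codewords
    dim = len(frames[0])
    for _ in range(iters):
        cells = [[] for _ in range(size)]
        for f in frames:  # assign each frame to its nearest codeword
            nearest = min(range(size), key=lambda k: sq_dist(f, codebook[k]))
            cells[nearest].append(f)
        for k, members in enumerate(cells):  # move codewords to cell centroids
            if members:  # an empty cell keeps its previous codeword
                codebook[k] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return codebook


def avg_distortion(frames, codebook):
    """Mean squared distance from each frame to its nearest codeword."""
    return sum(min(sq_dist(f, c) for c in codebook) for f in frames) / len(frames)


def identify(test_frames, codebooks):
    """Return the enrolled speaker whose codebook quantises the test frames best."""
    return min(codebooks, key=lambda spk: avg_distortion(test_frames, codebooks[spk]))


if __name__ == "__main__":
    rng = random.Random(1)

    def synth(cx, cy, n):
        # Synthetic 2-D "feature frames" scattered around a speaker-specific centre.
        return [[cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3)] for _ in range(n)]

    enrol = {"alice": synth(0.0, 0.0, 200), "bob": synth(3.0, 3.0, 200)}
    codebooks = {spk: train_codebook(fr, size=8) for spk, fr in enrol.items()}
    print(identify(synth(3.0, 3.0, 50), codebooks))  # prints bob
```

In this toy setting the `size` argument stands in for the codebook size that the experiments vary; the identification decision itself is simply the minimum-distortion rule over all enrolled speakers.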
Extending the duration of the test utterance significantly improves performance across all techniques. The best overall performance is obtained from RBF models: five-digit utterances achieve around 2.5% true-talker rejections at a fixed 2% impostor acceptance rate.
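The verification figures above are true-talker (false) rejection rates measured at a fixed impostor (false) acceptance operating point. A minimal sketch of how such a figure can be computed, and of why longer test utterances can help, is given below. It assumes one scalar score per trial where higher means "more like the claimed speaker", synthetic Gaussian per-digit scores, and an utterance score formed by averaging independent per-digit scores; these choices and the function names are illustrative assumptions, not the thesis's scoring method.

```python
import math
import random


def threshold_at_far(impostor_scores, target_far):
    """Return a threshold t such that at most target_far of the impostor
    trials score strictly above t (and would therefore be accepted)."""
    ranked = sorted(impostor_scores, reverse=True)
    allowed = math.floor(target_far * len(ranked) + 1e-9)  # impostor accepts permitted
    return ranked[allowed]  # accept only scores strictly greater than this


def rate_above(scores, threshold):
    """Fraction of trials accepted (score strictly above the threshold)."""
    return sum(s > threshold for s in scores) / len(scores)


def false_rejection_rate(true_scores, threshold):
    """Fraction of true-talker trials rejected (score at or below the threshold)."""
    return sum(s <= threshold for s in true_scores) / len(true_scores)


def utterance_scores(rng, mean, n_trials, n_digits):
    """Hypothetical scoring: average n_digits independent per-digit scores."""
    return [sum(rng.gauss(mean, 1.0) for _ in range(n_digits)) / n_digits
            for _ in range(n_trials)]


if __name__ == "__main__":
    rng = random.Random(0)
    for digits in (1, 5):
        true_s = utterance_scores(rng, +1.0, 5000, digits)
        imp_s = utterance_scores(rng, -1.0, 5000, digits)
        t = threshold_at_far(imp_s, 0.02)  # fix impostor acceptance at 2%
        print(digits, "digit(s): FAR", rate_above(imp_s, t),
              "FRR", round(false_rejection_rate(true_s, t), 3))
```

In this toy model, averaging five independent digit scores shrinks their spread by a factor of about sqrt(5), so the true-talker and impostor distributions overlap less and the rejection rate at the same 2% acceptance point falls, which is consistent in direction with the trend reported for five-digit utterances.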
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.638359
DOI: Not available