Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.597919
Title: Data selection and model combination in connectionist speech recognition
Author: Cook, G. D.
Awarding Body: University of Cambridge
Current Institution: University of Cambridge
Date of Award: 1997
Availability of Full Text:
Full text unavailable from EThOS. Please contact the current institution’s library for further details.
Abstract:
The hybrid connectionist-hidden Markov model (HMM) approach to large vocabulary continuous speech recognition has been shown to be competitive with HMM-based systems. However, the recent availability of extremely large amounts of acoustic training data has highlighted a problem with the connectionist acoustic modelling paradigm. The effective use of such large amounts of data is difficult due to the computational requirements of training large connectionist models. This dissertation details research aimed at increasing the performance of connectionist acoustic models through the effective use of available training data. The methods investigated are based on ensembles of models. An ensemble is a collection of models which are combined in a manner such that the performance of the ensemble is greater than that of any of the models which form the ensemble. Most ensemble methods use a simple linear combination of the model estimates to form the ensemble estimate. A data-dependent ensemble technique has been developed in which the combination of the ensemble models is dependent on the current input. The use of ensembles for speaker adaptation has been investigated, and a method based on clustering of training data has been developed and implemented. This speaker adaptation scheme does not require additional adaptation data, and can reduce the error rate of a hybrid connectionist-HMM speaker-independent recognition system by up to 14.5%. In addition, clustering data allows effective use of large amounts of training data. Boosting is a method which makes selective use of training data, and produces an ensemble with each model trained on data drawn from a different distribution. Results on the optical character recognition task suggest that boosting can provide considerable gains in classification performance. The application of boosting to acoustic modelling has been investigated, and a modified boosting procedure developed.
The boosting algorithms have been applied to multilayer perceptron acoustic models, and performance of the models assessed on a number of ARPA benchmark tasks. The results show that boosting consistently provides a 14-19% reduction in word error rate. The standard boosting techniques are not suitable for use with recurrent network acoustic models, and three new boosting algorithms have been developed for use with connectionist models with internal memory. These new boosting algorithms have also been evaluated on a number of ARPA benchmark tests, and have been shown to lead to a reduction in word error rate of 10-18%.
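Illustrative note: the abstract contrasts a simple linear combination of model estimates with a combination that depends on the current input. The sketch below shows one way that distinction could look in code. It is not taken from the thesis; the names `linear_ensemble`, `data_dependent_ensemble` and `toy_gate`, the two-class posteriors, and the gating rule are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Posterior = List[float]  # class probabilities from one model for one input frame


def linear_ensemble(estimates: Sequence[Posterior],
                    weights: Sequence[float]) -> Posterior:
    """Weighted average of the model estimates, with fixed weights."""
    total = sum(weights)
    n_classes = len(estimates[0])
    return [
        sum(w * est[k] for w, est in zip(weights, estimates)) / total
        for k in range(n_classes)
    ]


def data_dependent_ensemble(estimates: Sequence[Posterior],
                            gate: Callable[[Sequence[float]], Sequence[float]],
                            x: Sequence[float]) -> Posterior:
    """Same average, but the weights are recomputed from the current input x."""
    return linear_ensemble(estimates, gate(x))


def toy_gate(x: Sequence[float]) -> List[float]:
    """Hypothetical gate: favour model 0 when the feature sum is non-negative."""
    return [3.0, 1.0] if sum(x) >= 0 else [1.0, 3.0]


# Two hypothetical models scoring the same frame over two classes.
estimates = [[0.75, 0.25], [0.25, 0.75]]
uniform = linear_ensemble(estimates, [1.0, 1.0])
gated = data_dependent_ensemble(estimates, toy_gate, [0.5, 1.5])
```

With fixed weights every input is combined the same way; with a gate, the same pair of model estimates can yield different ensemble estimates depending on the input features.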
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.597919
DOI: Not available