Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.597745
Title: Adaptation of statistical language models for automatic speech recognition
Author: Clarkson, P. R.
Awarding Body: University of Cambridge
Current Institution: University of Cambridge
Date of Award: 1999
Availability of Full Text:
Full text unavailable from EThOS.
Please contact the current institution’s library for further details.
Abstract:
Statistical language models encode linguistic information in such a way as to be useful to systems which process human language. Such systems include those for optical character recognition and machine translation. Currently, however, the most common application of language modelling is in automatic speech recognition, and it is this that forms the focus of this thesis.

Most current speech recognition systems are dedicated to one specific task (for example, the recognition of broadcast news), and thus use a language model which has been trained on text appropriate to that task. If, however, one wants to perform recognition on more general language, then creating an appropriate language model is far from straightforward. A task-specific language model will often perform very badly on language from a different domain, whereas a model trained on text from many diverse styles of language might perform better in general, but will not be especially well suited to any particular domain. Thus the idea of an adaptive language model, whose parameters automatically adjust to the current style of language, is an appealing one.

In this thesis, two adaptive language models are investigated. The first is a mixture-based model. The training text is partitioned according to the style of text, and a separate language model is constructed for each component. Each component is assigned a weighting according to its performance at modelling the observed text, and a final language model is constructed as the weighted sum of the mixture components (a short illustrative sketch follows this abstract). The second approach is based on a cache of recent words. Previous work has shown that words which have occurred recently have a higher probability of occurring in the immediate future than a standard trigram language model would predict. This thesis investigates the hypothesis that more recent words should be considered more significant within the cache, by implementing a cache in which a word's recurrence probability decays exponentially over time (also sketched below).

The problem of how to predict the effect of a particular language model on speech recognition accuracy is also addressed in this thesis. The results presented here, as well as those of other recent research, suggest that perplexity, the most commonly used measure for evaluating language models, is not as well correlated with word error rate as was once thought (a worked definition of perplexity is given below). This thesis investigates the connection between a language model's perplexity and its effect on speech recognition performance, and describes the development of alternative measures of a language model's quality which are better correlated with word error rate. Finally, it is shown how the recognition performance achieved using mixture-based language models can be improved by optimising the mixture weights with respect to these new measures.
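The mixture construction described in the abstract can be illustrated with a short Python sketch. This code is not from the thesis: the component models, the posterior-style weight update, and all names are illustrative assumptions. It shows a mixture probability formed as a weighted sum of component models, with the weights re-estimated after each word according to how well each component predicted it, so the mixture drifts towards the component that best matches the current style of text.

    def mixture_prob(word, history, components, weights):
        # Weighted sum of the component model probabilities.
        return sum(lam * lm(word, history) for lam, lm in zip(weights, components))

    def update_weights(word, history, components, weights):
        # Re-weight each component in proportion to how well it predicted
        # the observed word (a posterior-style update; the exact rule here
        # is an assumption, not the thesis's formulation).
        scores = [lam * lm(word, history) for lam, lm in zip(weights, components)]
        total = sum(scores)
        return [s / total for s in scores]

    # Toy unigram components standing in for style-specific n-gram models:
    # a "news" model and a "conversation" model over a tiny vocabulary.
    news = lambda w, h: {"market": 0.5, "hello": 0.1, "the": 0.4}[w]
    chat = lambda w, h: {"market": 0.1, "hello": 0.5, "the": 0.4}[w]

    weights = [0.5, 0.5]
    for w in ["hello", "hello", "the"]:
        weights = update_weights(w, None, [news, chat], weights)
    print(weights)  # the weight has shifted towards the conversational component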
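The exponentially decaying cache can be sketched in the same spirit. The code below is a minimal illustration under assumed parameters (the decay rate, interpolation weight, and normalisation are not the thesis's tuned values): each past occurrence of a word contributes exp(-alpha * d), where d is its distance back in the cache, so recent occurrences raise the word's probability most; the cache distribution is then linearly interpolated with a standard trigram model.

    import math

    def decaying_cache_prob(word, cache, alpha=0.005):
        # Each occurrence at distance d from the present contributes
        # exp(-alpha * d); normalise over all positions in the cache.
        n = len(cache)
        num = sum(math.exp(-alpha * (n - i)) for i, w in enumerate(cache) if w == word)
        den = sum(math.exp(-alpha * (n - i)) for i in range(n))
        return num / den if den > 0.0 else 0.0

    def interpolated_prob(word, history, trigram_prob, cache, lam=0.1):
        # Linear interpolation of the decaying cache with a trigram model
        # (lam is an illustrative interpolation weight).
        return (1.0 - lam) * trigram_prob(word, history) + lam * decaying_cache_prob(word, cache)

    recent = ["the", "dollar", "fell", "but", "the", "yen", "rose"]
    print(decaying_cache_prob("the", recent))  # ~0.28: "the" occurred twice, recently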
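Since perplexity is central to the evaluation question raised in the abstract, a worked definition may help. Perplexity is the standard quantity exp(-(1/N) * sum of log P(w_i | h_i)), i.e. the geometric mean of the inverse per-word probabilities; the sketch below computes it from a list of per-word model probabilities (the example values are made up).

    import math

    def perplexity(word_probs):
        # exp of the negative mean log probability: the geometric mean
        # of the inverse per-word probabilities.
        return math.exp(-sum(math.log(p) for p in word_probs) / len(word_probs))

    print(perplexity([0.1, 0.2, 0.05]))  # ~10.0

A lower perplexity means the model is, on average, less surprised by the test text; the thesis's point is that this does not translate into word error rate as directly as was once assumed.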
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.597745
DOI: Not available