Title: Acoustic level speech recognition
Author: Lucas, Adrian Edward
Awarding Body: University of Surrey
Current Institution: University of Surrey
Date of Award: 1991
A number of techniques have been developed over the last forty years to solve the problem of recognizing human speech by machine. Although the general problem of unconstrained, speaker-independent connected speech recognition is still not solved, some of these methods have demonstrated varying degrees of success on constrained speech recognition tasks. Human speech communication is considered to take place on a number of levels, from the acoustic signal through to higher linguistic and semantic levels. At the acoustic level, the recognition process can be divided into time alignment (the removal of global and local timing differences between the unknown input speech and the stored reference templates) and reference template matching. Little attention seems to have been given to the effective use of acoustic-level contextual information to improve the performance of these tasks. In this thesis, a new template matching scheme is developed that addresses this issue and makes effective use of acoustic-level context. The method, based on Bayesian decision theory, is a dynamic time warping approach that incorporates statistical dependencies in matching errors between frames along the entire length of the reference template. In addition, the method includes a speaker compensation technique that operates simultaneously. Implementation is carried out using the highly efficient branch and bound algorithm. Speech model storage requirements are quite small as a result of an elegant feature of the recursive matching criterion. Furthermore, a novel method for inferring the special speech models is introduced. The new method is tested on data drawn from nearly 8000 utterances of the 26 letters of the British English alphabet spoken by 104 speakers, split almost equally between male and female speakers.
Experiments show that the new approach is a powerful acoustic level speech recognizer achieving up to 34% better recognition performance when compared with a conventional method based on the dynamic programming algorithm.
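The abstract measures the new method against a conventional recognizer based on the dynamic programming algorithm. As a rough illustration of that kind of baseline only (this is not the thesis's Bayesian matching criterion, speaker compensation, or branch and bound search), the sketch below aligns an unknown utterance to each stored reference template with dynamic time warping and picks the template with the lowest cumulative cost. It assumes a Euclidean distance between feature frames and the simple symmetric step pattern D[i][j] = d(i, j) + min(D[i-1][j], D[i][j-1], D[i-1][j-1]), with no slope constraints or path-length normalization. The function names and the toy 1-D "frames" are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]


def frame_distance(a: Frame, b: Frame) -> float:
    """Local cost between two feature frames (assumed: Euclidean distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(test: List[Frame], ref: List[Frame]) -> float:
    """Cumulative DTW cost of the best alignment of `test` to `ref`.

    Uses the symmetric step pattern
        D[i][j] = d(i, j) + min(D[i-1][j], D[i][j-1], D[i-1][j-1]),
    which absorbs local timing differences (stretching or compressing
    either sequence). No slope constraint or normalization is applied.
    """
    n, m = len(test), len(ref)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0  # both sequences must start aligned at their first frames
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = frame_distance(test[i - 1], ref[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]  # both sequences must end aligned at their last frames


def recognize(test: List[Frame], templates: Dict[str, List[Frame]]) -> str:
    """Return the label of the reference template with the lowest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(test, templates[label]))


if __name__ == "__main__":
    # Hypothetical one-dimensional "feature frames" for two word templates.
    templates = {
        "A": [[0.0], [1.0], [2.0]],
        "B": [[2.0], [1.0], [0.0]],
    }
    # A time-stretched version of template "A".
    unknown = [[0.0], [0.0], [1.0], [2.0], [2.0]]
    print(dtw_distance(unknown, templates["A"]))  # 0.0
    print(recognize(unknown, templates))          # A
```

Because the warping path may repeat frames of either sequence, the stretched input aligns to template "A" at zero cost. Real systems typically add slope constraints and normalize the cost by path length so that templates of different durations compete fairly.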
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: Information theory & coding theory; Signal processing; Information theory