Use this URL to cite or link to this record in EThOS: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.761596
Title: Computer lipreading via hybrid deep neural network hidden Markov models
Author: Thangthai, Kwanchiva
ISNI:       0000 0004 7652 7837
Awarding Body: University of East Anglia
Current Institution: University of East Anglia
Date of Award: 2018
Availability of Full Text:
Access from EThOS:
Access from Institution:
Abstract:
Constructing a viable lipreading system is a challenge because it is claimed that only 30% of information of speech production is visible on the lips. Nevertheless, in small vocabulary tasks, there have been several reports of high accuracies. However, investigation of larger vocabulary tasks is rare. This work examines constructing a large vocabulary lipreading system using an approach based-on Deep Neural Network Hidden Markov Models (DNN-HMMs). We present the historical development of computer lipreading technology and the state-ofthe-art results in small and large vocabulary tasks. In preliminary experiments, we evaluate the performance of lipreading and audiovisual speech recognition in small vocabulary data sets. We then concentrate on the improvement of lipreading systems in a more substantial vocabulary size with a multi-speaker data set. We tackle the problem of lipreading an unseen speaker. We investigate the effect of employing several stepstopre-processvisualfeatures. Moreover, weexaminethecontributionoflanguage modelling in a lipreading system where we use longer n-grams to recognise visual speech. Our lipreading system is constructed on the 6000-word vocabulary TCDTIMIT audiovisual speech corpus. The results show that visual-only speech recognition can definitely reach about 60% word accuracy on large vocabularies. We actually achieved a mean of 59.42% measured via three-fold cross-validation on the speaker independent setting of the TCD-TIMIT corpus using Deep autoencoder features and DNN-HMM models. This is the best word accuracy of a lipreading system in a large vocabulary task reported on the TCD-TIMIT corpus. In the final part of the thesis, we examine how the DNN-HMM model improves lipreading performance. We also give an insight into lipreading by providing a feature visualisation. Finally, we present an analysis of lipreading results and suggestions for future development.
Supervisor: Not available Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.761596  DOI: Not available
Share: