Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.492489
Title: Audio-visual speech and speaker recognition
Author: Seymour, R.
Awarding Body: Queen's University Belfast
Current Institution: Queen's University Belfast
Date of Award: 2008
Availability of Full Text:
Full text unavailable from EThOS.
Please contact the current institution’s library for further details.
Abstract:
In this thesis, a number of important issues relating to the use of both audio and video information for speech and speaker recognition are investigated. A comprehensive comparison of different visual feature types is given, including both geometric and image-transformation-based features. A new geometric method for feature extraction is described, as well as the novel use of curvelet-based features. Different methods for constructing the feature vectors are compared, as are feature vector sizes and the use of dynamic features. Each feature type is tested against three types of visual noise: compression, blurring and jitter. A novel method of integrating the audio and video information streams, called the maximum stream posterior (MSP), is described. This method is tested in both speaker-dependent and speaker-independent audio-visual speech recognition (AVSR) systems, and is shown to be robust to noise in either the audio or video stream, given no prior knowledge of the noise. This method is then extended to form the maximum weighted stream posterior (MWSP) method. Finally, both the MSP and MWSP are tested in an audio-visual speaker recognition system (AVSpR). Experiments using the XM2VTS database show that both methods can outperform standard methods in terms of recognition accuracy in situations where either stream is corrupted.
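For context, audio-visual recognisers of this kind conventionally fuse the audio and video streams by raising each stream's HMM observation score to a fixed exponent weight. The expression below is a minimal sketch of that standard fixed-weight baseline only; the symbols (q_t, o_t, lambda_A, lambda_V) are illustrative and the exact MSP and MWSP formulations are defined in the thesis itself, not here.

  P(q_t \mid o_t^{A}, o_t^{V}) \;\propto\;
    \bigl[\, p(o_t^{A} \mid q_t) \,\bigr]^{\lambda_A}
    \bigl[\, p(o_t^{V} \mid q_t) \,\bigr]^{\lambda_V},
  \qquad \lambda_A + \lambda_V = 1

where q_t is the HMM state at time t, o_t^{A} and o_t^{V} are the audio and video observation vectors, and lambda_A, lambda_V are the stream weights. Fixed weights perform poorly when one stream is corrupted; the MSP and MWSP methods described in the abstract address how the streams are weighted when no prior knowledge of the noise is available.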
Supervisor: Not available Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.492489  DOI: Not available