Use this URL to cite or link to this record in EThOS:
Title: Robust multimodal person identification given limited training data
Author: McLaughlin, N. R.
ISNI:       0000 0004 2742 0531
Awarding Body: Queen's University Belfast
Current Institution: Queen's University Belfast
Date of Award: 2013
Availability of Full Text:
Full text unavailable from EThOS.
Please contact the current institution’s library for further details.
Abstract This thesis presents a novel method of audio-visual fusion, known as multi- modal optimal feature fusion (MOFF), for person identification where both the speech and facial modalities may be corrupted, and there is a lack of prior knowl- edge about the corruption. Furthermore, it is assumed there is a limited amount of training data for each modality (e.g., a short training speech segment and a single training facial image for each person). A new multimodal feature rep- resentation and a modified cosine similarity are introduced for combining and comparing bimodal features with limited training data, as well as vastly differing data rates and feature sizes. Similarity-based optimal feature selection and multi- condition training are used to reduce the mismatch between training and testing, thereby making the system robust to unknown bimodal corruption. Low-level feature fusion is performed using optimal feature selection, which automatically changes the weighting given to each modality based on the level of corruption. The framework for robust person identification is also applied to noise robust speaker identification, given very limited training data. Experiments have been carried out on a bimodal data set created from the SPIDRE speaker recogni- tion database and AR face recognition database, with variable noise corruption of speech and occlusion in the face images. Combining both modalities using MOFF, leads to significantly improved identification accuracy compared to the component unimodal systems, even with simultaneous corruption of both modal- ities. A novel piecewise-constant illumination model (PCIlVI) is then introduced for illumination invariant facial recognition. This method can be used given a single training facial image for each person, and assuming no prior knowledge of the illumination conditions of both the training and testing images. Small areas of the face are represented using magnitude Fourier features, which takes advan- tage of the shift-invariance of the magnitude Fourier representation, to increase robustness to small misalignment errors and small facial expression changes. Fi- nally, cosine similarity is used as an illumination invariant similarity measure, to compare small facial areas. Experiments have been carried out on the YaleB, ex- tended YaleB and eMU-PIE facial illumination databases. Facial identification accuracy using PCIlVI is comparable to or exceeds that of the literature.
Supervisor: Not available Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID:  DOI: Not available