Use this URL to cite or link to this record in EThOS:
Title: Spatio-temporal framework on facial expression recognition
Author: Fan, Xijian
ISNI:       0000 0004 6351 0147
Awarding Body: University of Warwick
Current Institution: University of Warwick
Date of Award: 2016
Availability of Full Text:
Access from EThOS:
Access from Institution:
This thesis presents an investigation into two topics that are important in facial expression recognition: how to employ the dynamic information from facial expression image sequences and how to efficiently extract context and other relevant information of different facial regions. This involves the development of spatio-temporal frameworks for recognising facial expression. The thesis proposed three novel frameworks for recognising facial expression. The first framework uses sparse representation to extract features from patches of a face to improve the recognition performance, where part-based methods which are robust to image alignment are applied. In addition, the use of sparse representation reduces the dimensionality of features, and improves the semantic meaning and represents a face image more efficiently. Since a facial expression involves a dynamic process, and the process contains information that describes a facial expression more effectively, it is important to capture such dynamic information so as to recognise facial expressions over the entire video sequence. Thus, the second framework uses two types of dynamic information to enhance the recognition: a novel spatio-temporal descriptor based on PHOG (pyramid histogram of gradient) to represent changes in facial shape, and dense optical flow to estimate the movement (displacement) of facial landmarks. The framework views an image sequence as a spatio-temporal volume, and uses temporal information to represent the dynamic movement of facial landmarks associated with a facial expression. Specifically, spatial based descriptor representing spatial local shape is extended to spatio-temporal domain to capture the changes in local shape of facial sub-regions in the temporal dimension to give 3D facial component sub-regions of forehead, mouth, eyebrow and nose. The descriptor of optical flow is also employed to extract the information of temporal. The fusion of these two descriptors enhance the dynamic information and achieves better performance than the individual descriptors. The third framework also focuses on analysing the dynamics of facial expression sequences to represent spatial-temporal dynamic information (i.e., velocity). Two types of features are generated: a spatio-temporal shape representation to enhance the local spatial and dynamic information, and a dynamic appearance representation. In addition, an entropy-based method is introduced to provide spatial relationship of different parts of a face by computing the entropy value of different sub-regions of a face.
Supervisor: Not available Sponsor: School of Engineering ; University of Warwick ; China Scholarship Council (CSC)
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID:  DOI: Not available
Keywords: TA Engineering (General). Civil engineering (General)