Title: Structured machine learning methods for automated analysis of facial expressions
Author: Walecki, Robert
ISNI: 0000 0004 9356 6950
Awarding Body: Imperial College London
Current Institution: Imperial College London
Date of Award: 2018
Automated recognition of facial expressions and detection of facial action units (AUs) from videos depend critically on modeling their dynamics. Some of these dynamics are characterized by changes in the temporal phases (onset-apex-offset) and intensity of emotion expressions and AUs. The appearance of these changes may vary considerably among subjects, making the recognition/detection task very challenging. Recent advances in deep neural networks (DNNs) and, in particular, convolutional models have facilitated “end-to-end” learning and reduced or even eliminated the dependence on physics-based models and/or other pre-processing techniques. While the effectiveness of these models has been demonstrated on many computer vision problems, only baseline tasks such as expression recognition, AU detection and AU intensity estimation have been investigated. The structure of facial expressions arises from statistically induced co-occurrence patterns of AU intensity levels. Our goal is to model this structure by combining conditional random fields (CRFs) with deep learning. The contribution of this thesis is two-fold. First, we introduce a novel Latent-CRF model for classification of image sequences. Second, we propose a deep probabilistic framework for modeling multivariate ordinal variables. Latent-CRFs efficiently encode dynamics through latent states that account for temporal consistency. These latent states are typically assumed to be either unordered (nominal) or fully ordered (ordinal). Yet, while the video segments containing activation of the target AU may be better described using ordinal latent states (corresponding to the AU intensity levels), the segments where this AU does not occur may be better described using unordered (nominal) latent states.
To address this, we propose the Variable-state L-CRF model, which automatically selects the optimal latent states for the target image sequence based on the input data and the underlying dynamics of the sequence. The deep probabilistic framework introduced in the second part of this thesis accounts for ordinal structure in the output variables and their non-linear dependencies via copula functions modeled as cliques of a CRF. These are jointly optimized with deep CNN feature encoding layers using a newly introduced balanced batch iterative training algorithm. We show that joint learning of the deep features and the target output structure results in significant performance gains compared to existing deep structured models for analysis of facial expressions. We show that the proposed models consistently outperform (i) independent modeling of AU intensities, (ii) the state-of-the-art approach for the target task, and (iii) deep convolutional neural networks.
Supervisor: Pantic, Maja
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral