Title: Machine learning methods for face modelling and analysis in-the-wild
Author: Kossaifi, Jean
ISNI: 0000 0004 7659 0297
Awarding Body: Imperial College London
Current Institution: Imperial College London
Date of Award: 2019
Automatic facial analysis lies at the intersection of computer vision and machine learning. It consists of two main steps. First, facial alignment, which typically involves detecting a set of fiducial points, or landmarks, on the face. Second, the aligned faces are used as input for estimating emotional states, either directly as pixel intensities or after extracting more robust features (either hand-crafted, such as histograms of oriented gradients, or learned with a deep neural network). In this thesis, we develop a complete pipeline for facial analysis in real-life, naturalistic conditions (in-the-wild), covering both steps.

We first explore a family of generative models for facial alignment, Active Appearance Models (AAMs). Specifically, we introduce a new second-order method for fitting AAMs. We then introduce a bidirectional method that simultaneously deforms the model and the image, leading to faster convergence. In both cases, we leverage the structure of the problem to obtain exact solutions with lower computational complexity. We show that, when trained on in-the-wild data, these methods achieve state-of-the-art performance while requiring smaller training datasets than discriminative methods. We also demonstrate how to leverage the statistical shape model and motion model of AAMs to constrain generative adversarial networks.

We then build on the facial alignment framework to estimate dimensional measures of emotion. Specifically, we estimate continuous levels of valence (how positive or negative a state of mind is) and arousal (how exciting or calming the experience is). To do so, we introduce a new database of images collected in-the-wild, annotated per frame with continuous levels of valence and arousal, along with accurate facial landmarks. We then demonstrate the importance of training models on data collected in-the-wild rather than on existing databases, which were mostly collected in laboratory or otherwise controlled environments.
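As an illustration of the statistical shape model mentioned above, the following is a minimal NumPy sketch of a linear (PCA) shape model of the kind used in AAMs. It assumes the landmark sets are already aligned (for example by Procrustes analysis, which is not shown) and uses synthetic data; the function names are hypothetical, and the sketch does not implement the second-order or bidirectional fitting methods described in the thesis.

```python
import numpy as np


def build_shape_model(shapes, n_components):
    """Build a linear (PCA) shape model from aligned landmark sets.

    shapes: array of shape (n_samples, n_landmarks, 2), assumed to be
    already aligned; no Procrustes alignment is performed here.
    Returns the flattened mean shape and an orthonormal basis with shape
    (n_components, 2 * n_landmarks).
    """
    X = shapes.reshape(len(shapes), -1)
    mean = X.mean(axis=0)
    # Right singular vectors of the centred data are the principal directions.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(shape, mean, basis):
    """Shape parameters p such that shape is approximately mean + basis.T @ p."""
    return basis @ (shape.reshape(-1) - mean)


def reconstruct(params, mean, basis):
    """Map shape parameters back to an (n_landmarks, 2) array of points."""
    return (mean + basis.T @ params).reshape(-1, 2)


# Synthetic data: 200 shapes of 68 landmarks varying along 3 random modes,
# plus a small amount of noise.
rng = np.random.default_rng(0)
n_samples, n_landmarks = 200, 68
template = rng.normal(size=n_landmarks * 2)
modes = rng.normal(size=(3, n_landmarks * 2))
coeffs = rng.normal(size=(n_samples, 3))
noise = 0.01 * rng.normal(size=(n_samples, n_landmarks * 2))
shapes = (template + coeffs @ modes + noise).reshape(n_samples, n_landmarks, 2)

mean, basis = build_shape_model(shapes, n_components=3)
p = project(shapes[0], mean, basis)
err = np.abs(reconstruct(p, mean, basis) - shapes[0]).max()
print(p.shape, err < 0.1)
```

A face shape is thus summarised by a handful of parameters `p`; in an AAM, such shape parameters are estimated jointly with appearance parameters during fitting.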
While developing tools for better facial analysis, it became clear that the data we work with has a rich multi-linear structure (e.g. spatial and temporal), yet current methods discard this structure. We therefore set out to devise new methods able to leverage it. In particular, given the absence of software for tensor methods, we created TensorLy, a high-level API for tensor algebra, tensor decomposition and tensor regression in Python. Its flexible backend system makes it possible to seamlessly run computations on various hardware using several libraries, including deep learning libraries such as PyTorch, TensorFlow or MXNet. This allowed us to introduce new ways of combining tensor methods with deep learning, such as tensor contraction and tensor regression layers. These hybrid methods combine the power of tensor algebra with the efficiency of deep learning. They make it possible to devise efficient algorithms that achieve state-of-the-art performance and scale to very large datasets, while substantially reducing the number of parameters.
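To illustrate what tensor contraction and tensor regression layers compute, the sketch below implements their forward passes in plain NumPy rather than with TensorLy, so it depends only on NumPy. It assumes one common formulation: the contraction layer takes the n-mode product of each non-batch mode of an activation tensor with a factor matrix, and the regression layer maps the activation tensor to outputs through a weight tensor kept in low-rank Tucker form. The helper names, shapes and ranks are hypothetical, and training (learning the factors by backpropagation) is not shown.

```python
import numpy as np


def mode_dot(tensor, matrix, mode):
    """n-mode product: multiply `tensor` along axis `mode` by `matrix`.

    `matrix` has shape (new_dim, tensor.shape[mode]); the result has
    `new_dim` in place of the original size of that axis.
    """
    moved = np.moveaxis(tensor, mode, 0)
    out = np.tensordot(matrix, moved, axes=([1], [0]))
    return np.moveaxis(out, 0, mode)


def tucker_to_tensor(core, factors):
    """Rebuild a full tensor from a Tucker core and one factor per mode."""
    full = core
    for mode, factor in enumerate(factors):
        full = mode_dot(full, factor, mode)
    return full


def tensor_contraction_layer(x, factors):
    """Contract every non-batch mode of x (batch first) with a factor matrix."""
    for mode, factor in enumerate(factors, start=1):
        x = mode_dot(x, factor, mode)
    return x


def tensor_regression_layer(x, core, factors, bias):
    """Map x of shape (batch, d1, d2, d3) to (batch, n_out).

    The weight tensor of shape (d1, d2, d3, n_out) is stored in Tucker form.
    """
    weight = tucker_to_tensor(core, factors)
    return np.tensordot(x, weight, axes=([1, 2, 3], [0, 1, 2])) + bias


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 64, 7, 7))  # e.g. a batch of convolutional activations

# Contraction layer: (8, 64, 7, 7) -> (8, 16, 4, 4)
tcl_factors = [rng.normal(size=(16, 64)), rng.normal(size=(4, 7)), rng.normal(size=(4, 7))]
z = tensor_contraction_layer(x, tcl_factors)

# Regression layer with Tucker ranks (5, 3, 3, 2) and 10 outputs.
core = rng.normal(size=(5, 3, 3, 2))
trl_factors = [rng.normal(size=(16, 5)), rng.normal(size=(4, 3)),
               rng.normal(size=(4, 3)), rng.normal(size=(10, 2))]
bias = np.zeros(10)
y = tensor_regression_layer(z, core, trl_factors, bias)

full_params = int(np.prod(tucker_to_tensor(core, trl_factors).shape))
tucker_params = core.size + sum(f.size for f in trl_factors)
print(z.shape, y.shape, full_params, tucker_params)  # (8, 16, 4, 4) (8, 10) 2560 214
```

With these illustrative shapes, the Tucker-form regression weights use 214 parameters instead of the 2,560 of the equivalent full weight tensor, which shows where the parameter savings of such layers come from. In a deep learning framework, the same operations would be written with that framework's tensors so that the factors can be learned end to end.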
Supervisor: Pantic, Maja
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral