Title: Unsupervised analysis of behaviour dynamics
Author: Zafeiriou, Lazaros
ISNI:       0000 0004 6348 1921
Awarding Body: Imperial College London
Current Institution: Imperial College London
Date of Award: 2016
Human facial behaviour analysis is an important task in developing automatic Human-Computer Interaction systems, and has received rapidly increasing attention over the past two decades. The dynamics of facial behaviour convey important information (e.g., for discriminating posed from spontaneous expressions), yet remain, to date, a largely unexploited field. This thesis presents machine learning algorithms that address the relatively unexplored problem of extracting features that efficiently and effectively capture the temporal dynamics of behaviour, and can therefore also be used for temporal alignment. The proposed methods are all unsupervised, i.e., they do not exploit any label information. The motivation for developing unsupervised algorithms lies in the fact that labelled/annotated data are hard to obtain, since annotating behaviour dynamics is a time-demanding, expensive and labour-intensive procedure. Additionally, these models incorporate temporal alignment, enabling a joint temporal decomposition of two or more time-series into a common expression manifold, using either low-dimensional sets of landmarks or raw pixel intensities. Aligning observation samples in time is a challenging problem in many scientific disciplines, and is particularly significant for facial expressions, where the activation of facial muscles (Action Units) typically follows a set of predefined temporal phases. The methods we propose for capturing the dynamics of facial expressions use Component Analysis (CA), a fundamental step in most computer vision applications, especially for reducing the usually high-dimensional input data in a meaningful manner by preserving a certain criterion. CA methodologies can be divided into deterministic and probabilistic techniques. Deterministic CA methods cannot model noise and do not allow the incorporation of prior information.
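The slowness principle underlying SFA can be illustrated with its standard deterministic formulation: whiten the data, then project onto the directions whose temporal derivatives have minimal variance. The following is a minimal NumPy sketch of that classical form (not the probabilistic EM-SFA developed in the thesis); the toy signals are illustrative assumptions, not data from the thesis:

```python
import numpy as np

def sfa(X, n_components=2):
    """Linear Slow Feature Analysis, deterministic form.

    X: (T, D) time series, rows are time steps.
    Returns unit-variance, decorrelated projections that
    vary as slowly as possible over time.
    """
    # Centre the data
    Xc = X - X.mean(axis=0)
    # Whiten via an eigendecomposition of the covariance
    C = np.cov(Xc, rowvar=False)
    evals, evecs = np.linalg.eigh(C)
    keep = evals > 1e-10                      # drop rank-deficient directions
    W_white = evecs[:, keep] / np.sqrt(evals[keep])
    Z = Xc @ W_white
    # Slowness: minimise the variance of the temporal differences
    dZ = np.diff(Z, axis=0)
    d_evals, d_evecs = np.linalg.eigh(np.cov(dZ, rowvar=False))
    # eigh returns eigenvalues in ascending order, so the first
    # columns correspond to the slowest-varying directions
    return Z @ d_evecs[:, :n_components]

# Toy example: a slow sine mixed with a fast oscillation
T = 500
t = np.linspace(0, 2 * np.pi, T)
slow, fast = np.sin(t), np.sin(40 * t)
mix = np.stack([slow + 0.5 * fast,
                0.5 * slow - fast,
                fast + 0.1 * slow], axis=1)
y = sfa(mix, n_components=1)[:, 0]
corr = abs(np.corrcoef(y, slow)[0, 1])
```

On this toy mixture the slowest extracted feature closely tracks the slow sinusoid (up to sign), while the fast oscillation is suppressed.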
On the other hand, probabilistic CA is a powerful framework that naturally allows the incorporation of noise and a-priori knowledge into the developed models. A significant contribution of our work lies in proposing an Expectation Maximization (EM) algorithm for performing inference in a probabilistic formulation of Slow Feature Analysis (SFA), and extending it to handle more than one time-varying data sequence. Moreover, we demonstrate that the probabilistic SFA (EM-SFA) algorithm, which discovers the common slowest-varying latent space of multiple sequences, can be combined with Dynamic Time Warping (DTW) techniques for robust sequence time-alignment. Most unsupervised learning techniques, such as Principal Component Analysis (PCA), enforce only a weak orthogonality constraint, resulting in a very distributed representation that uses cancellations to generate variability. This yields a holistic representation in which the latent features are difficult to interpret. To alleviate this, a group of unsupervised learning algorithms known as Non-negative Matrix Factorization (NMF) has been proposed. These algorithms enforce non-negativity constraints, resulting in a parts-based representation, since they allow only additive, not subtractive, combinations. Another major contribution of this thesis lies in proposing a model that combines the properties of temporal slowness and non-negative parts-based learning into a common framework that learns slowly varying parts-based representations of time-varying sequences. The proposed representations can be used to capture the underlying dynamics of temporal phenomena such as facial behaviour. Furthermore, we extend the above framework to align two visual sequences that display the same dynamic phenomenon, by proposing a novel joint NMF.
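The non-negativity constraint that yields a parts-based representation can be seen in the basic NMF algorithm with multiplicative updates: because both factors start non-negative and are updated only by multiplication with non-negative ratios, they stay non-negative, so the reconstruction is a purely additive combination of parts. A small sketch of this baseline (not the thesis's joint or slowness-constrained NMF variants; the toy matrix is an assumption for illustration):

```python
import numpy as np

def nmf(V, r, n_iter=500, seed=0):
    """Basic NMF via multiplicative updates for the Euclidean loss.

    Factorises a non-negative V (m x n) as W (m x r) @ H (r x n),
    with W, H >= 0, giving an additive, parts-based representation.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r)) + 1e-3
    H = rng.random((r, n)) + 1e-3
    eps = 1e-10                      # guards against division by zero
    for _ in range(n_iter):
        # Multiplicative updates preserve non-negativity by construction
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy example: V is built from two non-negative "parts"
parts = np.array([[1.0, 0.0],
                  [1.0, 0.0],
                  [0.0, 1.0],
                  [0.0, 1.0]])
coeffs = np.abs(np.random.default_rng(1).random((2, 6)))
V = parts @ coeffs
W, H = nmf(V, r=2)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

On this exactly rank-2 non-negative matrix, the factorisation reconstructs V to small relative error, and both factors remain element-wise non-negative throughout.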
The proposed framework enables a joint temporal decomposition of two non-negative time-series into a non-negative shared latent space, where they can be temporally aligned. The proposed method is tailored to the temporal alignment of facial events, since it is able to discover the facial parts that are jointly activated in the sequences, along with their temporal activation envelopes. We demonstrate the power of the proposed decompositions in unsupervised analysis of dynamic visual phenomena, as well as in temporal alignment of facial behaviour. The predominant strategy for facial expression analysis and temporal analysis of facial events is the following: a generic facial landmark tracker, usually trained on thousands of carefully annotated examples, is applied to track the landmark points, and analysis is then performed using mostly the shape and, more rarely, the facial texture. In this thesis, we challenge the above framework by showing that it is feasible to perform joint landmark localization and temporal analysis of behavioural sequences using only a simple face detector and a simple shape model. To this end, we formulate a generative model which jointly describes the data and also captures temporal dependencies by incorporating an autoregressive chain in the latent space. We also extend this model by integrating a temporal alignment process, in order to align two unsynchronized sequences of observations displaying highly deformable, texture-varying objects. The resulting model is the first to perform simultaneous spatial and temporal alignment, showing that by treating the problems of deformable spatial and temporal alignment jointly, we achieve better results than by considering them independently.
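The temporal alignment step that recurs throughout these contributions builds on classic Dynamic Time Warping, which finds a monotone warping path between two sequences by dynamic programming over a cumulative cost matrix. A self-contained sketch of the textbook algorithm (the thesis combines DTW with learned latent spaces rather than raw 1-D signals; the example sequences are illustrative assumptions):

```python
import numpy as np

def dtw(x, y):
    """Classic dynamic time warping between two 1-D sequences.

    Returns the optimal alignment cost and the warping path
    as a list of (i, j) index pairs.
    """
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)   # cumulative cost matrix
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack the optimal path from the end of both sequences
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return D[n, m], path[::-1]

# The same ramp played at two different speeds aligns with zero cost
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 1.0, 2.0, 2.0, 3.0])
cost, path = dtw(x, y)
```

Because y is simply x with some samples repeated, the optimal warp matches every sample exactly, so the alignment cost is zero and the path stretches x to cover the repeats.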
Supervisor: Pantic, Maja
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral