Title: Action recognition using deep learning
Author: Palasek, Petar
ISNI: 0000 0004 7653 2839
Awarding Body: Queen Mary University of London
Current Institution: Queen Mary, University of London
Date of Award: 2017
In this thesis we study deep learning architectures for the problem of human action recognition in image sequences, i.e. the problem of automatically recognizing what people are doing in a given video. As unlabeled video data is easily accessible these days, we first explore models that can learn meaningful representations of sequences without having to know what is happening in the sequences at hand. More specifically, we first explore the convolutional restricted Boltzmann machine (RBM) and show how a stack of convolutional RBMs can be used to learn and extract features from sequences in an unsupervised way. Using the classical Fisher vector pipeline to encode the extracted features, we apply them to the task of action classification. We then move on to feature extraction using larger, deep convolutional neural networks and propose a novel architecture which expresses the processing steps of the classical Fisher vector pipeline as network layers. In contrast to other methods, where these steps are performed consecutively and the corresponding parameters are learned in an unsupervised manner, defining them as a single neural network allows us to refine the whole model discriminatively in an end-to-end fashion. We show that our method achieves significant improvements over the classical Fisher vector extraction chain and performs comparably to other convolutional networks, while substantially reducing the number of trainable parameters required. Finally, we explore how the proposed architecture can be modified into a hybrid network that combines the benefits of both unsupervised and supervised training methods, resulting in a model that learns a semi-supervised Fisher vector descriptor of the input data. We evaluate the proposed model on image classification and action recognition problems and show how the model's classification performance improves as the amount of unlabeled data used during training increases.
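For readers unfamiliar with the classical Fisher vector pipeline mentioned in the abstract, the encoding step can be sketched as follows. This is a minimal NumPy illustration of the commonly used improved Fisher vector formulation (gradients with respect to the means and variances of a diagonal-covariance Gaussian mixture, followed by power and L2 normalization). It is not code from the thesis, which expresses these steps as trainable network layers. The function name `fisher_vector`, the toy dimensions, and the randomly generated mixture parameters are all assumptions made for illustration; in practice the mixture would first be fitted to training descriptors.

```python
import numpy as np


def fisher_vector(x, weights, means, variances):
    """Encode local descriptors as one Fisher vector (illustrative sketch).

    x:         (T, D) local descriptors from one image or video
    weights:   (K,)   mixture weights of a diagonal-covariance GMM
    means:     (K, D) component means
    variances: (K, D) component variances (diagonal covariances)
    """
    T = x.shape[0]
    sigma = np.sqrt(variances)
    # Whitened differences (x_t - mu_k) / sigma_k, shape (T, K, D).
    diff = (x[:, None, :] - means[None, :, :]) / sigma[None, :, :]

    # Log of w_k * N(x_t | mu_k, diag(var_k)), shape (T, K).
    log_prob = (
        np.log(weights)[None, :]
        - 0.5 * np.sum(diff ** 2, axis=2)
        - 0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :]
    )
    # Soft assignments (posteriors), computed stably via max subtraction.
    log_prob -= log_prob.max(axis=1, keepdims=True)
    gamma = np.exp(log_prob)
    gamma /= gamma.sum(axis=1, keepdims=True)

    # Gradients with respect to the means and the standard deviations.
    g_mu = np.einsum("tk,tkd->kd", gamma, diff)
    g_mu /= T * np.sqrt(weights)[:, None]
    g_sigma = np.einsum("tk,tkd->kd", gamma, diff ** 2 - 1.0)
    g_sigma /= T * np.sqrt(2.0 * weights)[:, None]

    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])
    # Power normalization followed by L2 normalization.
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv


# Toy usage with placeholder (not fitted) mixture parameters.
rng = np.random.default_rng(0)
K, D, T = 4, 8, 100
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))
variances = np.ones((K, D))
descriptors = rng.normal(size=(T, D))

fv = fisher_vector(descriptors, weights, means, variances)
print(fv.shape)  # length is 2 * K * D
```

Every operation in this sketch is differentiable with respect to the mixture parameters, which is what makes it plausible to recast the pipeline as network layers and refine it end to end, as the abstract describes.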
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: Electronic Engineering and Computer Science ; deep learning architectures ; human action recognition