Title: Video sequence alignment
Author: Al Ghamdi, Manal
ISNI: 0000 0004 5346 7050
Awarding Body: University of Sheffield
Current Institution: University of Sheffield
Date of Award: 2015
The task of aligning multiple audio-visual sequences with similar content requires careful synchronisation in both the spatial and temporal domains. It is a challenging task due to a broad range of content variations, background clutter, occlusions, and other factors. This thesis is concerned with aligning video content by characterising the spatial and temporal information embedded in the high-dimensional space. To that end, a three-stage framework is developed, involving space-time representation of video clips with local linear coding, followed by their alignment in the manifold-embedded space. The first two stages present video representation techniques based on local feature extraction and linear coding methods. Firstly, the scale-invariant feature transform (SIFT) is extended to extract interest points not only from the spatial plane but also from the planes along the space-time axis. Locality-constrained coding is then incorporated to project each descriptor into a local coordinate system produced by a pooling technique. Human action classification benchmarks are adopted to evaluate these two stages, comparing their performance against existing techniques. The results show that the space-time extension of SIFT with a linear coding scheme outperforms most state-of-the-art approaches on the action classification task owing to its ability to represent complex events in video sequences. The final stage presents a manifold learning algorithm with spatio-temporal constraints to embed a video clip in a lower-dimensional space while preserving the intrinsic geometry of the data. The similarities observed between frame sequences are captured by defining two types of correlation graphs: an intra-correlation graph within a single video sequence and an inter-correlation graph between two sequences. Video retrieval and ranking tasks are designed to evaluate the manifold learning stage. The experimental outcome shows that the approach outperforms conventional techniques in defining similar video content and capturing the spatio-temporal correlations between them.
Supervisor: Gotoh, Yoshihiko
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available