Title: Shot descriptors for video temporal decomposition
Author: Sidiropoulos, Panagiotis
ISNI:       0000 0004 2745 5793
Awarding Body: University of Surrey
Current Institution: University of Surrey
Date of Award: 2012
Video temporal decomposition is an essential element of a variety of video processing applications, from semantic indexing and classification to non-linear browsing, video summarization and video retrieval. The decomposition is traditionally conducted using shots as the video structural units. However, while shots are video segments that can be explicitly defined, they lack semantic meaning. Scenes, on the other hand, are generally defined as the elementary semantic video units; they are expected to generate more meaningful video representations and to enhance the performance of video processing applications that employ temporal decomposition. Before scene segmentation can replace shot segmentation, however, the former needs to reach the high performance levels of the latter. This thesis aims to provide directions towards this goal, first by identifying some of the main current limitations of video scene segmentation and then by suggesting ways to overcome them. More specifically, four main limitations have been identified. The first is the ambiguity in the definition of what a scene is, which is an inherent characteristic of the domain: the general definition of a scene as the elementary semantic unit admits various interpretations depending on the video genre, the application and so on. The second is the semantic gap between what makes two shots belong to the same scene and the available scene descriptors. Scenes are formed by links between pairs of neighboring shots that are similar in content, yet this shot content similarity cannot be efficiently modelled by the low-level descriptors typically used by the community for this purpose. The third is the limited scalability of existing scene segmentation algorithms: in practice, it is difficult to generalize and efficiently tune scene segmentation approaches not only across videos of multiple genres but even for a small number of videos of the same genre.
The fourth is the lack of a uni-dimensional evaluation measure that can efficiently gauge the performance of an automatic scene segmentation system. This thesis includes the development of a novel approach to evaluating video temporal decomposition algorithms, which is not only effective in evaluating scene segmentation techniques and in helping to optimize their parameters, but also satisfies a number of qualitative prerequisites that previous measures do not. Furthermore, the novel measure is proven to be a metric, a property that can be used to alleviate the effects of the scene definition ambiguity. Subsequently, a scheme that fully exploits the scene discrimination potential of shot descriptors derived from both the visual and the audio modality is presented, followed by the introduction of a number of novel shot descriptors. These employ high-level features automatically extracted from the visual and the auditory channel, and are shown to contribute towards improved video segmentation into scenes. Finally, conclusions and future work complete this thesis.
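The shot-linking view of scene formation described in the abstract — consecutive shots are merged into one scene when their content is similar enough — can be illustrated with a minimal sketch. The toy two-dimensional descriptors, the cosine similarity function, and the threshold value below are illustrative assumptions, not the thesis's actual descriptors or grouping method.

```python
def cosine_similarity(a, b):
    """Cosine similarity between two descriptor vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def link_shots_into_scenes(shot_descriptors, threshold=0.8):
    """Group consecutive shots into scenes: start a new scene whenever the
    similarity between neighboring shot descriptors falls below the
    threshold (a hypothetical linking rule for illustration only)."""
    scenes = [[0]]  # first shot always opens the first scene
    for i in range(1, len(shot_descriptors)):
        if cosine_similarity(shot_descriptors[i - 1], shot_descriptors[i]) >= threshold:
            scenes[-1].append(i)   # similar to its neighbor: same scene
        else:
            scenes.append([i])     # dissimilar: scene boundary
    return scenes

# Four toy shots: the first two resemble each other, as do the last two.
shots = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(link_shots_into_scenes(shots))  # → [[0, 1], [2, 3]]
```

This greedy neighbor-linking is the simplest instantiation of the idea; the thesis argues that low-level descriptors feeding such a similarity function are insufficient, motivating the high-level audio-visual descriptors it introduces.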
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available