Title: Cognitive vision systems for video understanding and retrieval
Author: Kolonias, I.
ISNI: 0000 0001 3601 5894
Awarding Body: University of Surrey
Current Institution: University of Surrey
Date of Award: 2007
This thesis addresses the problem of creating computer vision systems that facilitate high-level, user-friendly interpretation of an observed scene and that are readily adaptable to a wide range of computer vision tasks. The notion of injecting cognitive capabilities into traditional computer vision systems is therefore central to this work. First, the requirements for creating a cognitive vision system are examined. This leads to the conclusion that such systems have two main enabling components: a unified framework for reasoning in the context of the observed scene, and a multi-layered memory architecture that aids the reasoning framework by storing and recalling all relevant information about the observed scene.
Regarding the apparatus used for reasoning in video sequences, it is argued that it must be applicable at all levels of information processing (from raw input data to high-level abstractions concerning the evolution of the observed scene), must support and exploit any combination of spatial and temporal dependencies (i.e. context) present among the input data, and must deliver good reasoning performance in any categorical domain.
The requirements set by the reasoning engine are in turn used as a guideline for the design of a memory architecture that supports it. The memory must therefore be able to handle arbitrary input data types, depending on the scope of the current cognition task. It must also allow both forward and feedback interaction with the reasoning framework, since contextual information extracted from the observed scene at a later stage may help the reasoning engine to alter a decision made at an earlier stage, just as humans do when presented with contradictory evidence. To further emulate the mechanisms that enable human cognition, forgetting processes are also embedded in the memory infrastructure. Different layers of memory storage forget at different speeds: the system forgets raw input and low-level feature data very quickly, whereas high-level concepts about the evolution of the observed scene are retained over a relatively long term.
Finally, the proposed system has been implemented and tested on a real-world application: the annotation of broadcast tennis video sequences. In this sample application, the goal was to create a cognitive vision system, built on the main components described above, that keeps track of the score for the duration of the broadcast match. The results obtained from processing a set of sequences captured off the air indicate that the overall approach achieves far superior results to simply segmenting the video sequence into shots and analysing each one separately, out of the context of the match. This demonstrates that the ability to adapt by discovering and exploiting context is paramount to the efficiency of any future computer vision system and is, in no small part, a feature that sets biological cognitive vision systems apart from their machine-based counterparts.
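The layered forgetting described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the thesis: the class names (`MemoryLayer`, `LayeredMemory`), the three layer names, the retention values and the stored items are all hypothetical, and forgetting is modelled here simply as dropping items older than a per-layer retention window, which may differ from the mechanism the author actually used.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class MemoryLayer:
    """One storage layer (hypothetical). Items older than `retention` are forgotten."""
    name: str
    retention: float  # assumed retention window, in arbitrary time units
    items: List[Tuple[float, Any]] = field(default_factory=list)

    def store(self, timestamp: float, item: Any) -> None:
        self.items.append((timestamp, item))

    def forget(self, now: float) -> None:
        # Keep only the items that are still inside this layer's retention window.
        self.items = [(t, x) for (t, x) in self.items if now - t <= self.retention]


class LayeredMemory:
    """A set of layers that forget at different speeds (illustrative only)."""

    def __init__(self, layers: List[MemoryLayer]) -> None:
        self.layers: Dict[str, MemoryLayer] = {layer.name: layer for layer in layers}

    def store(self, layer_name: str, timestamp: float, item: Any) -> None:
        self.layers[layer_name].store(timestamp, item)

    def tick(self, now: float) -> None:
        # Apply forgetting to every layer at the current time.
        for layer in self.layers.values():
            layer.forget(now)

    def recall(self, layer_name: str) -> List[Any]:
        return [x for (_, x) in self.layers[layer_name].items]


# Assumed retention values: short for raw data, long for high-level concepts.
memory = LayeredMemory([
    MemoryLayer("raw", retention=1.0),
    MemoryLayer("features", retention=5.0),
    MemoryLayer("concepts", retention=1000.0),
])
memory.store("raw", 0.0, "frame_0")
memory.store("features", 0.0, "ball_trajectory_0")
memory.store("concepts", 0.0, "point won by server")
memory.tick(now=10.0)
```

After `memory.tick(now=10.0)`, the raw and feature layers in this sketch are empty, while the concept layer still holds its entry. This mirrors the abstract's statement that low-level data is forgotten quickly and high-level concepts are retained over a longer term.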
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available