Title:
|
Scale, saliency and scene description
|
This thesis develops a novel information theoretic methodology addressing three intrinsically
related problems in vision: saliency, scale and description. The fundamental principle underpinning the proposed approach is the spatial (un)predictability of image attributes.
The thesis is concerned with the Scene Description task — the automatic extraction
of a set of robust, relevant, and sufficiently complete semantic descriptions of a scene, for subsequent inference. This task is essential for any application where the efficient and semantic representation of image-based data is necessary, for example data-mining and communications systems. The main challenge in this task is to extract the descriptions without assumptions on the exact nature of the subsequent inferences. Clearly, without
prior knowledge the problem is intractable; this thesis addresses the questions: “how much is needed, and at what stage need it be applied?”
Many approaches to vision tend to concentrate on specific scene entities (or ‘objects’),
and hence do not capture a complete description. Those that do tend to be brittle and lack the necessary semantic level of description. Motivated by the work of Gilles (1998) and by recent successes of local appearance-based methods, a novel algorithm for quantifying image region saliency, called Scale Saliency, is presented. In this approach, regions are considered salient if they are simultaneously unpredictable in both feature-space and scale-space. Unpredictability is determined as a function of the local PDF, generating a three-dimensional space of saliency values (over x, y and scale) from which features may be extracted by a suitable detection strategy; a sketch of one such measure is given below. The technique is a more generic approach to saliency
than conventional methods, because saliency is defined independently of any particular basis morphology. The method can be made invariant to rotation, translation, non-uniform scaling and uniform intensity variations, and robust to small changes in viewpoint. The algorithm is applied to simple recognition tasks, and the features are shown to be robust and persistent (hence useful for tracking). The relevance of the scales and the generality of the saliency measure are demonstrated by using the PDF of salient scales to characterise textures;
classification and unsupervised segmentation results are presented. A ‘by-product’ of this work is that salient scales themselves make good descriptors of texture. For the texture segmentation experiments, a novel unsupervised Level-Set-based implementation of Region
Competition is developed. The key aspect of this is that it operates on just one surface. Generalised Region Competition evolution equations are presented.
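For concreteness, one entropy-based formulation of such a saliency measure can be sketched as follows; the precise form and notation, with descriptor values d drawn from a set D and local PDF p(d, s, x) estimated in a window of scale s at location x, are assumed here and may differ in detail from the thesis:

\begin{align}
  H_D(s,\mathbf{x}) &= -\sum_{d \in D} p(d,s,\mathbf{x}) \,\log_2 p(d,s,\mathbf{x})
    && \text{(feature-space unpredictability)} \\
  W_D(s,\mathbf{x}) &= s \sum_{d \in D} \left|\frac{\partial}{\partial s}\, p(d,s,\mathbf{x})\right|
    && \text{(inter-scale unpredictability)} \\
  Y_D(s,\mathbf{x}) &= H_D(s,\mathbf{x})\, W_D(s,\mathbf{x})
    && \text{(scale saliency)}
\end{align}

Candidate scales are those at which H_D peaks over s; the weight W_D suppresses regions whose local statistics are self-similar over scale, and hence predictable.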
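As a point of reference for the Region Competition formulation mentioned above, a standard two-region level-set form of the Zhu and Yuille flow (assumed here for illustration; the generalised single-surface, multi-region evolution equations developed in the thesis differ) is

\begin{equation}
  \frac{\partial \phi}{\partial t}
    = |\nabla \phi| \left[\, \mu\,\kappa
      + \log p\big(I(\mathbf{x}) \mid \theta_{\mathrm{in}}\big)
      - \log p\big(I(\mathbf{x}) \mid \theta_{\mathrm{out}}\big) \right],
  \qquad
  \kappa = \nabla \cdot \frac{\nabla \phi}{|\nabla \phi|},
\end{equation}

where \phi embeds the region boundary as its zero level set, \theta_in and \theta_out are the statistical models of the two competing regions, and \mu weights the curvature-based smoothness term; signs depend on the chosen embedding convention (e.g. \phi > 0 inside the region).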
Finally, a unified approach to image modelling is proposed, based on two scales of spatial unpredictability: the local and the semi-local. Quantifying the unpredictability of image attributes at these two scales defines a space of image models that can represent several different types of image content, such as blobs, lines, and statistical and structural textures, within a single framework.
|