Title: Novel image representations for visual categorisation with 'Bag-of-Words'
Author: Koniusz, Piotr
Awarding Body: University of Surrey
Current Institution: University of Surrey
Date of Award: 2013
Visual Category Recognition aims at fast classification of objects, as well as scenery, action, and semantically complex concepts in collections of unannotated images. Its applications include security and crime prevention, rapid selection of content for efficient media practices, television and press archives, organisation of visual content in social media, e-commerce, robotic recognition, and many more. There exist a variety of approaches to visual categorisation. However, due to the complex nature of visual appearances and the complex taxonomy of objects, a simplifying statistical model developed for natural language processing, called Bag-of-Words, is typically used. In such a model, descriptors are extracted from images at keypoint locations and then expressed as vectors representing visual word appearances, referred to as mid-level features. A pooling step is carried out to transform the mid-level features from an image into a final vectorial representation called an image signature. Finally, a classifier is applied. Segmentation-based interest points for matching and recognition are investigated first. Two simple methods for extracting features from the segmentation maps are proposed. They focus on the boundaries and centres of gravity of the segments. Segmentation-based image descriptors are proposed next. They are extracted from pairs of adjacent regions from an unsupervised segmentation. Thus, semi-local structural appearances are exploited. This limits the contribution of uniform regions. A highly popular technique for coding the local image descriptors in Bag-of-Words, called Soft Assignment, is combined with Linear Coordinate Coding to minimise its quantisation loss, which strongly correlates with the best classification performance. An approach that introduces spatial information to Bag-of-Words, called Spatial Coordinate Coding, is proposed. It reduces the size of mid-level features tenfold. 
Moreover, as dominant orientations of edges and colour are sources of bias in images, we learn them at multiple levels of coarseness by Dominant Angle and Colour Pyramid Matching. A number of techniques for generating mid-level features, as well as various pooling methods that aggregate mid-level features into image signatures, are investigated. We generalise these pooling methods to account for the descriptor interdependence and introduce an improved pooling that addresses noise effects in mid-level features. Bag-of-Words typically extracts the first-order statistics from mid-level features. To improve recognition, aggregation over co-occurrences of visual words in mid-level features is proposed. An appropriate derivation is provided and various likelihood-inspired pooling operators are investigated. Moreover, an extension to multiple modalities is proposed.
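The Bag-of-Words pipeline summarised in the abstract (descriptors, mid-level features, pooling, image signature) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the Gaussian form of Soft Assignment, the `sigma` parameter, and the use of an averaged outer product as a stand-in for co-occurrence aggregation are all assumptions made for illustration, and the random arrays merely stand in for real descriptors and a learned dictionary.

```python
import numpy as np


def soft_assign(descriptors, dictionary, sigma=1.0):
    """Code descriptors as soft memberships over visual words (assumed Gaussian form).

    descriptors: (N, D) local descriptors; dictionary: (K, D) visual words.
    Returns (N, K) mid-level features whose rows sum to 1.
    """
    # Squared Euclidean distance between every descriptor and every visual word.
    diff = descriptors[:, None, :] - dictionary[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)
    # Gaussian responses, shifted per row so the exponentials cannot overflow.
    logits = -sq_dist / (2.0 * sigma ** 2)
    logits -= logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)


def pool_first_order(mid_level, mode="avg"):
    """Aggregate (N, K) mid-level features into a K-dimensional image signature."""
    if mode == "avg":
        return mid_level.mean(axis=0)
    if mode == "max":
        return mid_level.max(axis=0)
    raise ValueError(f"unknown pooling mode: {mode}")


def pool_cooccurrence(mid_level):
    """Average outer product of mid-level features: a (K, K) co-occurrence matrix.

    Illustrative second-order statistic only; the operators derived in the
    thesis are not given in this record.
    """
    return mid_level.T @ mid_level / mid_level.shape[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    descriptors = rng.normal(size=(200, 16))  # stand-in for local image descriptors
    dictionary = rng.normal(size=(10, 16))    # stand-in for a learned visual dictionary
    phi = soft_assign(descriptors, dictionary, sigma=2.0)
    print(pool_first_order(phi, "avg").shape)  # first-order signature, length K
    print(pool_cooccurrence(phi).shape)        # second-order signature, K x K
```

In a full system the resulting signature (flattened, in the second-order case) would be passed to a classifier; that step is omitted here because the record does not say which classifier the thesis uses.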
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available