Title: Learning a structured model for visual category recognition
Author: Gupta, Ashish
Awarding Body: University of Surrey
Current Institution: University of Surrey
Date of Award: 2013
This thesis deals with the problem of estimating structure in data due to the semantic relations between data elements and leveraging this information to learn a visual model for category recognition. A visual model consists of dictionary learning, which computes a succinct set of prototypes from training data by partitioning feature space, and feature encoding, which learns a representation of each image as a combination of dictionary elements. Besides variations in lighting and pose, a key challenge of classifying a category is intra-category appearance variation. The key idea in this thesis is that feature data describing a category has latent structure due to visual content idiomatic to a category. However, popular algorithms in the literature disregard this structure when computing a visual model. Towards incorporating this structure in the learning algorithms, this thesis analyses two facets of feature data to discover relevant structure. The first is structure amongst the sub-spaces of the feature descriptor. Several sub-space embedding techniques that use global or local information to compute a projection function are analysed. A novel entropy-based measure of structure in the embedded descriptors suggests that relevant structure has local extent. The second is structure amongst the partitions of feature space. Hard partitioning of feature space leads to issues of uncertainty and plausibility in the assignment of descriptors to dictionary elements. To address this issue, novel fuzzy-logic-based dictionary learning and feature encoding algorithms are employed that are able to model the local feature vector distributions and provide performance benefits. To estimate structure amongst sub-spaces, co-clustering is used with a training descriptor data matrix to compute groups of sub-spaces.
A dictionary learnt on feature vectors embedded in these multiple sub-manifolds is demonstrated to model data better than a dictionary learnt on feature vectors embedded in a single sub-manifold. In a similar manner, co-clustering is used with an encoded feature data matrix to compute groups of dictionary elements, referred to as 'topics'. A topic dictionary is demonstrated to perform better than a regular dictionary of comparable size. Both these results suggest that the co-clustered groups of sub-spaces and dictionary elements have semantic relevance. All the methods developed here have been viewed from the unifying perspective of matrix factorization, where a data matrix is decomposed into two matrices which are interpreted as a dictionary matrix and a co-efficient matrix. Sparse coding methods, which are currently enjoying much success, can be viewed as matrix factorization with a regularization constraint on the dictionary or co-efficient matrices. With regard to sub-space embedding, sparse principal component analysis is one such method that induces sparsity amongst the sub-spaces selected to represent each descriptor. Similarly, a sparsity-inducing regularization method called Lasso is used for feature encoding, which uses only a sub-set of dictionary elements to represent each image. While these methods are effective, they disregard structure in the data matrix.
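The matrix factorization view of Lasso feature encoding described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the solver (ISTA, iterative soft-thresholding), the random dictionary, the synthetic data, the value of lam, and all function names are assumptions made for illustration only. Given a data matrix X and a fixed dictionary matrix D, it computes a co-efficient matrix A by minimising 0.5 * ||X - D A||^2 + lam * ||A||_1.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the L1 norm: shrink each entry towards zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_encode(X, D, lam=0.1, n_iter=200):
    """Encode the columns of X over dictionary D with ISTA.

    X: (d, n) data matrix, one descriptor per column.
    D: (d, k) dictionary matrix, one dictionary element per column.
    Returns A: (k, n) co-efficient matrix, so that X is approximated by D @ A.
    """
    # Step size 1/L, where L (squared spectral norm of D) is the
    # Lipschitz constant of the gradient of the least-squares term.
    L = np.linalg.norm(D, 2) ** 2
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = D.T @ (D @ A - X)                 # gradient of 0.5 * ||X - D A||^2
        A = soft_threshold(A - grad / L, lam / L)
    return A

def objective(X, D, A, lam):
    """Lasso objective: reconstruction error plus L1 penalty on the codes."""
    return 0.5 * np.sum((X - D @ A) ** 2) + lam * np.sum(np.abs(A))

# Synthetic example (assumed data): 50 descriptors of dimension 16,
# each generated from 3 of 32 unit-norm dictionary elements.
rng = np.random.default_rng(0)
d, k, n = 16, 32, 50
D = rng.standard_normal((d, k))
D /= np.linalg.norm(D, axis=0)
A_true = np.zeros((k, n))
for j in range(n):
    idx = rng.choice(k, size=3, replace=False)
    A_true[idx, j] = rng.standard_normal(3)
X = D @ A_true

A = lasso_encode(X, D, lam=0.1)
print("non-zero co-efficients per descriptor:", np.count_nonzero(A) / n)
```

The L1 penalty drives many entries of A to exactly zero, so each descriptor is represented by only a sub-set of dictionary elements. Note that the penalty treats every co-efficient independently, which is the sense in which such an encoding disregards structure in the data matrix.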
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:  DOI: Not available