Title: Feature reduction and representation learning for visual applications
Author: Yu, Mengyang
ISNI: 0000 0004 6352 4995
Awarding Body: Northumbria University
Current Institution: Northumbria University
Date of Award: 2016
Availability of Full Text: Full text unavailable from EThOS.
Computation on large-scale data spaces is involved in many active problems in computer vision and pattern recognition. In realistic applications, however, most existing algorithms are heavily restricted by the large number of features and tend to be inefficient or even infeasible. This thesis addresses the problem in two ways: (1) projecting features onto a lower-dimensional subspace; (2) embedding features into a Hamming space.
Firstly, a novel subspace learning algorithm called Local Feature Discriminant Projection (LFDP) is proposed for discriminant analysis of local features. LFDP efficiently seeks a subspace that improves the discriminability of local features for classification. Extensive experimental validation on three benchmark datasets demonstrates that the proposed LFDP outperforms other dimensionality reduction methods and achieves state-of-the-art performance for image classification.
Secondly, for action recognition, a novel binary local representation for RGB-D video data fusion is presented. In this approach, a general local descriptor called Local Flux Feature (LFF) is obtained for both RGB and depth data by computing the local fluxes of the gradient fields of the video data. The LFFs from the RGB and depth channels are then fused into a Hamming space via the Structure Preserving Projection (SPP), which preserves not only the pairwise feature structure but also a higher-level connection between samples and classes. Comprehensive experimental results show the superiority of both LFF and SPP.
Thirdly, with respect to unsupervised learning, SPP is extended to the Binary Set Embedding (BSE) for cross-modal retrieval. BSE outputs meaningful hash codes for local features from the image domain and word vectors from the text domain. Extensive evaluation on two widely used image-text datasets demonstrates the superior performance of BSE compared with state-of-the-art cross-modal hashing methods.
Finally, a generalized multiview spectral embedding algorithm called Kernelized Multiview Projection (KMP) is proposed to fuse multimedia data from multiple sources. Different features/views in the reproducing kernel Hilbert spaces are linearly fused together and then projected onto a low-dimensional subspace by KMP, whose performance is thoroughly evaluated on both image and video datasets against other multiview embedding methods.
Supervisor: Shao, Ling
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: G400 Computer Science