Title: A class of distance-preserving matrix optimization models in data mining
Author: Jahan, Sohana
ISNI:       0000 0004 6348 9608
Awarding Body: University of Southampton
Current Institution: University of Southampton
Date of Award: 2017
This thesis studies a class of matrix optimization problems. A matrix optimization problem (MOP) involves optimizing the sum of a linear function and a proper closed simple convex function subject to affine constraints in the matrix space. Many important optimization problems in applications such as data mining and network localization, arising from fields as varied as engineering and finance, can be cast as MOPs. This thesis focuses on the application of MOPs in data mining, especially data visualization, regression and classification. Data mining is the process of discovering interesting patterns and knowledge, in which different approaches (e.g. dimension reduction) are applied to pre-process the data and smooth out noise. Dimensionality reduction is a traditional problem in pattern recognition and machine learning. A wide range of methods are used to project high-dimensional data into a low-dimensional space so that the projected data perform better in further processing such as regression, classification and clustering. Classical Multi-Dimensional Scaling (cMDS) is an important method for reducing the dimension of data and hence for assigning the data to a fixed number of classes. Nonlinear variants of cMDS have been developed to improve its performance. One of them is MDS with Radial Basis Functions (MDS-RBF). A key issue that has not been well addressed in MDS-RBF is the effective selection of its centers. Proper selection of centers leads to better classification of the data. This research treats the selection problem as a multi-task learning problem, which leads us to employ the (2,1)-norm to regularize the original MDS-RBF objective function. Two reformulations, diagonal and spectral, are studied. Both can be solved effectively by an iterative block-majorization method. Numerical experiments show that the regularized models can improve on the original model significantly.
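As a rough illustration of two ingredients named above, the following Python sketch shows textbook classical MDS (double centering followed by an eigendecomposition) and the (2,1)-norm of a matrix. It is not taken from the thesis: the function names `classical_mds` and `norm_21` are hypothetical, and the MDS-RBF model and its regularized reformulations are not reproduced here.

```python
import numpy as np

def classical_mds(D, k):
    """Embed n points in k dimensions from an n-by-n Euclidean distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]         # indices of the k largest eigenvalues
    lam = np.clip(vals[idx], 0.0, None)      # guard against small negative values
    return vecs[:, idx] * np.sqrt(lam)       # n-by-k coordinates

def norm_21(W):
    """(2,1)-norm: sum of the Euclidean norms of the rows of W."""
    return np.sqrt((W ** 2).sum(axis=1)).sum()
```

Penalizing the (2,1)-norm of a coefficient matrix tends to drive entire rows to zero. When each row corresponds to one candidate center (an assumption made here for illustration), this is one plausible way in which such a penalty can act as a center selector.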
Although these models run very fast on small data sets, they are somewhat time-consuming on large ones. We therefore sought a model that projects large data sets efficiently. Supervised distance preserving projection (SDPP) is a very efficient method proposed recently for dimension reduction in supervised settings. The basic formulation of SDPP aims to preserve distances locally between data points in the projected space (the reduced feature space) and the output space. In this work we propose a modification of SDPP that incorporates the total variance of the projected covariates into the SDPP problem. We formulate the proposed optimization problem as a Semidefinite Least Square (SLS) SDPP. The SLS-SDPP maximizes the total variance of the projected covariates while preserving the local geometry of the output space. A two-block Alternating Direction Method of Multipliers has been developed to solve the SLS-SDPP and learn the transformation matrix, which can easily handle out-of-sample data. The projections of the testing data points in the low-dimensional space are then used for regression or for classifying the points into different classes. Experimental evaluation on both synthetic and real-world data demonstrates that SLS-SDPP improves on SDPP significantly, outperforms some other state-of-the-art approaches and can be applied to large, high-dimensional data sets. Finally, SLS-SDPP is applied to some well-known face recognition problems. The satisfactory performance of the proposed dimension reduction method, compared with some leading approaches in this area, indicates that the model is applicable to a wide range of image recognition problems.
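The local distance-preservation idea behind basic SDPP can be sketched as follows. This is an illustrative reading of the description above, not the formulation from the thesis: the function name `sdpp_objective`, the use of k nearest neighbours in the input space as the neighbourhood, and the averaging over points are all assumptions. The variance term and semidefinite reformulation of SLS-SDPP are not shown.

```python
import numpy as np

def sdpp_objective(W, X, Y, n_neighbors):
    """Mean squared mismatch between local squared distances in the
    projected space (X @ W) and in the output space (Y).

    X: n-by-d inputs, Y: n-by-q outputs, W: d-by-r projection matrix.
    Assumes the rows of X are distinct, so each point is its own
    nearest neighbour and can be skipped.
    """
    Z = X @ W                                              # projected covariates
    n = X.shape[0]
    dx = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)    # input-space distances
    dz = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)    # projected-space distances
    dy = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)    # output-space distances
    nbrs = np.argsort(dx, axis=1)[:, 1:n_neighbors + 1]    # k nearest, skipping self
    rows = np.arange(n)[:, None]
    return ((dz[rows, nbrs] - dy[rows, nbrs]) ** 2).sum() / n
```

Under these assumptions the objective is zero exactly when every neighbouring pair is as far apart in the projected space as in the output space. A new point `x` is projected as `x @ W`, which is why a learned linear map handles out-of-sample data directly.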
Supervisor: Qi, Houduo; Fliege, Joerg
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available