Title: Reinforcing soft independent modelling of class analogy (SIMCA)
Author: Zhu, R.
ISNI: 0000 0004 8499 7767
Awarding Body: UCL (University College London)
Current Institution: University College London (University of London)
Date of Award: 2017
Availability of Full Text:
Access from EThOS:
Full text unavailable from EThOS. Please try the link below.
Access from Institution:
Soft independent modelling of class analogy (SIMCA) is a widely used subspace-based classification technique for spectral data analysis. A principal component (PC) subspace is built for each class separately through principal components analysis (PCA). Two distances are usually used in the SIMCA classification rule to classify a test sample: the squared orthogonal distance (OD²) between the test sample and the subspace of each class, and the squared score distance (SD²) between the projection of the test sample onto the class subspace and the centre of that subspace.
Although it is commonly used to classify high-dimensional spectral data, SIMCA suffers from several drawbacks, and some calculations in the literature are misleading. First, modelling classes separately neglects the discriminative between-class information. Second, the SIMCA literature fails to explore the potential benefit of using geometric convex class models, whose superior classification performance has been demonstrated in face recognition. Third, based on our experiments on several real datasets, calculating OD² using the formulae in a highly cited SIMCA paper (De Maesschalck et al., 1999) results in worse classification performance than using those in the original SIMCA paper (Wold, 1976) for some high-dimensional data, and provides misleading classification results. Fourth, the distance metrics used in the classification rule of SIMCA are predetermined and are not adapted to different data.
Hence the research objectives of my PhD work are to reinforce SIMCA from the following four perspectives: O1) to make its feature space more discriminative; O2) to use geometric convex models as class models in SIMCA for spectral data classification and to study the classification mechanism of classification using different class models; O3) to investigate the equality and inequality of the calculations of OD² in De Maesschalck et al. (1999) and Wold (1976) for low-dimensional and high-dimensional scenarios; and O4) to learn its distance metric adaptively from data.
In this thesis, we present four contributions to achieve the above four objectives, respectively. First, to achieve O1), we propose projecting the original data onto a more discriminative subspace before applying SIMCA. To build such a subspace, we propose the discriminatively ordered subspace (DOS) method, which selects the eigenvectors of the generating matrix with high discriminative ability between classes to span the DOS. A paper on this work, "Building a discriminatively ordered subspace on the generating matrix to classify high-dimensional spectral data", has recently been published in the journal Information Sciences.
Second, to achieve O2), we use the geometric convex models, convex hull and convex cone, as class models in SIMCA to classify spectral data. We study the dual of classification methods using three class models (the PC subspace, the convex hull and the convex cone) to investigate their classification mechanisms. We provide theoretical results of the dual analysis, establish a separating hyperplane classification (SHC) framework and provide a new data exploration scheme to analyse the properties of a dataset and why such properties make one or more of the methods suitable for the data.
Third, to achieve O3), we compare the calculations of OD² in De Maesschalck et al. (1999) and Wold (1976). We show that the corresponding formulae in the two papers are equivalent only when the training data of a class have more samples than features. When the training data of a class have more features than samples (i.e. the high-dimensional case), the formulae in De Maesschalck et al. (1999) are not precise and affect the classification results. Hence we suggest using the formulae in Wold (1976) to calculate OD², to obtain correct SIMCA classification results for high-dimensional data.
Fourth, to achieve O4), we learn the distance metrics in SIMCA based on the derivation of a general formulation of the classification rules used in the literature. We define this general formulation as the distance metric from a sample to a class subspace. We propose the learning distance to subspace method, which learns this distance metric by making samples closer to their correct class subspaces and farther away from their wrong class subspaces.
Lastly, at the end of this thesis we append two pieces of work on hyperspectral image analysis. First, the joint paper with Mr Mingzhi Dong and Dr Jing-Hao Xue, "Spectral Nonlocal Restoration of Hyperspectral Images with Low-Rank Property", has been published in the IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. Second, the joint paper with Dr Fei Zhou and Dr Jing-Hao Xue, "MvSSIM: A Quality Assessment Index for Hyperspectral Images", is under revision for Neurocomputing. As these two papers do not focus on the research objectives of this thesis, they are appended as additional work carried out during my PhD study.
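As an illustration of the two distances named in the abstract, the squared orthogonal distance (OD²) and the squared score distance (SD²), the following is a minimal sketch in Python with NumPy. It is not the thesis's implementation and does not reproduce the formulae of either Wold (1976) or De Maesschalck et al. (1999). The function names (`fit_class_model`, `distances`, `classify`), the eigenvalue scaling used for SD², and the "smallest OD² wins" rule are all assumptions made for illustration.

```python
import numpy as np

def fit_class_model(X, n_components):
    """Fit a PC subspace for one class; rows of X are training samples."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Thin SVD of the centred data: rows of Vt are the principal directions.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                      # loadings, shape (features, A)
    # Variance of the scores along each retained PC (assumed scaling for SD²).
    eigvals = s[:n_components] ** 2 / (X.shape[0] - 1)
    return mean, P, eigvals

def distances(x, model):
    """Return (OD², SD²) of sample x with respect to one class model."""
    mean, P, eigvals = model
    xc = x - mean
    t = P.T @ xc                                 # scores: projection onto the subspace
    residual = xc - P @ t                        # part of x orthogonal to the subspace
    od2 = float(residual @ residual)             # squared orthogonal distance
    # Squared score distance to the class centre, scaled by the score variances.
    # Assumes every retained eigenvalue is non-zero.
    sd2 = float(np.sum(t ** 2 / eigvals))
    return od2, sd2

def classify(x, models):
    """Hypothetical rule: assign x to the class with the smallest OD²."""
    od2_values = [distances(x, m)[0] for m in models]
    return int(np.argmin(od2_values))
```

Because the residual is obtained by direct projection, the sketch also runs when a class has more features than samples, which is the high-dimensional case discussed in the abstract. Practical SIMCA rules typically combine OD² and SD² with class-specific cut-offs rather than using OD² alone.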
Supervisor: Xue, J. H.
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available