Title: Nonnegative matrix analysis for data clustering and compression
Author: Gong, Liyun
ISNI:       0000 0004 5356 9399
Awarding Body: University of Liverpool
Current Institution: University of Liverpool
Date of Award: 2015
Availability of Full Text:
Access from EThOS:
Full text unavailable from EThOS. Please try the link below.
Access from Institution:
Nonnegative matrix factorization (NMF) has become an increasingly popular data processing tool in recent years, widely used in communities including computer vision, text mining and bioinformatics. It approximates each data sample in a data collection by a linear combination of nonnegative basis vectors with nonnegative weights. This often enables meaningful interpretation of the data, motivates useful insights and facilitates tasks such as data compression, clustering and classification. As a result, NMF plays various active roles in data analysis, e.g., as a dimensionality reduction tool [11, 75], clustering tool [94, 82, 13, 39], feature engine [40] and source separation tool [38]. Several NMF-based methods are proposed in this thesis. First, a modification of k-means clustering is used as an initialisation method for NMF; experimental results show that it improves compression performance. Second, independent principal component analysis (IPCA), which combines the advantages of principal component analysis (PCA) and independent component analysis (ICA), is used as an initialisation method for NMF and improves clustering accuracy. Third, a new evolutionary optimisation strategy for NMF is proposed, driven by three update schemes in the solution space, namely the NMF rule (or original movement), the firefly rule (or beta movement) and the survival of the fittest rule (or best movement). This update strategy addresses both clustering and compression problems by using different system objective functions built on clustering and compression quality measurements. A hybrid initialisation approach includes state-of-the-art NMF initialisation methods as seed knowledge to increase the rate of convergence. There is no limit on the number or type of initialisation methods that can be used with the proposed optimisation approach.
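As an illustration of the factorization described above, the following is a minimal sketch, not the thesis's implementation. It assumes the data matrix V stores one sample per column and uses the standard multiplicative update rules for the Frobenius-norm objective with a plain random initialisation; the k-means, IPCA and evolutionary schemes from the thesis are not reproduced. The function names (`nmf`, `frobenius_error`) and the toy matrix are hypothetical, and pure Python lists are used only to keep the sketch self-contained.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Frobenius norm of V - WH, the reconstruction error."""
    wh = matmul(w, h)
    return sum(
        (v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0]))
    ) ** 0.5


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Approximate nonnegative V (n x m) by W (n x rank) times H (rank x m).

    Columns of W act as nonnegative basis vectors; column j of H holds the
    nonnegative weights that reconstruct sample j (column j of V).
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Assumption: random positive start, not one of the thesis's initialisations.
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


if __name__ == "__main__":
    # Toy data: 4 samples (columns) with 3 nonnegative features each.
    v = [[1.0, 2.0, 0.0, 1.0],
         [0.0, 1.0, 3.0, 1.0],
         [1.0, 3.0, 3.0, 2.0]]
    w, h = nmf(v, rank=2)
    print("reconstruction error:", frobenius_error(v, w, h))
```

Because the updates only multiply by nonnegative ratios, W and H stay nonnegative throughout. One simple way to use the result for clustering is to assign each sample to the basis vector with the largest weight in its column of H.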
Numerous computer experiments on benchmark datasets verify the theoretical results and compare the techniques in terms of clustering and compression accuracy. Experimental results demonstrate that these methods improve clustering and compression performance. In the EEG application, we applied several standard algorithms to cluster preprocessed EEG data and also explored ensemble clustering to obtain tight clusters. The results support the following conclusions: firstly, normalization is necessary for this EEG brain dataset to obtain reasonable clustering; secondly, k-means, k-medoids and HC-Ward provide relatively better clustering results; thirdly, ensemble clustering enables us to tune the tightness of the clusters so that the research can be focused.
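The abstract does not say how the ensemble clustering was built. As one possible reading of "tuning the tightness of the clusters", the sketch below assumes a co-association (consensus) approach: several clusterings of the same samples are combined by counting how often each pair of samples is placed together, and a threshold on that fraction controls how tight the resulting groups are. The function names, the threshold parameter and the toy labelings are hypothetical.

```python
def co_association(labelings):
    """Fraction of clusterings in which each pair of samples shares a cluster."""
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    runs = len(labelings)
    return [[c / runs for c in row] for row in counts]


def tight_clusters(labelings, threshold):
    """Group samples linked by co-association >= threshold (connected components).

    A higher threshold demands stronger agreement between the base
    clusterings, which yields smaller, tighter groups.
    """
    coassoc = co_association(labelings)
    n = len(coassoc)
    seen = set()
    clusters = []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            i = stack.pop()
            component.append(i)
            for j in range(n):
                if j not in seen and coassoc[i][j] >= threshold:
                    seen.add(j)
                    stack.append(j)
        clusters.append(sorted(component))
    return clusters


if __name__ == "__main__":
    # Three hypothetical clusterings of five samples (label names are arbitrary).
    runs = [
        [0, 0, 0, 1, 1],
        [0, 0, 1, 1, 1],
        [1, 1, 0, 0, 0],
    ]
    print(tight_clusters(runs, threshold=1.0))  # [[0, 1], [2], [3, 4]]
    print(tight_clusters(runs, threshold=0.6))  # [[0, 1], [2, 3, 4]]
```

In this toy case, sample 2 is grouped inconsistently across the three runs, so it is left on its own when full agreement is required and joins samples 3 and 4 when the threshold is relaxed.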
Supervisor: Not available Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID:  DOI: Not available
Keywords: Q Science (General) ; T Technology (General)