Title: Clustering ensemble method
Author: Alqurashi, Tahani
ISNI: 0000 0004 6058 8571
Awarding Body: University of East Anglia
Current Institution: University of East Anglia
Date of Award: 2017
Clustering is an unsupervised learning paradigm that partitions a given dataset into clusters so that objects in the same cluster are more similar to each other than to objects in other clusters. However, when clustering algorithms are used individually, their results are often inconsistent and unreliable. This research applies the philosophy of ensemble learning, which combines multiple partitions using a consensus function, to address these issues and improve clustering performance. A clustering ensemble framework is presented that consists of three phases: Ensemble Member Generation, Consensus and Evaluation. This research focuses on two points: the consensus function and ensemble diversity. For the first, we proposed three new consensus functions: the Object-Neighbourhood Clustering Ensemble (ONCE), the Dual-Similarity Clustering Ensemble (DSCE), and the Adaptive Clustering Ensemble (ACE). ONCE takes into account the neighbourhood relationship between object pairs in the similarity matrix, while DSCE and ACE are based on two similarity measures: cluster similarity and membership similarity. The proposed ensemble methods were tested on benchmark real-world and artificial datasets. The results demonstrated that ONCE outperforms other similar methods and is more consistent and reliable than k-means. Furthermore, DSCE and ACE were compared with the ONCE, CO, MCLA and DICLENS clustering ensemble methods. The results demonstrated that, on average, ACE outperforms the state-of-the-art clustering ensemble methods CO, MCLA and DICLENS. On diversity, we experimentally investigated all the existing measures to determine their relationship with ensemble quality. The results indicate that none of them reveals a clear relationship, for two reasons: (1) they are all inappropriately defined for measuring the useful differences between the members, and (2) none of them has been used directly by any consensus function. Therefore, we point out that these two issues need to be addressed in future research.
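The three-phase framework described in the abstract (member generation, consensus, evaluation) can be illustrated with a minimal sketch. This is not ONCE, DSCE or ACE, whose definitions are not given in this record. It assumes a generic co-association consensus, a common baseline in the clustering ensemble literature: count how often each pair of objects shares a cluster, link pairs that do so in more than half of the members, and read off the connected components. The function names, the 0.5 threshold and the toy partitions are all hypothetical choices made for illustration.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of ensemble members in which each pair of objects shares a cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def consensus(partitions, threshold=0.5):
    """Link objects co-clustered in more than `threshold` of the members,
    then label the connected components of the resulting graph."""
    matrix = co_association(partitions)
    n = len(matrix)
    labels = [-1] * n          # -1 marks an object not yet assigned
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:           # depth-first walk over linked objects
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def rand_index(a, b):
    """Share of object pairs on which two partitions agree (evaluation phase)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


# Generation phase stand-in: three hypothetical members over six objects.
# Label names are arbitrary, so member 2 uses different ids for similar groups.
members = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 2],
]
final = consensus(members)
print(final)
print(rand_index(final, members[0]))
```

On this toy input the members disagree about objects 2 and 5, yet the majority vote over pairs recovers the two groups that most members support. In practice the members would come from repeated runs of a base algorithm such as k-means, and the evaluation would compare the consensus partition against known class labels where they exist.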
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available