Use this URL to cite or link to this record in EThOS:
Title: Multi-objective evolutionary algorithms for data clustering
Author: Kirkland, Oliver
ISNI:       0000 0004 5352 2689
Awarding Body: University of East Anglia
Current Institution: University of East Anglia
Date of Award: 2014
Availability of Full Text:
Access from EThOS:
Access from Institution:
In this work we investigate the use of Multi-Objective metaheuristics for the data-mining task of clustering. We �first investigate methods of evaluating the quality of clustering solutions, we then propose a new Multi-Objective clustering algorithm driven by multiple measures of cluster quality and then perform investigations into the performance of different Multi-Objective clustering algorithms. In the context of clustering, a robust measure for evaluating clustering solutions is an important component of an algorithm. These Cluster Quality Measures (CQMs) should rely solely on the structure of the clustering solution. A robust CQM should have three properties: it should be able to reward a \good" clustering solution; it should decrease in value monotonically as the solution quality deteriorates and, it should be able to evaluate clustering solutions with varying numbers of clusters. We review existing CQMs and present an experimental evaluation of their robustness. We find that measures based on connectivity are more robust than other measures for cluster evaluation. We then introduce a new Multi-Objective Clustering algorithm (MOCA). The use of Multi-Objective optimisation in clustering is desirable because it permits the incorporation of multiple measures of cluster quality. Since the definition of what constitutes a good clustering is far from clear, it is beneficial to develop algorithms that allow for multiple CQMs to be accommodated. The selection of the clustering quality measures to use as objectives for MOCA is informed by our previous work with internal evaluation measures. We explain the implementation details and perform experimental work to establish its worth. We compare MOCA with k-means and find some promising results. We�find that MOCA can generate a pool of clustering solutions that is more likely to contain the optimal clustering solution than the pool of solutions generated by k-means. We also perform an investigation into the performance of different implementations of MOEA algorithms for clustering. We�find that representations of clustering based around centroids and medoids produce more desirable clustering solutions and Pareto fronts. We also �find that mutation operators that greatly disrupt the clustering solutions lead to better exploration of the Pareto front whereas mutation operators that modify the clustering solutions in a more moderate way lead to higher quality clustering solutions. We then perform more specific investigations into the performance of mutation operators focussing on operators that promote clustering solution quality, exploration of the Pareto front and a hybrid combination. We use a number of techniques to assess the performance of the mutation operators as the algorithms execute. We confirm that a disruptive mutation operator leads to better exploration of the Pareto front and mutation operators that modify the clustering solutions lead to the discovery of higher quality clustering solutions. We find that our implementation of a hybrid mutation operator does not lead to a good improvement with respect to the other mutation operators but does show promise for future work.
Supervisor: Not available Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID:  DOI: Not available