Rough set clustering using local and global data knowledge
This thesis presents the design and implementation of a knowledge-oriented clustering algorithm that can be applied to data of both single and mixed attribute types. The algorithm has a simple framework, based on that of hierarchical clustering, and its main clustering tool is a modified form of the indiscernibility relation from rough set theory. The research focuses on extracting maximal knowledge from data, both local and global, with minimal human intervention, in order to obtain clusters that are meaningful and free from user bias. This is achieved by employing well-defined numerical procedures to set key threshold parameters and by using a cluster accuracy measure to yield representative clusters within the boundaries of the given application. The algorithm is unified in its approach to clustering, which ensures consistent results when different users cluster the same data. Knowledge can be represented tangibly throughout the clustering process as a series of classification rule sets, which enhances the interpretability of the results. The research in this thesis makes specific contributions to the area of knowledge-oriented clustering, all of which stem from the design and implementation of the proposed algorithm. Numerical techniques control the setting of initial threshold parameters in order to obtain an initial clustering of a given data set, and a defined accuracy measure quantifies the notion of cluster 'meaningfulness'. Throughout the clustering process, clusters are automatically modified using a 'gamma threshold selection rule', and quick supervised clustering of a data set can be achieved using the classification rules obtained from clustering a similar data set. The tangible clustering knowledge represented by these rules can be modified further to provide a strategy for automatic decision-making.
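For orientation, the classical rough set notions that the abstract builds on can be sketched in a few lines. This is only an illustration of the standard (Pawlak) indiscernibility relation and accuracy of approximation, not the modified relation, the accuracy measure, or the threshold rules defined in the thesis. The function names and the toy data are hypothetical and chosen purely for this example.

```python
from collections import defaultdict


def indiscernibility_classes(objects, attributes):
    """Group object ids that share the same value on every listed attribute.

    Classical indiscernibility only: two objects fall in the same class
    when they agree exactly on all chosen attributes.
    """
    classes = defaultdict(set)
    for obj_id, values in objects.items():
        key = tuple(values[a] for a in attributes)
        classes[key].add(obj_id)
    return list(classes.values())


def approximations(classes, target):
    """Return the lower and upper approximations of a target set."""
    lower, upper = set(), set()
    for cls in classes:
        if cls <= target:   # class lies entirely inside the target
            lower |= cls
        if cls & target:    # class overlaps the target
            upper |= cls
    return lower, upper


def accuracy(classes, target):
    """Classical accuracy of approximation: |lower| / |upper|."""
    lower, upper = approximations(classes, target)
    return len(lower) / len(upper) if upper else 1.0


# Hypothetical toy data: four objects described by two categorical attributes.
objects = {
    "x1": {"colour": "red", "size": "small"},
    "x2": {"colour": "red", "size": "small"},
    "x3": {"colour": "blue", "size": "small"},
    "x4": {"colour": "blue", "size": "large"},
}

classes = indiscernibility_classes(objects, ["colour", "size"])
target = {"x1", "x3"}  # a candidate cluster to evaluate
lower, upper = approximations(classes, target)
print(lower, upper, accuracy(classes, target))
```

In this toy case x1 and x2 are indiscernible, so a candidate cluster containing x1 but not x2 cannot be described exactly by the chosen attributes, and its accuracy falls below 1. An accuracy of exactly 1 means the cluster is a union of indiscernibility classes.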