Use this URL to cite or link to this record in EThOS:
Title: Cluster-based semi-supervised ensemble learning
Author: Soares, Rodrigo Gabriel Ferreira
ISNI:       0000 0004 5346 7552
Awarding Body: University of Birmingham
Current Institution: University of Birmingham
Date of Award: 2014
Availability of Full Text:
Access from EThOS:
Access from Institution:
Semi-supervised classification consists of acquiring knowledge from both labelled and unlabelled data to classify test instances. The cluster assumption represents one of the potential relationships between true classes and data distribution that semi-supervised algorithms assume in order to use unlabelled data. Ensemble algorithms have been widely and successfully employed in both supervised and semi-supervised contexts. In this Thesis, we focus on the cluster assumption to study ensemble learning based on a new cluster regularisation technique for multi-class semi-supervised classification. Firstly, we introduce a multi-class cluster-based classifier, the Cluster-based Regularisation (Cluster- Reg) algorithm. ClusterReg employs a new regularisation mechanism based on posterior probabilities generated by a clustering algorithm in order to avoid generating decision boundaries that traverses high-density regions. Such a method possesses robustness to overlapping classes and to scarce labelled instances on uncertain and low-density regions, when data follows the cluster assumption. Secondly, we propose a robust multi-class boosting technique, Cluster-based Boosting (CBoost), which implements the proposed cluster regularisation for ensemble learning and uses ClusterReg as base learner. CBoost is able to overcome possible incorrect pseudo-labels and produces better generalisation than existing classifiers. And, finally, since there are often datasets with a large number of unlabelled instances, we propose the Efficient Cluster-based Boosting (ECB) for large multi-class datasets. ECB extends CBoost and has lower time and memory complexities than state-of-the-art algorithms. Such a method employs a sampling procedure to reduce the training set of base learners, an efficient clustering algorithm, and an approximation technique for nearest neighbours to avoid the computation of pairwise distance matrix. Hence, ECB enables semi-supervised classification for large-scale datasets.
Supervisor: Not available Sponsor: CAPES Foundation ; iSense
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID:  DOI: Not available
Keywords: QA75 Electronic computers. Computer science