Title: Feature selection and causal discovery for ensemble classifiers
Author: Duangsoithong, Rakkrit
ISNI:       0000 0004 2747 0272
Awarding Body: University of Surrey
Current Institution: University of Surrey
Date of Award: 2012
The rapid development of computer and information technology has benefited a large number of applications, such as web text mining, intrusion detection, biomedical informatics, gene selection in microarray data, medical data mining, and clinical decision support systems, and has led to the creation of many information databases. However, in some applications, especially in the medical area, clinical data may contain hundreds to thousands of features with relatively few samples. This leads to increased complexity, which degrades efficiency and accuracy. Moreover, in such a high-dimensional feature space, many features may be irrelevant or redundant and should be removed to ensure good generalisation performance. Otherwise, the classifier may over-fit the data; that is, it may specialise on features that are not relevant for discrimination. To overcome this problem, feature selection and ensemble classification are applied. This thesis presents an empirical analysis of bootstrap and random subspace feature selection for multiple classifier systems, and proposes bootstrap feature selection and embedded feature ranking for ensemble MLP classifiers, along with a stopping criterion based on the out-of-bootstrap estimate. Feature selection does not usually take causal discovery into account. However, in some cases, such as when the test distribution is shifted by the manipulation of an external agent, causal discovery can benefit feature selection under these uncertain conditions. It can also learn the underlying data structure, provide a better understanding of the data generation process, and give better accuracy and robustness under uncertainty. Conversely, feature selection enables global causal discovery algorithms to deal with high-dimensional data by eliminating irrelevant and redundant features before the causal relationships between features are explored.
A redundancy-based ensemble causal feature selection approach using bootstrap and random subspace is analysed, together with a comparison between correlation-based and causal feature selection for ensemble classifiers. Finally, hybrid correlation-causal feature selection for multiple classifier systems is proposed in order to scale up causal discovery and deal with high-dimensional features.
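As a rough illustration of the resampling ideas named in the abstract, the sketch below draws, for each ensemble member, a bootstrap replicate of the samples, the corresponding out-of-bootstrap samples, and a random subspace of features. It is not the thesis's method: the function names, parameters, and sizes are hypothetical, and the feature selection, ranking, causal discovery, and stopping criterion themselves are not implemented here.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw one bootstrap replicate (sampling with replacement).

    Returns the in-bag indices and the out-of-bootstrap indices,
    i.e. the samples that were never drawn for this replicate.
    """
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


def random_subspace(n_features, subspace_size, rng):
    """Pick a random subset of feature indices without replacement."""
    return sorted(rng.sample(range(n_features), subspace_size))


def make_ensemble_views(n_samples, n_features, n_members, subspace_size, seed=0):
    """Build one (samples, features) view per ensemble member.

    Hypothetical helper: each base classifier would be trained on its
    in-bag samples restricted to its feature subset, and could be
    evaluated on its out-of-bootstrap samples.
    """
    rng = random.Random(seed)
    views = []
    for _ in range(n_members):
        in_bag, out_of_bag = bootstrap_split(n_samples, rng)
        features = random_subspace(n_features, subspace_size, rng)
        views.append(
            {"in_bag": in_bag, "out_of_bag": out_of_bag, "features": features}
        )
    return views


# Few samples, many features, as in the clinical setting described above.
views = make_ensemble_views(
    n_samples=20, n_features=100, n_members=5, subspace_size=10
)
```

Because the out-of-bootstrap samples play no part in training a given member, they can serve as a held-out set for that member, which is the general idea behind an out-of-bootstrap estimate.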
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available