Title: New probabilistic graphical models and meta-learning approaches for hierarchical classification, with applications in bioinformatics and ageing
Author: Fabris, Fabio
ISNI: 0000 0004 6424 0417
Awarding Body: University of Kent
Current Institution: University of Kent
Date of Award: 2017
This interdisciplinary work proposes new hierarchical classification algorithms and evaluates them on biological datasets, specifically on ageing-related datasets. Hierarchical classification is a type of classification task in which the classes to be predicted are organized into a hierarchical structure. The focus on ageing is justified by the growing impact of ageing-related diseases on the human population and by the increasing amount of freely available ageing-related data. The main contributions of this thesis are as follows. First, we improve the running time of a previously proposed hierarchical classification algorithm based on an extension of the well-known Naive Bayes classification algorithm. We show that our modification greatly reduces the runtime of this hierarchical classification algorithm while maintaining its predictive performance. We also propose four new hierarchical classification algorithms. The focus on hierarchical classification algorithms and their evaluation on biological data is justified because the class labels of biological data are commonly organized into class hierarchies. Two of our four new hierarchical classification algorithms, the "Hierarchical Dependence Network" (HDN) and the "Hierarchical Dependence Network algorithm based on finding non-Hierarchically related Predictive Classes" (HDN-nHPC), are based on Dependence Networks, a relatively new type of probabilistic graphical model that has not yet received much attention from the classification community. The other two algorithms we propose are hybrids that use the hierarchical classification models produced by the Predictive Clustering Tree (PCT) algorithm. One hybrid combines the models produced by the PCT algorithm with those of a Local Hierarchical Classification (LHC) algorithm, which induces a local model for each class in the hierarchy. The other hybrid combines the models produced by the PCT and HDN algorithms.
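The abstract describes LHC only as inducing a local model for each class in the hierarchy. As a rough illustration of that general idea (not the thesis's LHC implementation), the sketch below assumes a toy hierarchy, uses a deliberately simple nearest-centroid binary classifier as a stand-in base learner, trains each class's model only on instances of its parent class, and predicts top-down so that a predicted class always comes with its ancestors. All names (`PARENT`, `CentroidBinary`, `fit_local`, `predict_local`) and the data are hypothetical.

```python
from statistics import mean

# Hypothetical toy class hierarchy: child -> parent (None = top-level class).
PARENT = {"A": None, "A.1": "A", "A.2": "A", "B": None}


def ancestors(label, parent=PARENT):
    """Return the label together with all of its ancestors."""
    out = set()
    while label is not None:
        out.add(label)
        label = parent[label]
    return out


def close_labels(labels, parent=PARENT):
    """Extend a label set so that every class implies its ancestors."""
    closed = set()
    for label in labels:
        closed |= ancestors(label, parent)
    return closed


class CentroidBinary:
    """Stand-in base learner (an assumption): nearest-centroid binary classifier."""

    def fit(self, X, y):
        pos = [x for x, t in zip(X, y) if t]
        neg = [x for x, t in zip(X, y) if not t]
        self.pos = [mean(col) for col in zip(*pos)] if pos else None
        self.neg = [mean(col) for col in zip(*neg)] if neg else None
        return self

    def predict(self, x):
        if self.pos is None:
            return False
        if self.neg is None:
            return True

        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))

        return sq_dist(self.pos) <= sq_dist(self.neg)


def fit_local(X, Y, parent=PARENT):
    """Train one binary model per class, using only instances of its parent class."""
    Yc = [close_labels(y, parent) for y in Y]
    models = {}
    for c, p in parent.items():
        idx = [i for i, ys in enumerate(Yc) if p is None or p in ys]
        models[c] = CentroidBinary().fit([X[i] for i in idx], [c in Yc[i] for i in idx])
    return models


def predict_local(models, x, parent=PARENT):
    """Top-down prediction: a child is only tested if its parent was predicted."""
    predicted = set()
    frontier = [c for c, p in parent.items() if p is None]
    while frontier:
        c = frontier.pop()
        if models[c].predict(x):
            predicted.add(c)
            frontier.extend(k for k, p in parent.items() if p == c)
    return predicted


# Hypothetical toy data: two features per instance, annotated with its most specific class.
X = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [1.1, 0.9], [5.0, 5.0], [5.2, 4.8]]
Y = [{"A.1"}, {"A.1"}, {"A.2"}, {"A.2"}, {"B"}, {"B"}]
models = fit_local(X, Y)
print(predict_local(models, [0.05, 0.1]))  # a set containing "A" and "A.1"
```

The top-down traversal is one simple way to keep predictions consistent with the hierarchy; other local schemes (for example, thresholding per-class probabilities and then enforcing consistency afterwards) are equally plausible.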
We have tested our four proposed algorithms and four other commonly used hierarchical classification algorithms on 42 hierarchical classification datasets, 20 of which were created by us and are freely available to researchers. We have concluded that, for one of the three hierarchical predictive accuracy measures used in our experiments, one of our four new algorithms (HDN-nHPC) outperforms the other seven algorithms in terms of average rank across the 42 datasets. We have also proposed the first meta-learning approach for hierarchical classification problems. In meta-learning, each meta-instance represents a dataset, meta-features represent dataset properties, and meta-classes represent the best classification algorithm for the corresponding dataset (meta-instance). Hence, meta-learning techniques for classification use the predictive performance of candidate classification algorithms on previously tested datasets, together with dataset descriptors (the meta-features), to infer the performance of those candidate algorithms on new datasets, given the meta-features of those new datasets. The predictions of our meta-learning system can guide the choice of which hierarchical classification algorithm (out of a set of candidates) to use on a new dataset, without the need for time-consuming trial-and-error experiments with those candidate algorithms. This is particularly important for hierarchical classification problems, as the training time of hierarchical classification algorithms tends to be much greater than that of 'flat' classification algorithms. This increased training time is mainly due to the typically much greater number of class labels that annotate the instances of hierarchical classification problems. We have tested the predictive power of our meta-learning system and interpreted some of the generated meta-models.
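To make the average-rank comparison and the meta-instance/meta-class definitions concrete, the sketch below uses a small hypothetical performance table and hypothetical meta-features (hierarchy depth and mean number of labels per instance). It assumes the common convention that tied algorithms share the mean of their ranks. The meta-learner is a plain 1-nearest-neighbour rule swapped in for illustration only; the thesis's system generates meta-rules, and nothing here reproduces it. Requires Python 3.8+ for `math.dist`.

```python
from math import dist

# Hypothetical performance table: dataset -> {algorithm: score}, higher is better.
SCORES = {
    "d1": {"alg_x": 0.80, "alg_y": 0.75, "alg_z": 0.70},
    "d2": {"alg_x": 0.60, "alg_y": 0.65, "alg_z": 0.60},
    "d3": {"alg_x": 0.90, "alg_y": 0.85, "alg_z": 0.88},
}

# Hypothetical meta-features per dataset: (hierarchy depth, mean labels per instance).
META_FEATURES = {"d1": (4, 6.0), "d2": (2, 2.5), "d3": (5, 7.0)}


def ranks(scores):
    """Rank algorithms on one dataset (1 = best); tied scores share the mean rank."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    result = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        for k in range(i, j + 1):
            result[ordered[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def average_ranks(table):
    """Average each algorithm's per-dataset rank over all datasets."""
    per_dataset = [ranks(s) for s in table.values()]
    algs = per_dataset[0].keys()
    return {a: sum(r[a] for r in per_dataset) / len(per_dataset) for a in algs}


def meta_class(scores):
    """Meta-class of a dataset (meta-instance): its best-scoring algorithm."""
    return max(scores, key=scores.get)


def predict_best(new_features, table=SCORES, features=META_FEATURES):
    """1-NN meta-learner (an assumption): reuse the meta-class of the closest dataset."""
    nearest = min(features, key=lambda d: dist(features[d], new_features))
    return meta_class(table[nearest])


print(average_ranks(SCORES))  # {'alg_x': 1.5, 'alg_y': 2.0, 'alg_z': 2.5}
print(predict_best((2, 3.0)))  # alg_y
```

In practice the meta-features would usually be normalised before computing distances, since features on larger scales would otherwise dominate the nearest-neighbour choice.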
We have concluded that our meta-learning system had good predictive performance compared with baseline meta-learning approaches. We have also concluded that the meta-rules generated by our meta-learning system were useful for identifying dataset characteristics that assist the choice of hierarchical classification algorithm. Finally, we have reviewed the current practice of applying supervised machine learning (classification and regression) algorithms to study the biology of ageing. This review discusses the main findings of such algorithms in the context of the ageing biology literature. We have also interpreted some of the hierarchical classification models generated in our experiments. Both the literature review and the model interpretation were performed in collaboration with an ageing expert, in order to extract information relevant to ageing research.
Supervisor: A. Freitas, Alex
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available