Title: DifFUZZY : a novel clustering algorithm for systems biology
Author: Cominetti Allende, Ornella Cecilia
ISNI: 0000 0004 2722 2594
Awarding Body: University of Oxford
Current Institution: University of Oxford
Date of Award: 2012
Availability of Full Text:
Access from EThOS:
Full text unavailable from EThOS. Thesis embargoed until 20 Apr 2025
Current studies of the highly complex pathobiology and molecular signatures of human disease require the analysis of large sets of high-throughput data, from clinical to gene expression experiments, containing a wide range of information types. A number of computational techniques are used to analyse such high-dimensional bioinformatics data. In this thesis we focus on the development of a novel soft clustering technique, DifFUZZY, a fuzzy clustering algorithm applicable to a larger class of problems than other soft clustering approaches. DifFUZZY is better at handling datasets that contain curved or elongated clusters, or clusters of different dispersion. Using a number of synthetic and real datasets, we show that DifFUZZY outperforms several frequently used clustering algorithms. Furthermore, we present a quality measure developed for DifFUZZY and based on the diffusion distance, which is used to automate the choice of the algorithm's main parameter. We then apply DifFUZZY and other techniques to data from a clinical study of children from The Gambia with different types of severe malaria. The first step was to identify the most informative features in the dataset, which allowed us to separate the different groups of patients. This allowed us to reproduce the World Health Organisation classification of severe malaria syndromes and to obtain a reduced dataset for further analysis. To validate these features as relevant to malaria across the continent, and not only in The Gambia, we used a larger dataset of children from different sites in Sub-Saharan Africa. Using a novel network visualisation algorithm, we identified pathobiological clusters from which we formulated, and subsequently verified, clinical hypotheses. We finish by presenting conclusions and future directions, including image segmentation and the clustering of time-series data.
We also suggest how data modelling could be bridged with bioinformatics by embedding microarray data into cell models. To this end, we take as a case study a multiscale model of the intestinal crypt based on a cell-vertex model.
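The abstract describes DifFUZZY only at a high level, and the algorithm itself is not reproduced in this record. As a rough illustration of the two ingredients it names, the sketch below computes a standard diffusion distance (in the style of Coifman and Lafon) on a Gaussian-kernel graph, then turns distances to labelled "core" points into fuzzy memberships. The bandwidth `eps`, the diffusion time `t`, the core-point labels and the inverse-mean-distance membership rule (`soft_memberships`) are all assumptions chosen for illustration; this is a hypothetical sketch, not the method developed in the thesis.

```python
import numpy as np


def diffusion_distance(X, eps=1.0, t=2):
    """Diffusion distance at time t on a Gaussian-kernel graph.

    X   : (n, d) array of data points.
    eps : kernel bandwidth (assumed, user-chosen parameter).
    t   : number of random-walk steps.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances, shape (n, n).
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    # Gaussian affinities and the degree of each node.
    W = np.exp(-sq / (2.0 * eps))
    deg = W.sum(axis=1)
    # Row-stochastic transition matrix of the random walk, raised to power t.
    P = W / deg[:, None]
    Pt = np.linalg.matrix_power(P, t)
    # Stationary distribution of the walk (proportional to the degrees).
    pi = deg / deg.sum()
    # D_t(i, j)^2 = sum_k (Pt[i, k] - Pt[j, k])^2 / pi[k]
    diff = Pt[:, None, :] - Pt[None, :, :]
    return np.sqrt((diff ** 2 / pi).sum(axis=2))


def soft_memberships(D, labels):
    """Hypothetical fuzzy assignment from a diffusion-distance matrix.

    D      : (n, n) diffusion distances.
    labels : length-n ints; core points carry a cluster id >= 0,
             unassigned points carry -1.
    Returns an (n, k) membership matrix whose rows sum to 1.
    """
    labels = np.asarray(labels)
    k = labels.max() + 1
    U = np.zeros((len(labels), k))
    for i, lab in enumerate(labels):
        if lab >= 0:
            U[i, lab] = 1.0  # core points keep a crisp membership
            continue
        # Mean diffusion distance from point i to each cluster's core points.
        mean_d = np.array([D[i, labels == c].mean() for c in range(k)])
        w = 1.0 / (mean_d + 1e-12)  # nearer clusters receive larger weight
        U[i] = w / w.sum()
    return U


# Two small 1-D groups plus one unassigned point near the first group.
X = np.array([[0.0], [0.1], [0.2], [3.0], [3.1], [3.2], [0.5]])
labels = np.array([0, 0, 0, 1, 1, 1, -1])
D = diffusion_distance(X, eps=0.5, t=2)
U = soft_memberships(D, labels)
print(np.round(U[-1], 3))
```

Because a diffusion distance sums over many random-walk paths, points joined by a chain of close neighbours stay close even when that chain bends or stretches, which is the usual motivation for using it on curved or elongated clusters.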
Supervisor: Maini, Philip K. ; Erban, Radek ; Byrne, Helen ; Murray, Philip
Sponsor: Clarendon Fund
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: Bioinformatics (life sciences) ; Biology and other natural sciences (mathematics) ; Mathematical biology ; Malaria ; Pattern recognition (statistics) ; Bioinformatics (technology) ; Applications and algorithms ; Mathematical modeling (engineering) ; clustering algorithm ; fuzzy clustering ; diffusion distance ; genetic expression data clustering ; malaria