Title: Gene expression data annotation, effective storage, and enrichment through data mining
Author: Sideris, E.
Awarding Body: UCL (University College London)
Current Institution: University College London (University of London)
Date of Award: 2007
Availability of Full Text: Full text unavailable from EThOS.
This thesis describes the development of bioinformatics resources and data-mining strategies for managing and analysing the large amounts of data produced by microarray gene expression experiments. Initially, this involved addressing the problem of effectively capturing gene expression microarray data and the accompanying meta-data annotations describing the experimental process. Capturing both is necessary for the archiving, interchange and reproducibility of datasets, and for comparability between them. This was achieved by the development of meditor, a graphical computer programme which allows the description of microarray experimental information through the use of diagrams and ontology-driven forms. meditor adheres to the standards set by the Microarray Gene Expression Data Society (MGED), and is therefore able to capture all the experimental information describable within the standard in a platform-independent manner. Subsequently, in order to provide capabilities for the formal modelling of gene expression analysis concepts, the concepts involved in the external validation of gene expression clusterings were formalised and defined as an object model. This model was developed with the implementation of data interchange file formats in mind. This work complements the object model of the MGED Society and attempts to cover an area that has not been formalised in a platform-independent manner by the standard object model. Finally, a method was developed to allow the use of knowledge on protein functions and protein-protein interactions to identify coherent sets of co-regulated genes suggested by the clustering of gene expression profiles. This was achieved through the development of a gene expression clustering quality metric, which judges the tightness and separation of gene expression clusters, thus providing a quality measure for a whole clustering or on a per-cluster basis.
Cluster tightness and separation are assessed by harnessing the manual annotations provided by the Gene Ontology, enriched using integrated biological information available through an in-house data warehouse (BioMap). The metric was tested on a human B-cell gene expression dataset and refined on the basis of the results produced.
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available