Title: Development of new knowledge discovery tools to explore biomedical datasets in breast cancer
Author: Hill, Nathan Stuart
Awarding Body: Cardiff University
Current Institution: Cardiff University
Date of Award: 2009
The explorative power of high-throughput technologies in cancer research has become well established in recent years, exemplified by diverse gene microarray studies. However, development of the necessary biomedical data analysis tools has historically been confined to a commercial environment, and comprehensive, user-friendly analysis approaches are still needed. The availability of free software, notably the 'R' project statistical programming language, allowed development in this project of a user-friendly multivariate statistics application, Informatics Tenovus (I-10). I-10 provides a platform through which powerful existing and future 'R' project statistical analysis methodologies can be applied without prior programming knowledge. The new system was tested in the context of exploring antihormone resistance in breast cancer, analysing microarray datasets from in vitro models of acquired Tamoxifen resistance (TAMR) or Faslodex resistance (FASR) versus endocrine-responsive MCF-7 cells. The analysis revealed not only known de-regulated genes but also potential future markers/targets for endocrine response/resistance. The advantages of the 'R' programming environment, together with Microsoft Visual technology, for producing user-friendly biomedical analysis tools facilitated subsequent development of a tool to explore SEER cancer patient datasets. This new cancer survival query tool, Superstes, allows detailed statistical modelling of the impact that multiple patient attributes (in this instance derived from the SEER breast and colorectal cancer datasets) have on patient survival. The versatility of 'R' was additionally demonstrated in further exploring classifiers, where it was able to interface with the sophisticated, freely available machine learning application 'Weka'.
With 'R' and Weka, breast cancer patient survival was modelled using patient attributes equivalent to those of the Nottingham Prognostic Index and a 10-year survival subset of the SEER breast cancer dataset. Several machine learning methodologies were compared for their ability to model survival accurately, and their value in routine clinical use for predicting patient survival was then critically evaluated.
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available