Title: A strategy for a systematic approach to biomarker discovery validation : a study on lung cancer microarray data set
Author: Dol, Zulkifli
ISNI: 0000 0004 5369 8764
Awarding Body: University of Manchester
Current Institution: University of Manchester
Date of Award: 2015
Cancer is a serious threat to human health and is now one of the major causes of death worldwide. However, the complexity of cancer makes the development of new and specific diagnostic tools particularly challenging. A number of different strategies have been developed for biomarker discovery in cancer using microarray data. The problem that typically needs to be addressed is the scale of the data sets; we simply do not have (and are unlikely to obtain) sufficient data for classical machine learning approaches to biomarker discovery to be properly validated. Obtaining a biomarker that is specific to a particular cancer is also very challenging. The initial promise held out for gene microarray work in the development of cancer biomarkers has not yet yielded the hoped-for breakthroughs. This work discusses the construction of a strategy for a systematic approach to biomarker discovery validation using lung cancer gene expression microarray data based around non-small cell cancer, in patients who either stayed disease free after surgery (a five-year window) or in whom the disease progressed and recurred. To assist validation, we have therefore looked at new methodologies for using existing biological knowledge to support machine learning biomarker discovery techniques. We employ a text mining strategy that uses previously published literature to correlate biological concepts with a given phenotype. Pathway-driven approaches, through the use of Web Services and workflows, enabled the large-scale data set to be analysed systematically. The results showed that it was possible, at least using this specific data set, to clearly differentiate between progressive disease and disease-free patients using a set of biomarkers implicated in neuroendocrine signaling. A validation of the biomarkers identified was attempted in three separately published data sets.
This analysis showed that although there was support for some of our findings in one of these data sets, this appeared to be a function of the close similarity in the experimental design followed rather than of the specifics of the analysis method developed.
Supervisor: Brass, Andrew
Sponsor: Ministry of Higher Education Malaysia ; Universiti Sains Malaysia
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: Biomarker discovery ; Machine learning ; Text mining ; Lung cancer ; Gene expression ; Microarray data