Title: Statistical models for cancer gene expression data and visualization of biological networks
Author: Tripathi, Shailesh
Awarding Body: Queen's University Belfast
Current Institution: Queen's University Belfast
Date of Award: 2013
Availability of Full Text:
Full text unavailable from EThOS.
Please contact the current institution’s library for further details.
Gene expression data form a rich source of information for elucidating the biological function of cellular systems at the pathway level. For this reason, various pathway-based methods have been developed for analyzing gene expression data from high-throughput experiments. However, in order to utilize the full potential the data can offer, e.g., for cancer research, a more thorough understanding of such methods is required. This thesis consists of two major results parts. In the first results part, we investigate the statistical characteristics of five competitive gene set methods. One major finding is that three of these five methods, namely GSEA, GSEArot and GAGE, are negatively influenced by the number of background genes and, hence, by the filtering of the data, in the sense that these methods become more sensitive to expression changes despite the fact that the number of samples remains constant. This counterintuitive behavior leads to fundamental problems for the application of these methods to biological data: their results are no longer reconcilable with the principles of statistical inference, rendering them, in the worst case, uninformative. In order to avoid these problems, we suggest an experimental design that helps prevent such issues. Further, we present a new assessment method that allows a power analysis of competitive as well as self-contained gene set methods. More precisely, due to the general lack of a sufficient sample size in real data sets, simulated expression data are required in order to investigate statistical methods thoroughly. However, simulating data for pathway-based methods is challenging because pathways exhibit nontrivial correlation structures that the simulations need to account for. For this reason, we investigated new simulation methods in order to identify commonalities and differences with respect to their biological characteristics.
In the second results part, we present an R software package we developed, called NetBioV. NetBioV enables the visualization of large biological networks and the highlighting of structural features that are of biological relevance, e.g., the modularity of these networks.
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available