Title:
Statistical models for RNA-seq data analysis of cancer
In our research we addressed several major questions related to RNA-seq-based models for cancer.
The first chapter reviews genomics technologies from the pre-NGS era, the most commonly used NGS platforms, and recently developed methods.
Next, we considered the main concepts of differential expression analysis for SAGE and RNA-seq data and discussed several of the most widely used methods in the field.
In the third chapter we formulated the biological problem, namely the reproducibility and robustness of RNA-seq differential expression analysis, and made general observations on the count distributions of cancer-related RNA-seq data and on how altering sequencing depth affects these data.
In chapter five we used this robustness approach to rank the performance of existing differential gene expression (DGE) models and studied how subsampling, in terms of both library size and number of samples, affects the outcome of a DGE analysis.
In this chapter we also introduced samExploreR, an R package for running sequencing-depth alteration simulations quickly and efficiently.
Building on this work, we applied subsampling to Quadratic, a candidate compound discovery framework based on connectivity mapping, and explored its robustness and reproducibility across various datasets.
Finally, in chapter seven we explored how integrating information from different RNA-seq-based approaches may affect the outcome of the analysis, and studied the robustness of these methods.
The approaches adopted in this body of work allowed us to introduce subsampling as a quality-control measure from which the quality of a dataset can be inferred, in both research and clinical settings.
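
As a minimal illustration of the two kinds of subsampling described above (reduced library size and fewer samples), the Python sketch below thins a genes-by-samples count matrix to a lower sequencing depth and draws a smaller set of samples. It assumes binomial thinning, in which each read is kept independently with probability `fraction`. This is one common way to simulate a shallower library, not necessarily the method used in samExploreR. The function names `thin_counts` and `subsample_samples` are hypothetical.

```python
import numpy as np


def thin_counts(counts, fraction, seed=None):
    """Simulate a shallower library (assumed approach: binomial thinning).

    A gene with n reads keeps Binomial(n, fraction) of them, which is
    equivalent to discarding each read independently at random.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=np.int64)
    # Element-wise binomial draw: output has the same shape as `counts`.
    return rng.binomial(counts, fraction)


def subsample_samples(sample_ids, k, seed=None):
    """Draw k samples without replacement, mimicking a study with fewer replicates."""
    if not 0 <= k <= len(sample_ids):
        raise ValueError("k must lie between 0 and the number of samples")
    rng = np.random.default_rng(seed)
    chosen = rng.choice(len(sample_ids), size=k, replace=False)
    # Keep the original sample order for readability.
    return [sample_ids[i] for i in sorted(chosen)]
```

In a robustness study of this kind, one would repeat both steps over many random seeds and rerun a DGE method on each subsample, then compare the resulting gene lists with those from the full data. That outer loop depends on the chosen DGE model and is omitted here.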