Use this URL to cite or link to this record in EThOS:
Title: A new computational framework for the classification and function prediction of long non-coding RNAs
Author: Deshpande, Sumukh
ISNI:       0000 0004 7962 2864
Awarding Body: Coventry University
Current Institution: Coventry University
Date of Award: 2018
Availability of Full Text:
Access from EThOS:
Access from Institution:
Long non-coding RNAs (lncRNAs) are known to play a significant role in several biological processes. These RNAs possess sequence length greater than 200 base pairs (bp), and so are often misclassified as protein-coding genes. Most Coding Potential Computation (CPC) tools fail to accurately identify, classify and predict the biological functions of lncRNAs in plant genomes, due to previous research being limited to mammalian genomes. In this thesis, an investigation and extraction of various sequence and codon-bias features for identification of lncRNA sequences has been carried out, to develop a new CPC Framework. For identification of essential features, the framework implements regularisation-based selection. A novel classification algorithm is implemented, which removes the dependency on experimental datasets and provides a coordinate-based solution for sub-classification of lncRNAs. For imputing the lncRNA functions, lncRNA-protein interactions have been first determined through co-expression of genes which were re-analysed by a sequence similaritybased approach for identification of novel interactions and prediction of lncRNA functions in the genome. This integrates a D3-based application for visualisation of lncRNA sequences and their associated functions in the genome. Standard evaluation metrics such as accuracy, sensitivity, and specificity have been used for benchmarking the performance of the framework against leading CPC tools. Case study analyses were conducted with plant RNA-seq datasets for evaluating the effectiveness of the framework using a cross-validation approach. The tests show the framework can provide significant improvements on existing CPC models for plant genomes: 20-40% greater accuracy. Function prediction analysis demonstrates results are consistent with the experimentally-published findings.
Supervisor: Shuttleworth, James ; Yang, Jianhua ; England, Matthew Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID:  DOI: Not available