Title: Translating nucleic acid binding protein function from model species to minor crops using transfer learning
Author: Bonthala, Venkata Suresh
ISNI: 0000 0004 7430 359X
Awarding Body: University of Nottingham
Current Institution: University of Nottingham
Date of Award: 2018
Availability of Full Text:
Access from EThOS:
Full text unavailable from EThOS. Restricted access.
Genomic elements such as proteins or genes are the basic units of the genome and are involved in every biological process. Predicting the function of these elements is therefore the first step towards understanding how plants function under various stress conditions. To date, many computational methods have been developed to predict the function of a given protein sequence. The recent proliferation of such methods has created its own set of problems, making them difficult to apply to newly sequenced genomes, especially those of non-model crops. For these reasons, there is an immediate need for more sophisticated computational methods to predict the function of a given protein sequence. This thesis presents three novel computational tools, based on transfer learning algorithms, for predicting the function of a given protein sequence: 1) TL-RBPPred, for prediction of RNA-binding proteins, which outperformed SPOT-Seq, RNApred, RBPPred and BLASTp on HumanSet (AUC of 0.977), YeastSet (AUC of 0.971), ArabidopsisSet (AUC of 0.972) and GlymaxSet (AUC of 0.97); 2) TL-DBPPred, for prediction of DNA-binding proteins, which outperformed DNABP, enDNA-Prot, iDNA-Prot, nDNAProt, iDNA-Prot|Dis, DNAbinder and BLASTp on a testing dataset (AUC of 0.988); and 3) TL-TFPred, for prediction of transcription factors, which outperformed PlantTFcat, iTAK and BLASTp on a testing dataset (AUC of 0.999). Further, both TL-RBPPred and TL-DBPPred were tested on the transcriptome of a non-model crop, Bambara groundnut (Vigna subterranea (L.) Verdc.), to identify RNA-binding and DNA-binding proteins, respectively. In these tests, both methods achieved higher prediction accuracy (AUC) than existing state-of-the-art tools such as SPOT-Seq, RBPPred, iDNA-Prot and iDNA-Prot|Dis.
Based on this performance, the developed methods will be useful for predicting the function of protein sequences (DNA-binding proteins, RNA-binding proteins and transcription factors) in model species as well as in non-model crops.
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: QH Natural history. Biology