Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.713584
Title: Prediction of mammalian essential genes based on sequence and functional features
Author: Kabir, Mitra
Awarding Body: University of Manchester
Current Institution: University of Manchester
Date of Award: 2017
Abstract:
Essential genes are those whose presence is imperative for an organism's survival, whereas the functions of non-essential genes may be useful but not critical. Abnormal functionality of essential genes may lead to defects or death at an early stage of life. Knowledge of essential genes is therefore key to understanding development, maintenance of major cellular processes and tissue-specific functions that are crucial for life. Existing experimental techniques for identifying essential genes are accurate, but most of them are time-consuming and expensive. Predicting essential genes using computational methods would therefore be of great value, because such methods circumvent experimental constraints.
Our research is based on the hypothesis that mammalian essential (lethal) and non-essential (viable) genes are distinguishable by various properties. We examined a wide range of features of Mus musculus genes, including sequence, protein-protein interactions, gene expression and function, and found 75 features that were statistically discriminative between lethal and viable genes. These features were used as inputs to create a novel machine learning classifier, allowing the prediction of a mouse gene as lethal or viable with cross-validation and blind test accuracies of ∼91% and ∼93%, respectively. The prediction results are promising, indicating that our classifier is an effective mammalian essential gene prediction method.
We further developed the mouse gene essentiality study by analysing the association between essentiality and gene duplication. Mouse genes were labelled as singletons or duplicates, and their expression patterns over 13 developmental stages were examined. We found that lethal genes originating from duplicates are considerably lower in proportion than singletons. At all developmental stages a significantly higher proportion of singletons and lethal genes are expressed than duplicates and viable genes. Lethal genes were also found to be more ancient than viable genes. In addition, we observed that duplicate pairs with similar patterns of developmental co-expression are more likely to be viable; lethal gene duplicate pairs show no such trend. Overall, these results suggest that duplicate genes in mouse are less likely to be essential than singletons.
Finally, we investigated the evolutionary age of mouse genes across development to see if the morphological hourglass pattern exists in the mouse. We found that in mouse embryos, genes expressed in early and late stages are evolutionarily younger than those expressed in mid-embryogenesis, thus yielding an hourglass pattern. However, the oldest genes are not expressed at the phylotypic stage stated in prior studies, but instead at an earlier time point, the egg cylinder stage. These results question the application of the hourglass model to mouse development.
Supervisor: Hentges, Kathryn
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.713584
DOI: Not available
Keywords: Morphological Hourglass Model ; Gene Duplication ; Gene Prediction ; Feature Selection ; Machine Learning ; Mammalian Gene Essentiality ; Random Forest ; Gene Co-expression