Title: A general framework for building accurate and understandable genomic models : a study in rice (Oryza sativa)
Author: Orhobor, Oghenejokpeme
ISNI: 0000 0004 7658 4794
Awarding Body: University of Manchester
Current Institution: University of Manchester
Date of Award: 2019
Rapid advances in genotyping and sequencing technologies are driving the generation of vast amounts of genomic data. These advances present a unique opportunity to improve our understanding of the environmental and genetic mechanisms that give rise to phenotypes. Such data are technically hard to analyse because there are many attributes (often on the order of a million) and vast quantities of relevant background knowledge. Genotype data are most commonly used in genomic models to identify genetic regions that control phenotypes and to predict the likelihood that members of a population will produce progeny with particular phenotypes. However, most of the data may be irrelevant for a given phenotype, leading to suboptimal, difficult-to-understand models. To meet this challenge, we propose a general three-stage framework that incorporates background knowledge into its model-building processes by applying feature stability, inductive logic programming (ILP), and meta-learning. In the first stage, we identify associated markers using marker stability rather than traditional mixed models. In the second stage, we formalise the identified markers and additional background knowledge as predicates in first-order logic and use an ILP engine to identify frequent patterns that correspond to genetic configurations associated with a trait. Finally, the frequent patterns identified in the second stage are used as additional data for phenotype prediction. Using a diverse rice (Oryza sativa) population, we demonstrate that this framework (1) significantly outperforms the state of the art in identifying associated genomic regions, (2) identifies relevant genetic configurations, and (3) improves overall phenotype prediction.
Supervisor: Brown, Gavin ; King, Ross
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
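
The abstract's first stage ranks markers by stability rather than by a single model fit. One common way to picture that idea is to score markers repeatedly on random subsamples of the population and record how often each marker ranks near the top. The sketch below is illustrative only and is not taken from the thesis: the correlation-based score, the function names `abs_correlation` and `marker_stability`, the parameters `top_k`, `rounds`, and `fraction`, and the toy data are all assumptions.

```python
import random
from statistics import mean


def abs_correlation(xs, ys):
    """Absolute Pearson correlation; 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def marker_stability(genotypes, phenotype, top_k=1, rounds=50, fraction=0.5, seed=0):
    """Fraction of random subsamples in which each marker ranks in the top_k.

    Hypothetical helper: the scoring rule is an assumption, not the thesis's method.
    """
    rng = random.Random(seed)
    n_samples = len(genotypes)
    n_markers = len(genotypes[0])
    size = max(2, int(n_samples * fraction))
    counts = [0] * n_markers
    for _ in range(rounds):
        # Draw a subsample of individuals without replacement.
        rows = rng.sample(range(n_samples), size)
        y = [phenotype[i] for i in rows]
        scores = [
            abs_correlation([genotypes[i][j] for i in rows], y)
            for j in range(n_markers)
        ]
        # Count the markers with the highest scores in this subsample.
        ranked = sorted(range(n_markers), key=lambda j: scores[j], reverse=True)
        for j in ranked[:top_k]:
            counts[j] += 1
    return [c / rounds for c in counts]


# Toy data (assumed): 40 individuals, 5 markers coded 0/1/2,
# with the phenotype driven by the marker at index 2 plus small noise.
data_rng = random.Random(1)
genotypes = [[data_rng.choice([0, 1, 2]) for _ in range(5)] for _ in range(40)]
phenotype = [2.0 * row[2] + data_rng.gauss(0, 0.1) for row in genotypes]

stability = marker_stability(genotypes, phenotype)
print(stability)
```

A marker whose selection frequency stays high across subsamples is less likely to be an artefact of one particular sample, which is the intuition behind preferring stability to a single fit. A real analysis would replace the toy correlation score with an appropriate association model and account for population structure.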