Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.668978
Title: An informational-dynamical approach to characterise and model the complexity of the DNA
Author: Srivastava, Shambhavi
ISNI: 0000 0004 5368 1364
Awarding Body: University of Aberdeen
Current Institution: University of Aberdeen
Date of Award: 2015
Availability of Full Text:
Access from EThOS:
Full text unavailable from EThOS.
Abstract:
In this thesis, we show how to create an approximate Markov model of the DNA. The model is constructed by encoding the DNA nucleotides into finite-length symbolic sequences, referred to as words, and creating a 2D symbolic space for the DNA in which each plotted point represents a word. From this construction we are able to specify the words of the DNA, their lengths, and how they can be organised into groups of symbolically similar words. The model also allows the construction of a network of the DNA, in which the nodes represent groups of words and the edge connecting two nodes measures the likelihood that words in one group are mapped to words in another, strongly correlated group after a shift of one nucleotide in the sequence. The model is then applied to reduce the complexity of the DNA by considering only the most relevant groups of words, which carry most of the DNA's information. We show that, for E. coli, two-fifths of the information is lost by neglecting only three groups of words. The model is also applied to construct measures of similarity between genes and of the predictability of genes in different organisms. We then study the long-term behaviour of groups of words in our Markov model by analysing their recurrence properties. For some groups of words, the statistics of returns are estimated theoretically from the statistical properties of the model. The groups of words that contribute more to the DNA's random nature provide a simple way to estimate analytically the statistics of returns of words belonging to these groups. As an application of the recurrence analysis, we show that the coding regions of the DNA contribute more to its random character.
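The abstract does not specify the encoding, the grouping rule or the information measure, so the Python sketch below fills them in with clearly labelled assumptions. Each nucleotide gets a hypothetical 2-bit code whose bits become the binary digits of the x and y coordinates (a chaos-game-style construction). Groups are cells of a uniform grid on the unit square, "information" is taken to be the Shannon entropy of group frequencies, and return times are compared with Kac's lemma (for a stationary ergodic process, the mean return time to a set of measure μ is 1/μ). All function names are hypothetical; this illustrates the ideas and is not the thesis's implementation. Summing `entropy_shares` over a set of neglected groups gives the fraction of this assumed information measure that would be lost.

```python
import math
from collections import Counter, defaultdict

# Hypothetical 2-bit code per nucleotide: (bit used for x, bit used for y).
# This is an illustrative assumption, not the encoding used in the thesis.
CODE = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}


def word_to_point(word):
    """Map a word to a point in [0, 1)^2, reading its bits as binary fractions."""
    x = y = 0.0
    for k, base in enumerate(word, start=1):
        bx, by = CODE[base]
        x += bx / 2**k  # first bit of each nucleotide builds x
        y += by / 2**k  # second bit of each nucleotide builds y
    return x, y


def group_of(point, cells):
    """Grid cell (out of cells x cells) containing the point.

    With cells = 2**m and word length >= m, words in the same cell share their
    first m nucleotides: an assumed stand-in for "symbolic similarity".
    """
    x, y = point
    return int(x * cells), int(y * cells)


def word_groups(seq, L, cells):
    """Group label of every length-L word, sliding one nucleotide at a time."""
    return [group_of(word_to_point(seq[i:i + L]), cells)
            for i in range(len(seq) - L + 1)]


def transition_matrix(groups):
    """Empirical P(next group | current group) after a one-nucleotide shift."""
    counts = defaultdict(Counter)
    for g, h in zip(groups, groups[1:]):
        counts[g][h] += 1
    P = {}
    for g, row in counts.items():
        total = sum(row.values())
        P[g] = {h: n / total for h, n in row.items()}
    return P


def entropy_shares(groups):
    """Fraction of the Shannon entropy of group frequencies carried by each group."""
    freq = Counter(groups)
    total = len(groups)
    terms = {g: -(n / total) * math.log2(n / total) for g, n in freq.items()}
    H = sum(terms.values())
    if H == 0:  # a single group carries no entropy at all
        return {g: 0.0 for g in terms}
    return {g: t / H for g, t in terms.items()}


def return_times(groups, target):
    """Gaps between successive visits of the word sequence to one group."""
    visits = [i for i, g in enumerate(groups) if g == target]
    return [b - a for a, b in zip(visits, visits[1:])]


if __name__ == "__main__":
    seq = "ACGT" * 50  # toy periodic sequence, not real genomic data
    groups = word_groups(seq, L=2, cells=2)  # group = first nucleotide of the word
    print(transition_matrix(groups)[(0, 0)])  # {(0, 1): 1.0}: an A-word is followed by a C-word
    rt = return_times(groups, (0, 0))
    mu = groups.count((0, 0)) / len(groups)
    print(sum(rt) / len(rt), 1 / mu)  # mean return time vs. Kac's estimate 1/mu
```

On this toy periodic sequence every return time to the A-group is 4, close to the Kac estimate 199/50. For groups that behave like independent random draws, return times are commonly approximated by a geometric (in the continuum limit, exponential) distribution with that same mean; the abstract does not say whether this is the analytical estimate used in the thesis.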
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.668978
DOI: Not available
Keywords: DNA