Activity fingerprints in DNA based on a structural analysis of sequence information
The function of a DNA sequence is commonly predicted by measuring its nucleotide similarity to known functional sets. However, the discovery that many very different sequences have similar structural properties justifies the use of structural properties to identify patterns within families. The aim of this thesis is to develop tools that detect any unusual structural characteristics of a particular sequence or that identify DNA structure-activity fingerprints common to a set. This work uses the Octamer Database to describe DNA. The database's contents are split into two categories: parameters that describe minimum energy structure and parameters that measure flexibility. Information from both categories has been combined to describe structural tendencies, offering an alternative measure of sequence similarity. A structural DNA profile gives a graphical illustration of how a parameter from the Octamer Database varies either along the length of a single sequence or across a set of sequences. Profile Manager is an application that has been developed to automate single sequence profile generation and is used to study the A-tract phenomenon. The use of profiles to explore patterns in flexibility across a set of pre-aligned promoters is then investigated, and interesting transitions in decreasing twist flexibility are discovered. Multiple sequence queries are harder to solve than single sequence queries because the sequences must first be aligned. Only in rare circumstances are sequences pre-aligned by an experimentally determined position. More commonly, a multiple alignment must be generated. An extended, structure-based, hidden Markov model technique that successfully generates structural alignments is presented. Its application is tested on four DNA protein binding site datasets, with comparisons made to the traditional sequence method. 
Structural alignments of two of the four datasets were comparable in performance to the sequence alignments and offered useful insights into underlying structural mechanisms.