Use this URL to cite or link to this record in EThOS: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.489046
Title: Quantifying the notion of 'clumpiness' within alignments obtained from BLAST similarity searches
Author: Birrell, Jacky
ISNI: 0000 0001 3465 2531
Awarding Body: University of Abertay Dundee
Current Institution: Abertay University
Date of Award: 2007
Abstract:
Numerous methods are utilised to determine the function of newly sequenced DNA or proteins. One such method is the use of sequence similarity searches, such as BLAST. However, due to the speed at which sequences can be produced and the ever-increasing size of the databases against which they are searched, it is becoming progressively more difficult for the scientist to carry out the necessary data analysis manually. Therefore, automating the analysis of BLAST results should greatly reduce the amount of labour for the scientist and so improve the chances of accelerating research progress or indicating new fields of investigation. An in-depth study of how the BLAST algorithm works was conducted. Interviews were also used to determine which BLAST result features matter to scientists when deciding whether a particular similarity hit is important to their field of research and to function determination. Based on this study, the clumpiness of a match’s alignment was chosen as the focus of this research. With this decided, techniques for quantifying this clumpiness were studied and several possible clumpiness measures were proposed. These measures were then tested against specified criteria in order to assess their suitability as a clumpiness measure. This analysis was first conducted on synthetic data; the CUSUM measure proved to be the best according to the criteria and was chosen as the clumpiness measure for the subsequent testing. This took the form of testing the measure within real BLAST sequence analysis via a prototype, which was utilised by scientists in their research. In conjunction with this, benchmark datasets containing families with distant relatives were used in order to assess the clumpiness measure’s ability to identify these distant relatives.
Additional testing of the clumpiness measure was performed on a more abstract dataset of events and non-events in a one-dimensional field. For both the prototype and the abstract testing, the results showed that the CUSUM clumpiness measure gives a good approximation of the degree of clustering of events within a one-dimensional field. In addition, there is an indication that the measure will be of use in the identification of distant relatives; however, further testing is required to widen the subject base and further validate the measure’s suitability for assisting in the function determination of novel sequences.
Supervisor: Natanson, Louis
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.489046
DOI: Not available