Use this URL to cite or link to this record in EThOS:
Title: Algorithmic problems in strings with applications to the analysis of biological sequences
Author: Barton, Carl Samuel
ISNI:       0000 0004 5372 0861
Awarding Body: King's College London
Current Institution: King's College London (University of London)
Date of Award: 2015
Availability of Full Text:
Access from EThOS:
Access from Institution:
Recent advances in molecular biology have dramatically changed the way biological data analysis is performed [119, 92]. Next-Generation Sequenc- ing (NGS) technologies produce high-throughput data of highly controlled quality, hundreds of times faster and cheaper than a decade ago. Mapping of short reads to a reference sequence is a fundamental problem in NGS technologies. After finding an occurrence of a high quality fragment of the read, the rest must be approximately aligned, but a good alignment would not be expected to contain a large number of gaps (consecutive insertions or deletions). We present an alternative alignment algorithm which computes the optimal alignment with a bounded number of gaps. Another problem arising from NGS technologies is merging overlapping reads into a single string. We present a data structure which allows for the efficient computation of the overlaps between two strings as well as being applicable to other problems. Weighted strings are a representation of data that allows for a subtle representation of ambiguity in strings. In this document we present algorithms for other problems related to weighted strings: the computation of exact and approximate inverted repeats in weighted strings, computing repetitions and computing covers. We investigate the average-case complexity of wildcard matching. Wildcards can be used to model single nucleotide polymorphisms and so, efficient algorithms to search for strings with wildcards are necessary. In this document we investigate how efficient algorithms for this problem can be on average. There exist many organisms such as viruses, bacteria, eukaryotic cells, and archaea which have a circular DNA structure. If a biologist wishes to find occurrences of a particular virus in a carriers DNA sequence which may not be circular it must be possible to efficiently locate occurrences of circular strings. In this document we present a number of algorithms for circular string matching.
Supervisor: Iliopoulos, Costas ; Crochemore, Maxime Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID:  DOI: Not available