Use this URL to cite or link to this record in EThOS: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.762412
Title: Avoided words, overabundant words, maximal palindromes and applicatons
Author: Gao, Jia
ISNI:       0000 0004 7656 6465
Awarding Body: King's College London
Current Institution: King's College London (University of London)
Date of Award: 2018
Availability of Full Text:
Access from EThOS:
Access from Institution:
Abstract:
The study of string algorithms is essential for computer science and computational molecular biology. Recently, a number of algorithms on strings appear in many fields, such as pattern matching, combinatorics on words, string processing and automata the-ory, because of their magnitude advances in applications and theories. These advances have made to develop faster algorithms and to deal with certain natural problems. In this thesis, we focus on computing certain structures in biological sequences using different algorithmic methods. Firstly, we study the computation of avoided words and overabundant words in biological sequences. The observed frequency of the longest proper prefix, the longest proper suffix, and the longest infix of a word in a given sequence can be used for classifying that word as avoided or overabundant. This concept is particularly useful in DNA linguistic analysis. The definitions used for the expectation and deviation of the word in this statistical model were described and biologically justified by Brendel et al. [19]. We present some algorithms that can be used effectively for computing such words. Furthermore, experimental results, using both real and synthetic data, which further highlight the effectiveness of this model, show the efficiency and applicability of our implementation in biological sequence analysis. Secondly, we consider a special type of uncertain sequence called weighted string. In a weighted string every position contains a subset of the alphabet and every letter of the alphabet is associated with a probability of occurrence such that the sum of probabilities at each position equals 1. We generalize Alatabbi et al.’s [2] solution for standard strings to compute maximal palindromes of a weighted string. We provide some efficient algorithms to compute maximal palindromes in weighted strings. Last but not least, we make available an implementation of our algorithms, using synthetic data, show the efficiency of our implementation.
Supervisor: Iliopoulos, Costas ; Crochemore, Maxime Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.762412  DOI: Not available
Share: