Algorithm engineering : string processing
The string matching problem has attracted considerable interest throughout the history of computer science and is crucial to the computing industry. The theoretical computer science community has developed a rich literature on the design and analysis of string matching algorithms. To date, most of this work has been based on the asymptotic analysis of the algorithms. Such analysis rarely tells us how an algorithm will perform in practice, and considerable experimentation and fine-tuning is typically required to get the most out of a theoretical idea. In this thesis, promising string matching algorithms discovered by the theoretical community are implemented, tested and refined to the point where they can be usefully applied in practice. In the course of this work we have presented the following new algorithms. We prove that the average-case time complexity of the new algorithms is linear. We also compared the new algorithms with the existing algorithms by experimentation.

- We implemented the existing one-dimensional string matching algorithms for English texts. From the experimental results we identified the two best algorithms, and we combined them to introduce a new algorithm.
- We developed a new two-dimensional string matching algorithm. This algorithm uses the structure of the pattern to reduce the number of comparisons required to search for the pattern.
- We described a method for efficiently storing text. Although this reduces the storage space required, it is not a compression method in the sense used in the literature. Our aim is to improve both the space and the time taken by a string matching algorithm. Our new algorithm searches for patterns in the efficiently stored text without decompressing the text.
- We illustrated that pre-processing the text can improve the speed of a string matching algorithm when searching for a large number of patterns in a given text.
- We proposed a hardware solution for searching in an efficiently stored DNA text.