Use this URL to cite or link to this record in EThOS:
Title: Learning and approximation algorithms for problems motivated by evolutionary trees
Author: Cryan, Mary Elizabeth
ISNI:       0000 0001 3399 6586
Awarding Body: University of Warwick
Current Institution: University of Warwick
Date of Award: 1999
Availability of Full Text:
Access from EThOS:
Access from Institution:
In this thesis we consider some computational problems motivated by the biological problem of reconstructing evolutionary trees. In this thesis, we are concerned with the design and analysis of efficient algorithms for clearly defined combinatorial problems motived by this application area. We present results for two different kinds of problem. Our first problem is motivated by models of evolution that describe the evolution of biological species in terms of a stochastic process that alters the DNA of species. The particular stochastic model that we considered is called the Two-State General Markov Model. In this model, an evolutionary tree can be associated with a distribution on the different "patterns" that may appear among the sequences for all the species in the evolutionary tree. Then the data for a collection of species whose evolutionary tree is unknown can be viewed as samples from this (unknown) distribution. An interesting problem asks whether we can use samples from an unknown evolutionary tree M to find another tree M*for those species, so that the distribution of M* is similar to that of M. This is essentially a PAC-learning problem ("Probably Approximately Correct") in the sense of Valiant and Kearns et al. Our results show that evolutionary trees in the Two-State General Markov can be efficiently PAC-learned in the variation distance metric using a "reasonable" number of samples. The two other problems that we consider are combinatorial problems that are also motivated by evolutionary tree construction. The input to each of these problems consists of a fixed tree topology whose leaves are bijectively labelled by the elements of a species set, as well as data for those species. Both problems involve labelling the internal nodes in the fixed topology in order to minimize some function on that tree (both functions that we consider are assumed to test the quality of the tree topology in some way). The two problems that we consider are known to be NP-hard. Our contribution is to present efficient approximation algorithms for both problems.
Supervisor: Not available Sponsor: Department of Computer Science, University of Warwick
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID:  DOI: Not available
Keywords: QA76 Electronic computers. Computer science. Computer software