Title:

Efficiency of algorithms in phylogenetics

Phylogenetics is the study of evolutionary relationships between species. Phylogenetic trees have long been the standard object used in evolutionary biology to illustrate how a given set of species are related. There are some groups (including certain plant and fish species) for which the ancestral history contains reticulation events, caused by processes that include hybridization, lateral gene transfer, and recombination. For such groups of species, it is appropriate to represent their ancestral history by phylogenetic networks: rooted acyclic digraphs, where arcs represent lines of genetic inheritance and vertices of indegree at least two represent reticulation events. This thesis is concerned with the efficiency, accuracy, and tractability of mathematical models for phylogenetic network methods. Three important and related measures for summarizing the dissimilarity in phylogenetic trees are the minimum number of hybridization events required to fit two phylogenetic trees onto a single phylogenetic network (the hybridization number), the (rooted) subtree prune and regraft distance (the rSPR distance) and the tree bisection and reconnection distance (the TBR distance) between two phylogenetic trees. The respective problems of computing these measures are known to be NPhard, but also fixedparameter tractable in their respective natural parameters. This means that, while they are hard to compute in general, for cases in which a parameter (here the hybridization number and rSPR/TBR distance, respectively) is small, the problem can be solved efficiently even for large input trees. Here, we present new analyses showing that the use of the “cluster reduction” rule – already defined for the hybridization number and the rSPR distance and introduced here for the TBR distance – can transform any O(f(p) · n)time algorithm for any of these problems into an O(f(k) · n)time one, where n is the number of leaves of the phylogenetic trees, p is the natural parameter and k is a much stronger (that is, smaller) parameter: the minimum level of a phylogenetic network displaying both trees. These results appear in [9]. Traditional “distance based methods” reconstruct a phylogenetic tree from a matrix of pairwise distances between taxa. A phylogenetic network is a generalization of a phylogenetic tree that can describe evolutionary events such as reticulation and hybridization that are not treelike. Although evolution has been known to be more accurately modelled by a network than a tree for some time, only recently have efforts been made to directly reconstruct a phylogenetic network from sequence data, as opposed to reconstructing several trees first and then trying to combine them into a single coherent network. In this work, we present a generalisation of the UPGMA algorithm for ultrametric tree reconstruction which can accurately reconstruct ultrametric treechild networks from the set of distinct distances between each pair of taxa. This result will also appear in [15]. Moreover, we analyse the safety radius of the NETWORKUPGMA algorithm and show that it has safety radius 1/2. This means that if we can obtain accurate estimates of the set of distances between each pair of taxa in an ultrametric treechild network, then NETWORKUPGMA correctly reconstructs the true network.
