The likelihood of gene trees under selective models
The extent to which natural selection shapes diversity within populations is a key question for population genetics. Thus, there is considerable interest in quantifying the strength of selection. This thesis develops a full-likelihood approach for inference about selection at a single site within an otherwise neutral, fully linked sequence of sites. Integral to many of the ideas introduced in this thesis is the reversibility of the diffusion process, and some past approaches to this concept are reviewed. A coalescent model is used to describe the ancestry of a sample of DNA sequences in which the selected site is segregating. A novel method is described for simulating the coalescent with selection acting at a single biallelic site. Selection is incorporated by modelling the frequencies of the selected and neutral allelic classes stochastically backwards in time. The ancestry is then simulated using a subdivided-population model in which the class frequencies through time are treated as variable population sizes. The approach is general and can be used for any selection scheme at a biallelic locus. The mutation model for both the selected and the neutral sites is the infinitely-many-sites model, in which there is no back or parallel mutation. This allows a unique perfect phylogeny, a gene tree, to be constructed from the configuration of mutations on the sample sequences. An importance sampling algorithm is described for exploring the space of coalescent trees consistent with this gene tree. The method is used to assess the evidence for selection in the following data sets: a partial selective sweep in the G6PD gene (Verrelli et al., 2002); a recent full sweep in the Factor IX gene (Harris and Hey, 2001); and balancing selection in the DCP1 gene (Rieder et al., 1999). Little evidence of the action of selection is found in the data set of Verrelli et al. (2002), and the data set of Rieder et al. (1999) seems inconsistent with the model of balancing selection. The patterns of diversity in the data set of Harris and Hey (2001) support the hypothesis of a full sweep.
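
As an illustration of the infinitely-many-sites assumption, the following minimal Python sketch checks whether a set of sampled sequences admits a rooted perfect phylogeny. It is not the thesis's implementation. It assumes that each sequence is coded as 0 (ancestral) or 1 (derived) at every segregating site and that the ancestral state is known. The function name `is_consistent_with_infinite_sites` and the example data are hypothetical. The check relies on the standard condition that, for every pair of sites, the sets of sequences carrying the derived allele must be either nested or disjoint.

```python
from itertools import combinations


def is_consistent_with_infinite_sites(haplotypes):
    """Return True if the 0/1 haplotypes admit a rooted perfect phylogeny.

    haplotypes: list of equal-length sequences of 0 (ancestral) and
    1 (derived), one per sampled sequence. Hypothetical helper for
    illustration only.
    """
    if not haplotypes:
        return True
    n_sites = len(haplotypes[0])
    # For each site, the set of sampled sequences carrying the derived allele.
    carriers = [
        frozenset(i for i, h in enumerate(haplotypes) if h[s] == 1)
        for s in range(n_sites)
    ]
    # With no back or parallel mutation, any two such sets are either
    # nested (one mutation arose beneath the other) or disjoint.
    for a, b in combinations(carriers, 2):
        shared = a & b
        if shared and shared != a and shared != b:
            return False
    return True


# Hypothetical example data, not taken from any of the cited data sets.
compatible = [[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 0, 0]]
incompatible = [[1, 0], [0, 1], [1, 1]]
print(is_consistent_with_infinite_sites(compatible))
print(is_consistent_with_infinite_sites(incompatible))
```

A data set that fails this check cannot be represented as a gene tree under the infinitely-many-sites model without invoking recombination or recurrent mutation.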