Title:

Optimal strategies for estimating cosmological parameters

In this thesis we study the effects of observational selection bias on the estimation of galaxy distances in cosmology. Although the presence of systematic bias in magnitude-limited surveys has long been recognised, there remains disagreement in the literature as to precisely how best to reduce or eliminate its effects from redshift-independent distance estimates. The aim of this thesis is to develop a statistically rigorous formulation of the problem of distance estimation, so as to resolve some of the issues which have clouded past discussion and allow one to determine strategies for obtaining optimal distance estimators. Redshift-independent distance estimates, when combined with the measured redshift, provide an estimate of a galaxy's peculiar velocity. The study of the large-scale peculiar velocity field has been a very active and contentious subject in recent years, following a number of independent reports of coherent structure and velocity flows on very large scales which pose serious problems for popular theories of structure formation. In chapter (1) we present a brief overview of our current picture of the local universe and summarise the basic features of theoretical models for the formation and evolution of structure. We compare in detail two different analytical techniques which have been developed to recover the full peculiar velocity and density fields from redshift surveys: the POTENT method (Bertschinger and Dekel, 1989) and the 'IRAS' method (Strauss et al., 1990). The former method requires redshift-independent distance estimates, and we consider the effects of sparse and noisy sampling on the recovered density and velocity fields, demonstrating the advantages for POTENT of removing the effects of selection bias from distance estimates. Chapter (2) presents a detailed description of distance indicators currently used in cosmology.
We review previous analyses of distance estimation and biasing problems and discuss the limitations of the 'Minimum Bias Subset', an early method proposed to remove them. We examine the different linear regression techniques used to calibrate indicators such as the Tully-Fisher relation and address the question of which method is 'best'. In particular, we consider a scheme, proposed by Schechter (1980), for obtaining unbiased distance estimates provided that one's sample is subject only to luminosity selection. This scheme has not been universally endorsed in the literature and many authors prefer other calibration methods. This disagreement is one of the main issues which we aim to clarify and resolve in this thesis. In chapter (3) we introduce a formulation for defining and investigating the properties of distance estimators in a statistically rigorous fashion. We first consider the case where distances are estimated using only measurements of apparent magnitude. Assuming a Gaussian luminosity function, we derive expressions for the conditional distribution of observable galaxies at a given (though in general unknown) true distance. We use this distribution to define a number of different distance estimators and compare their distributions, bias and mean squared error (or risk) as a function of true distance. This simple case is used to illustrate useful criteria by which we can identify which estimator is 'best'. In this chapter we also describe a procedure for constructing confidence intervals for the true distance of a galaxy. In chapter (4) we extend our analysis to the case where distance estimates are made from measurements of two or more observables, accounting for the effects of selection bias. We show that the different methods of regression used to calibrate these relations correspond to simple distance estimators which arise naturally from our rigorous formulation.
We also define a 'maximum likelihood' distance estimator and, following the method introduced in chapter (3), we compute the distribution, bias and risk of all of these estimators as a function of true distance. These results allow us to test the validity of the 'Schechter' scheme and identify situations where the corresponding estimator is a poor choice. Finally, we extend our procedure for constructing confidence intervals to this two-observable case. In chapter (5) we consider distance estimators constructed from a linear combination of three observables, again including the effects of selection. By computing the distribution, bias and risk of these estimators we determine, in particular, whether one may still define unbiased distance estimates in this case by adapting the 'Schechter' scheme. Furthermore, we examine quantitatively the extent to which the addition of a given third observable improves distance estimates obtained from measurements of only two. We consider the importance of these results for, e.g., the Dn-σ relation, for which potentially useful third observables exist. In chapter (6) we summarise the main qualitative results of this thesis and explore a number of possible avenues for future work, in particular a study of the consequences of our results for the analysis of redshift surveys by reconstruction methods such as POTENT.
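
As an illustration only, the magnitude-only case described for chapter (3) can be sketched numerically. The following Python fragment is not taken from the thesis: the function name `selection_bias`, the parameter values (`m_lim`, `M0`, `sigma`) and the choice of a naive estimator that assigns every galaxy the mean absolute magnitude are all assumptions made for the sake of the example. It draws absolute magnitudes from a Gaussian luminosity function at a fixed true distance, applies a sharp apparent-magnitude limit, and estimates the bias and risk of the resulting distance modulus by Monte Carlo. Working in distance modulus rather than distance is also a simplifying assumption.

```python
import math
import random


def selection_bias(true_distance_mpc, m_lim=14.0, M0=-20.0, sigma=0.5,
                   n=100_000, seed=0):
    """Monte Carlo bias and risk of a naive distance-modulus estimator.

    All parameter values are hypothetical. Returns a tuple
    (selected_fraction, bias, risk), with bias in magnitudes and risk
    (mean squared error) in magnitudes squared.
    """
    rng = random.Random(seed)
    # True distance modulus for a distance given in Mpc.
    mu_true = 5.0 * math.log10(true_distance_mpc) + 25.0

    errors = []
    for _ in range(n):
        M = rng.gauss(M0, sigma)   # absolute magnitude from a Gaussian luminosity function
        m = M + mu_true            # apparent magnitude at the true distance
        if m <= m_lim:             # sharp magnitude limit: only bright galaxies are observed
            mu_hat = m - M0        # naive estimator: assume every galaxy has M = M0
            errors.append(mu_hat - mu_true)

    if not errors:
        return 0.0, float("nan"), float("nan")

    bias = sum(errors) / len(errors)
    risk = sum(e * e for e in errors) / len(errors)
    return len(errors) / n, bias, risk


if __name__ == "__main__":
    for d in (20.0, 50.0, 100.0):
        frac, bias, risk = selection_bias(d)
        print(f"d = {d:6.1f} Mpc  selected = {frac:.3f}  "
              f"bias = {bias:+.3f} mag  risk = {risk:.3f} mag^2")
```

Under these assumed values the magnitude limit corresponds to roughly 63 Mpc for a galaxy of mean luminosity. Well inside that distance the naive estimator is nearly unbiased in distance modulus, whereas beyond it only the intrinsically brightest galaxies survive selection, so their distances are systematically underestimated and the risk grows.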
