Title:

Investigations into the robustness of statistical decisions

Decision theory is a cornerstone of Statistics, providing a principled framework in which to act under uncertainty. It underpins Bayesian theory via the Savage axioms and game theory via Wald's minimax, and supplies a mathematical formulation of 'rational choice'. This thesis argues that its role is of particular importance in the so-called 'big data' era. Indeed, as data have become larger, statisticians are confronted with an explosion of new methods and algorithms built on ever more complicated statistical models. Many of these models are not only high-dimensional and highly nonlinear but also approximate by design, that is, they deliberately make approximations for reasons such as tractability and interpretability. For Bayesian theory, and for Statistics in general, this raises many important questions, which I believe decision theory can help elucidate. From a foundational standpoint, how does one interpret the outputs of Bayesian computations when the model is known to be approximate and misspecified? Misspecification violates the assumptions necessary for the Savage axioms to apply. Should principles such as expected loss minimisation still apply in such settings? On a practical level, how can modellers assess the extent of the impact of model misspecification? How can this assessment be integrated into model construction, so as to inform the user whether more work needs to be done (for example, more hours of computation, or a more accurate model)? Users need to know whether the model is unreliable, or whether its conclusions are robust and can be trusted. Misspecification has been a recurring concern in the history of Robust Statistics, whose main aspects are covered in Chapter 1. Robust Bayesian analysis was a particularly active area of research from the 1980s to the mid-90s, but later declined as methodological and computational advances overcame the original concerns about misspecification. 
Now, however, the complexity of datasets frequently precludes the construction of fully specified, well-crafted models, and Bayesian robustness therefore merits a reappraisal. Additionally, new methods have been developed that are characterised by their deliberately approximate and misspecified nature, such as integrated nested Laplace approximation (INLA), approximate Bayesian computation (ABC), Variational Bayes, and composite likelihoods. All of these start from a premise of misspecification. The work described in this thesis concerns the development of a comprehensive framework for addressing the challenges associated with imperfect models, encompassing both formal methods to assess the sensitivity of the model (Chapters 2 & 3) and exploratory diagnostic methods via graphical plots and summary statistics (Chapters 4 & 5). This framework is built on a post hoc sensitivity analysis, via the loss function, of the approximating posterior model. Chapter 2 describes methods for estimating the sensitivity of a model with respect to the loss function by analysing the effect of local perturbations in neighbourhoods centred at the approximating model (in a Bayesian context, the posterior distribution). These neighbourhoods are defined using the Kullback-Leibler divergence. This approach provides a bridge between the two dominant paradigms in decision theory: Wald's minimax and Savage's expected loss criterion. Two key features of this framework are that the solution is analytical and that it unifies other well-known methods in Statistics, such as predictive tempering, power likelihoods and Gibbs posteriors. It also offers an interesting solution to the Ellsberg paradox. Another application of the work is in computational decision theory, where the statistician only has access to the model via a finite set of samples. In this context, the methods can be used at very little extra computational cost. 
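To make the sample-based setting concrete, here is a minimal Python sketch. It assumes (this is an assumption, not a statement of the method above) that the worst-case distribution within a Kullback-Leibler ball of radius C around the approximating model takes the exponentially tilted form π_λ(θ) ∝ π(θ) exp(λ L(a, θ)), the standard form arising from the Donsker-Varadhan variational formula. With samples θ_1, ..., θ_n and losses L_i = L(a, θ_i), this reduces to self-normalised weights w_i ∝ exp(λ L_i), so the extra cost is one reweighting per value of λ. The function names and the bisection on λ are hypothetical illustrations.

```python
import numpy as np

def tilted_weights(losses, lam):
    """Weights proportional to exp(lam * loss) over equally weighted samples (hypothetical helper)."""
    z = lam * np.asarray(losses, dtype=float)
    z -= z.max()  # shift the exponent for numerical stability; cancels on normalising
    w = np.exp(z)
    return w / w.sum()

def kl_from_uniform(w):
    """KL(w || uniform) = sum_i w_i log(n w_i), skipping zero weights."""
    n = len(w)
    nz = w > 0
    return float(np.sum(w[nz] * np.log(w[nz] * n)))

def worst_case_loss(losses, radius, lam_max=100.0, tol=1e-8):
    """Expected loss under the tilted weights lying on the KL ball boundary (assumed form).

    Both the KL divergence and the tilted expected loss increase with lam >= 0,
    so bisection on lam finds the boundary of the ball.
    """
    losses = np.asarray(losses, dtype=float)
    lo, hi = 0.0, lam_max
    if kl_from_uniform(tilted_weights(losses, hi)) < radius:
        lo = hi  # ball not reached even at lam_max; report the lam_max solution
    else:
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if kl_from_uniform(tilted_weights(losses, mid)) < radius:
                lo = mid
            else:
                hi = mid
    w = tilted_weights(losses, lo)
    return float(np.dot(w, losses)), lo
```

With radius 0 the sketch returns the ordinary (Savage) expected loss over the samples; as the radius grows, the weights concentrate on the highest-loss samples, moving towards a minimax-style answer. Here λ plays a role similar to a tempering parameter.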
Chapter 3 considers nonparametric extensions to the approximating reference model. In particular, we look at the Pólya tree process, the Dirichlet process and bootstrap procedures. Again using the Kullback-Leibler divergence, it is possible to characterise random samples from these nonparametric models with respect to the base model, and therefore to understand the effect of local perturbations on the distribution of loss under the approximating model. A series of diagnostic plots and summary statistics is presented in Chapter 4 and further illustrated in Chapter 5 by means of two applications taken from the medical decision-making literature. These complete the framework of post hoc assessment of model stability and allow the user to understand why the model might be sensitive to misspecification. Graphical displays are an essential part of statistical analysis, indeed the point of departure for any serious data analysis. Their use in model exploration in the context of decision theory, however, is not common. We borrow some ideas from finance and econometrics as a basis for exploratory decision-system plots. Other plots arise as natural consequences of the methodology from the two previous chapters. The final chapter examines a very specific application of statistical decision theory, namely the analysis of randomised clinical trials to assess the evidence in favour of patient heterogeneity. This problem, known as subgroup analysis, has traditionally been addressed using predictive models that serve as a proxy for the real object of interest: evidence of patient heterogeneity. By formally expressing the decision problem as a hypothesis test, and working from first principles, the problem is shown to be much easier than previously thought. The method avoids issues involving counterfactuals by testing decision rules against their mirror images. 
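As a rough illustration of a bootstrap-type perturbation used as a diagnostic, the Python sketch below assumes Bayesian bootstrap reweighting, that is, Dirichlet(1, ..., 1) weights over the n samples from the approximating model. This choice and the function name are hypothetical and are not taken from the chapters described above. Each draw yields a perturbed expected loss together with its Kullback-Leibler divergence from the equally weighted base model, so the spread of expected loss can be read against how far each perturbation lies from the base model.

```python
import numpy as np

def bayesian_bootstrap_losses(losses, n_draws=2000, seed=0):
    """Expected loss and KL(w || uniform) for Dirichlet(1, ..., 1) reweightings (hypothetical helper)."""
    losses = np.asarray(losses, dtype=float)
    n = losses.size
    rng = np.random.default_rng(seed)
    # Each row of W is one random reweighting of the n samples
    W = rng.dirichlet(np.ones(n), size=n_draws)
    expected = W @ losses
    # KL(w || uniform) = sum_i w_i log(n w_i); clipping makes 0 * log(0) evaluate to 0
    kl = np.sum(W * np.log(np.clip(W * n, 1e-300, None)), axis=1)
    return expected, kl
```

A scatter plot of `expected` against `kl` (for example with matplotlib) gives one simple picture of how sensitive the expected loss is to perturbations at a given distance from the base model.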
It can harness the strength of well-known model-free tests and uses a random-forest-type approach for post hoc exploration of decision rules. The randomisation allows for a causal interpretation of the results.
