Dealing with missing data on alcohol consumption using diet diaries in a birth cohort study
Recent alcohol research has focussed on the importance of patterns of drinking rather than on total consumption over a period. This requires the collection of detailed data, as in a daily diary, which tends to result in a substantial proportion of missing data. In the past, dealing with missing data in epidemiology was based mainly on naive methods. The aim of this dissertation is to critically examine ways of dealing with missing data on alcohol consumption collected in diet diaries by the 1946 birth cohort study, and to develop a method which takes account of both the technical statistical problems which arise with such data and the characteristics of the data which are of substantive importance in alcohol research. Recent developments in standard statistical software packages (SPSS, S-Plus), and in special-purpose packages for missing data analysis (such as SOLAS), have given epidemiologists access to more sophisticated approaches such as propensity scores, linear regression, the EM algorithm and multiple imputation. These methods are evaluated using a simulation-based approach, which demonstrates that ignoring missing data, or handling them incorrectly, can lead to inefficient and biased results. A technical problem arises because the distribution of alcohol consumption is semicontinuous: a proportion of the values are zero, while the positive values are continuously distributed. The results show that some standard methods are not suitable for variables of this kind and some use inappropriate algorithms, whilst others are not appropriate for epidemiological research because they do not preserve relationships between variables. Single or deterministic imputation methods fail to take account of uncertainty about the missing values. The thesis shows how, using Schafer's procedures for multiple imputation, the information in alcohol diary data can be fully exploited and efficient inferences made. The multiply imputed datasets can be used for any subsequent analysis.
Examples used in this thesis are the prevalence of excessive alcohol consumption, the role of alcohol consumption in the relationship between birthweight and blood pressure in mid-life, and the dependence of blood pressure on alcohol consumption. Any analysis that deals with missing data should evaluate the sensitivity of its inferences to the assumptions made. In this thesis the sensitivity of inferences to the missing at random (MAR) assumption and to the model for imputation is evaluated.
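To illustrate why multiply imputed datasets, unlike a single imputed dataset, carry the uncertainty about the missing values into the final inference, the sketch below pools the results of the same analysis run on each imputed dataset using Rubin's rules, the standard combining rules for multiple imputation. The abstract does not state which combining rules the thesis applies, so this is an assumption; the function name `pool_rubin` and the input values are hypothetical and are not taken from the thesis.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one scalar estimate across m imputed datasets (Rubin's rules).

    estimates: point estimate from each imputed dataset
    variances: squared standard error from each imputed dataset
    Illustrative helper only; not code from the thesis.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and matching lengths")

    q_bar = mean(estimates)      # pooled point estimate
    w = mean(variances)          # within-imputation variance
    b = variance(estimates)      # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b      # total variance of the pooled estimate

    return {
        "estimate": q_bar,
        "within": w,
        "between": b,
        "total_variance": t,
        "std_error": sqrt(t),
    }


# Hypothetical estimates of a regression coefficient from m = 3 imputations.
pooled = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled)
```

The between-imputation component is what single or deterministic imputation omits: with only one completed dataset the total variance would reduce to the within-imputation variance and the standard error would be too small.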