Title: Handling missing data in analyses of the UK women's cohort study
Author: Nur, Ula Ali Mohamed
ISNI:       0000 0001 3450 6905
Awarding Body: University of Leeds
Current Institution: University of Leeds
Date of Award: 2004
Missing values are a problem in large-scale surveys with extensive questionnaires. Analysis of only the complete records may yield inferences substantially different from those that would be obtained had no data been missing. The aim of this dissertation is to critically examine ways of handling missing data in the UK Women's Cohort Study (UKWCS). This is a large dataset of continuous, categorical and binary variables, with missing values in almost every variable. A number of simple imputation techniques, as well as multiple imputation as developed by Rubin (1987) and multiple imputation by chained equations using Gibbs sampling (Van Buuren, 1999), were explored in a number of illustrative analyses associated with the UKWCS. Three approaches to handling missing dietary information on alcohol consumption were compared. The comparison shows that ignoring missingness by analysing only complete cases produces bias (lower means). Imputing the extreme value zero, as is customary at present, underestimates actual alcohol consumption and also incorrectly increases the apparent precision of estimation (i.e. inappropriately small standard errors). A published study (Pollard et al., 2001), which based its conclusion on one third of the records, was replicated after handling missing data by multiple imputation. Multiple imputation by chained equations, an iterative technique that deals with missing values when every variable is incomplete, was applied. This method greatly improved the results by utilizing most of the information in the incomplete records. The method has the advantage that the algorithm intended for analysing the complete data is applied several times, without any alterations. The implications of missing data were also studied in a survival analysis investigating the link between incidence of breast cancer and a number of prognostic factors.
The thesis recommends multiple imputation for handling missing data, because it exploits most of the information in the dataset and supports efficient inferences in subsequent analyses.
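As an illustration of how results from repeated complete-data analyses are usually combined in multiple imputation, the sketch below applies the standard pooling rules associated with Rubin (1987). It is not taken from the thesis: the function name pool_rubin and the numbers are hypothetical, and the record does not state which software or pooling code the author used.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m complete-data results with Rubin's rules.

    estimates: point estimates, one per imputed dataset
    variances: squared standard errors, one per imputed dataset
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)      # pooled point estimate
    within = mean(variances)     # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical results from five imputed datasets (not UKWCS values).
estimates = [10.0, 11.0, 9.0, 10.5, 9.5]
variances = [0.5, 0.6, 0.4, 0.5, 0.5]
pooled, total_var = pool_rubin(estimates, variances)
print(pooled, total_var ** 0.5)  # pooled estimate and its standard error
```

The total variance is larger than the average within-imputation variance whenever the imputed datasets disagree, which is how the pooled standard error reflects the uncertainty due to missing values.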
Supervisor: Greenwood, D.; Longford, N.
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available