Title:
|
Towards data-intensive epidemiology: explorations in systematic reviews and causal inference
|
The field of epidemiology is now experiencing a data deluge, demanding appropriate
methods to efficiently analyse large amounts of data. In this thesis we present advances
towards data-intensive epidemiology, introducing novel methods and applications of
data mining in this field. We focus on two distinct applications.
Our first application is the task of risk of bias assessment in systematic reviews.
At present this is a highly manual process, where reviewers identify relevant parts
of research articles for a set of methodological elements that affect the risk of bias, in
order to make a risk of bias judgement for each of these elements. We use text mining to
identify relevant sentences within the text of included articles, to rank articles by risk of
bias, and to reduce the number of risk of bias assessments the reviewers need to perform
by hand.
The application of text mining to risk of bias assessments also led to the following
methodological contributions. We introduce the concept of a rate-constrained ranking
task, of which ranking articles for rapid reviews is an example. We derive a novel metric,
the rate-weighted area under the ROC curve (rAUC), to evaluate ranking models for
rate-constrained ranking tasks. Furthermore, we derive a method to generate confidence
bounds around ROC curves that is particularly appropriate for these types of tasks.
Our second application is the task of choosing hypotheses to test in epidemiological
analyses. Currently researchers use prior knowledge about the composition of causal
pathways, and their own research interests and preconceptions, to decide which hypotheses
to test. Where no strong priors exist, it may be preferable to use a systematic
approach to identify hypotheses to follow up. We present a novel screening step that uses
Mendelian randomisation to systematically search a large number of hypotheses for potentially
causal relationships that should be investigated further. As an exemplar we
search for the causal effects of body mass index (BMI) and find many associations with
outcomes that are supported in the literature.
|