Title: Disease surveillance using user-generated content
Author: Zou, Bin
ISNI: 0000 0004 7660 808X
Awarding Body: UCL (University College London)
Current Institution: University College London (University of London)
Date of Award: 2019
Availability of Full Text: Full text unavailable from EThOS.
Disease surveillance plays a crucial role in detecting and anticipating infectious disease outbreaks. It tracks health-related data from a population in order to identify and monitor outbreaks of a disease at an early stage. Traditional disease surveillance requires a widespread network of sentinel sites to track infections throughout the population. Such networks are time- and labour-intensive to build and maintain, which creates an opportunity to use online user-generated content. Compared with traditional data sources, online user-generated content is fast and cheap to obtain, covers a larger population, and provides data on topics with little coverage in traditional sources, so it can complement traditional disease surveillance systems. In this thesis, we focus on improving disease surveillance with online user-generated content through machine learning and natural language processing techniques. Our contributions are threefold. First, we propose a feature selection method consisting of a time series similarity filter and a topic filter. The time series filter ensures that the selected features are good predictors, while the topic filter eliminates features that may be highly correlated with disease rates but do not refer to the target disease. Second, we propose a multi-task learning framework for disease surveillance in which several disease surveillance models are trained jointly, using multi-task elastic net and multi-task Gaussian Processes for regression. By taking advantage of shared structure in the data, the framework improves the generalization of each model. Third, we propose a transfer learning framework that delivers accurate disease rate models for a target location where no ground truth information is available.
The framework consists of three steps: (1) learn a regularized regression model for a source country, (2) map the source queries to target queries using semantic and temporal similarity metrics, and (3) re-adjust the weights of the target queries. To support the theoretical derivations, we carry out extensive, repeatable experiments on large-scale real-world data. The results demonstrate substantial improvements of the proposed solutions over strong baselines. In addition, we publish a website that reports real-time flu rate estimates for England (
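The two-filter feature selection described in the abstract can be illustrated with a minimal sketch. It assumes that each candidate feature is a search query with a frequency time series aligned to the official disease rates, and that queries and the target disease are represented by embedding vectors. Pearson correlation for the time series filter, cosine similarity for the topic filter, the function names and both thresholds are illustrative assumptions, not the thesis's exact method.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation between two equal-length time series (0.0 if either is constant)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    return float((xc * yc).sum() / denom) if denom > 0 else 0.0


def cosine(u, v):
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


def select_features(query_series, query_vectors, disease_rates, topic_vector,
                    min_corr=0.5, min_topic_sim=0.5):
    """Return the names of queries that pass both filters.

    query_series:  {query: frequency time series aligned with disease_rates}
    query_vectors: {query: embedding vector}        (hypothetical representation)
    topic_vector:  embedding of the target disease  (hypothetical)
    min_corr, min_topic_sim: illustrative thresholds, not values from the thesis
    """
    selected = []
    for name, series in query_series.items():
        # Time series similarity filter: the query must track the disease rate.
        if pearson(series, disease_rates) < min_corr:
            continue
        # Topic filter: the query must also be semantically about the target disease.
        if cosine(query_vectors[name], topic_vector) < min_topic_sim:
            continue
        selected.append(name)
    return selected
```

For example, a seasonal query such as "christmas gifts" can correlate strongly with winter flu rates and pass the first filter, yet its embedding lies far from a flu topic vector, so the topic filter discards it.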
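The multi-task elastic net mentioned in the abstract can be sketched in the same spirit. This version assumes one regression task per location, each with its own query-frequency matrix `X_t` over a shared set of queries and its own disease-rate vector `y_t`. The penalty mirrors that of scikit-learn's `MultiTaskElasticNet`: an l2,1 term that makes each query active or inactive for all tasks at once, plus a ridge term. The model is fitted here by proximal gradient descent; the function name and defaults are hypothetical, and the multi-task Gaussian Process variant is not shown.

```python
import numpy as np


def multitask_elastic_net(Xs, ys, alpha=0.1, l1_ratio=0.5, n_iter=2000):
    """Jointly fit one linear model per task with a multi-task elastic net penalty.

    Minimises
        sum_t 1/(2 n_t) ||y_t - X_t w_t||^2
        + alpha * l1_ratio * sum_j ||W[j, :]||_2
        + 0.5 * alpha * (1 - l1_ratio) * ||W||_F^2
    where column t of W holds the weights of task t and row j ties feature j
    across tasks. Xs[t] has shape (n_t, d); ys[t] has shape (n_t,).
    """
    n_tasks, d = len(Xs), Xs[0].shape[1]
    W = np.zeros((d, n_tasks))
    lam_group = alpha * l1_ratio
    lam_ridge = alpha * (1.0 - l1_ratio)
    # Step size 1/L, where L bounds the curvature of the smooth (squared loss + ridge) part.
    L = max(np.linalg.norm(X, 2) ** 2 / X.shape[0] for X in Xs) + lam_ridge
    step = 1.0 / L
    for _ in range(n_iter):
        # Gradient of the smooth part, one column per task.
        grad = np.column_stack(
            [X.T @ (X @ W[:, t] - y) / X.shape[0] for t, (X, y) in enumerate(zip(Xs, ys))]
        )
        grad += lam_ridge * W
        V = W - step * grad
        # Proximal step for the l2,1 term: shrink each feature's row of weights
        # as a group, zeroing features that are weak across all tasks.
        row_norms = np.linalg.norm(V, axis=1, keepdims=True)
        shrink = np.maximum(0.0, 1.0 - step * lam_group / np.maximum(row_norms, 1e-12))
        W = shrink * V
    return W
```

The group penalty is what lets the tasks share structure: a query that helps predict disease rates in only one location is pulled towards zero unless the other locations also support it.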
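Steps (2) and (3) of the transfer learning framework can be sketched as follows, assuming step (1) has already produced a weight for each source-country query (for example, from an elastic net). The sketch assumes that source and target queries share a cross-lingual embedding space for the semantic similarity and have frequency series over the same weeks for the temporal similarity. The linear mix `gamma`, the cut-off `min_sim` and the spread-based re-weighting are hypothetical choices, not the thesis's exact formulation.

```python
import numpy as np


def _pearson(x, y):
    """Pearson correlation between two equal-length series (0.0 if either is constant)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    return float((xc * yc).sum() / denom) if denom > 0 else 0.0


def _cosine(u, v):
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


def transfer_weights(src_weights, src_series, src_vecs, tgt_series, tgt_vecs,
                     gamma=0.5, min_sim=0.3):
    """Map source-country query weights onto target-country queries.

    src_weights: {source query: weight from the regularised source model (step 1)}
    *_series:    {query: frequency time series over the same weeks}
    *_vecs:      {query: embedding in a shared cross-lingual space} (hypothetical)
    gamma:       mix between semantic (cosine) and temporal (Pearson) similarity
    min_sim:     illustrative cut-off below which a source query is dropped
    """
    tgt_weights = {}
    for s_name, w in src_weights.items():
        if w == 0.0:
            continue  # query was zeroed out by the source model's regulariser
        # Step 2: pick the target query with the highest combined similarity.
        best_name, best_sim = None, -np.inf
        for t_name in tgt_series:
            sim = (gamma * _cosine(src_vecs[s_name], tgt_vecs[t_name])
                   + (1.0 - gamma) * _pearson(src_series[s_name], tgt_series[t_name]))
            if sim > best_sim:
                best_name, best_sim = t_name, sim
        if best_name is None or best_sim < min_sim:
            continue
        # Step 3 (hypothetical re-adjustment): scale the weight by the match quality
        # and by the ratio of series spreads, so the mapped query contributes on a
        # scale comparable to the source query.
        scale = np.std(src_series[s_name]) / max(np.std(tgt_series[best_name]), 1e-12)
        tgt_weights[best_name] = tgt_weights.get(best_name, 0.0) + w * best_sim * float(scale)
    return tgt_weights
```

The spread ratio is one simple way to re-adjust weights: if a target query's frequencies vary on twice the scale of the matched source query's, halving its weight keeps its contribution to the estimated rate comparable.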
Supervisor: Cox, I.
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available