Title: Modelling and computing the quality of scientific information on the Web of Data
Author: Gamble, Matthew Philip
ISNI:       0000 0004 5352 6516
Awarding Body: University of Manchester
Current Institution: University of Manchester
Date of Award: 2014
The Web is being transformed into an open data commons, and is now the dominant point of access for information-seeking scientists. In parallel, the scientific community has been required to manage the challenges of "Big Data", characterized by its large-scale, distributed, and diverse nature. The Web of Linked Data has emerged as a platform through which the sciences can meet this challenge, allowing them to publish and reuse data in a machine-readable manner. The openness of the Web of Data is, however, a double-edged sword. On one hand it drives rapid growth in adoption, but on the other a lack of governance and quality control has led to data of varied quality and trustworthiness. The challenge scientists face, then, is not that data on the Web is universally poor, but that its quality is unknown. Previous research has established the notion of Quality Knowledge: the latent domain knowledge that expert scientists draw on to make quality-based decisions. The main idea pursued in this thesis is that we can address Information Quality (IQ) issues in the Web of Data by repurposing the existing mechanisms scientists use to evaluate data. We argue that there are three distinct aspects of Quality Knowledge (objective, predictive, and subjective), defined by the information required for their assessment, and present two studies focused on the modelling and exploitation of the objective and predictive aspects. We address the objective aspect by developing the Minimum Information Model as a repurposing of Minimum Information Checklists, an increasingly prevalent type of quality knowledge employed in the Life Sciences. A more general approach to modelling the predictive aspect explores the use of Multi-Entity Bayesian Networks to tackle the characteristic uncertainty in predictive quality knowledge and the inconsistent availability of metadata in the Web of Data.
We show that, by following our classification, we can develop techniques and infrastructure for evaluating IQ that are tailored to the challenges of the Web of Data and informed by the needs of the scientific community.
Supervisor: Goble, Carole
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available