Title: A probabilistic approach to uncertainty quantification in pay-as-you-go data integration
Author: Sanchez Serrano, Fernando Rene
ISNI: 0000 0004 8501 4507
Awarding Body: University of Manchester
Current Institution: University of Manchester
Date of Award: 2019
The use of Web standards, compact publication guidelines, and open data initiatives has motivated many public and private organisations to publish data on the Web, giving rise to a global data space. Consuming data from heterogeneous data sources published on the Web requires integration at scale. The pay-as-you-go approach to data integration (PAYG) addresses integration at scale, relying on automatic techniques to provide candidate integrations. This high reliance on automatic techniques gives rise to uncertainty. Uncertainty may arise in, and propagate across, all the tasks in the life cycle of a PAYG approach, and its effect may be manifested in the quality of an automatically generated integration. Quantifying the uncertainty on the outcomes of a bootstrapped integration is a crucial task: it can help in understanding the decisions made by the automatic algorithms, with the aim of reducing that uncertainty and ultimately improving the quality of the integration. In this thesis, we address the issue of quantifying the uncertainty that arises during the bootstrapping phase of PAYG in the context of Dataspaces. In particular, two approaches are proposed: (i) an approach to quantify the uncertainty in mapping generation using internal evidence; (ii) an approach to quantify the uncertainty on the quality of an entire integration using user feedback in a pay-as-you-go manner. More specifically, this thesis makes the following contributions: (i) a principled methodology to derive degrees of belief on mappings that builds on Bayesian inference to assimilate evidence in the form of fitness scores associated with mappings during mapping generation; (ii) a novel methodology to quantify the uncertainty on the quality of an entire integration by assimilating user feedback on tuple results; (iii) an experimental evaluation of the proposed techniques on a real-world integration scenario.
The experimental evaluation of the contributed techniques presented in this dissertation provides empirical evidence of their cost-effectiveness, when applied in synthetic and real-world scenarios, in quantifying the quality of a pay-as-you-go data integration.
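Illustrative note: the first contribution, deriving a degree of belief on a mapping by assimilating fitness scores with Bayesian inference, can be pictured with a minimal sketch such as the one below. This is not the thesis's method. The likelihood model in `likelihoods`, the assumption that scores lie in [0, 1] and are conditionally independent given the mapping's correctness, and all function names are hypothetical choices made only for illustration.

```python
def posterior_belief(prior, lik_correct, lik_incorrect):
    """Bayes' rule for the binary hypothesis 'the mapping is correct'."""
    evidence = prior * lik_correct + (1.0 - prior) * lik_incorrect
    if evidence == 0.0:
        raise ValueError("score has zero likelihood under both hypotheses")
    return prior * lik_correct / evidence


def likelihoods(score):
    """Hypothetical likelihood model (an assumption, not from the thesis).

    Scores of correct mappings follow the density 2*s on [0, 1];
    scores of incorrect mappings follow the density 2*(1 - s).
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("fitness score must lie in [0, 1]")
    return 2.0 * score, 2.0 * (1.0 - score)


def assimilate(prior, scores):
    """Update the degree of belief one fitness score at a time.

    Assumes scores are conditionally independent given correctness.
    """
    belief = prior
    for score in scores:
        lik_correct, lik_incorrect = likelihoods(score)
        belief = posterior_belief(belief, lik_correct, lik_incorrect)
    return belief


if __name__ == "__main__":
    # Start from an uninformative prior and assimilate two fitness scores.
    print(assimilate(0.5, [0.8, 0.8]))
```

Under these assumptions a score of 0.5 leaves the belief unchanged, while scores above 0.5 raise it and scores below 0.5 lower it. A real likelihood model would have to be estimated from data rather than fixed by hand.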
Supervisor: Paton, Norman ; Fernandes, Alvaro
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:  DOI: Not available
Keywords: data integration ; pay as you go ; probabilistic approach ; uncertainty quantification