Use this URL to cite or link to this record in EThOS:
Title: On techniques for pay-as-you-go data integration of linked data
Author: Christodoulou, Klitos
ISNI:       0000 0004 5356 3544
Awarding Body: University of Manchester
Current Institution: University of Manchester
Date of Award: 2015
Availability of Full Text:
Access from EThOS:
Access from Institution:
It is recognised that nowadays, users interact with large amounts of data that exist in disparate forms, and are stored under different settings. Moreover, it is true that the amount of structured and un-structured data outside a single well organised data management system is expanding rapidly. To address the recent challenges of managing large amounts of potentially distributed data, the vision of a dataspace was introduced. This data management paradigm aims at reducing the complexity behind the challenges of integrating heterogeneous data sources. Recently, efforts by the Linked Data (LD) community gave rise to a Web of Data (WoD) that interweaves with the current Web of documents in a way that it is useful for data consumption by both humans and computational agents. On the WoD, datasets are structured under a common data model and published as Web resources following a simple set of guidelines that enables them to be linked with other pieces of data, as well as, to be annotated with useful meta data that help determine their semantics. The WoD is an evolving open ecosystem including specialist publishers as well as community efforts aiming at re-publishing isolated databases as LD on the WoD, and annotating them with meta data. The WoD raises new opportunities and challenges. However, currently it mostly relies on manual efforts for integrating the large amounts of heterogeneous data sources on the WoD. This dissertation makes the case that several techniques from the dataspaces research area (aiming at on-demand integration of data sources in a pay-as-you-go fashion) can support the integration of heterogeneous WoD sources. In so doing, this dissertation explores the opportunities and identifies the challenges of adapting existing pay-as-you-go data integration techniques in the context of LD. More specifically, this dissertation makes the following contributions: (1) a case-study for identifying the challenges when existing pay-as-you-go data integration techniques are applied in a setting where data sources are LD; (2) a methodology that deals with the 'schema-less' nature of LD sources by automatically inferring a conceptual structure from a given RDF graph thus enabling downstream tasks, such as the identification of matches and the derivation of mappings, which are, both, essential for the automatic bootstrapping of a dataspace; and (3) a well-defined, principled methodology that builds on a Bayesian inference technique for reasoning under uncertainty to improve pay-as-you-go integration. Although the developed methodology is generic in being able to reason with different hypothesis, its effectiveness has only been explored on reducing the uncertain decisions made by string-based matchers during the matching stage of a dataspace system.
Supervisor: Not available Sponsor: Engineering and Physical Sciences Research Council (EPSRC)
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID:  DOI: Not available
Keywords: linked data ; data integration ; uncertainty management