Title: Inferring information about correspondences between data sources for dataspaces
Author: Guo, Chenjuan
ISNI: 0000 0004 2713 9619
Awarding Body: University of Manchester
Current Institution: University of Manchester
Date of Award: 2011
Traditional data integration offers high-quality services for managing and querying interrelated but heterogeneous data sources, but at a high cost: a significant amount of manual effort is required to specify precise relationships between the data sources before a data integration system can be set up. The recently proposed vision of dataspaces aims to reduce this upfront effort. One way to approach this aim is to infer schematic correspondences between the data sources, thus enabling the development of automated means for bootstrapping dataspaces. In this thesis, we discuss a two-step research programme to automatically infer schematic correspondences between data sources. In the first step, we investigate the effectiveness of existing schema matching approaches for inferring schematic correspondences, and contribute a benchmark, called MatchBench, for this purpose. In the second step, we contribute an evolutionary search method to identify the set of entity-level relationships (ELRs) between data sources that qualify as entity-level schematic correspondences. Specifically, we model the requirements using a vector space model. For each resulting ELR, we further identify a set of attribute-level relationships (ALRs) that qualify as attribute-level schematic correspondences. We demonstrate the effectiveness of the contributed inference technique using both MatchBench scenarios and real-world scenarios.
Supervisor: Fernandes, Alvaro
Sponsor: CSC ; DIUS ; School of Computer Science in the University of Manchester
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: Schematic correspondences ; Dataspaces