Title: Query answering in distributed RDF databases
Author: Potter, Anthony
ISNI: 0000 0004 7231 6637
Awarding Body: University of Oxford
Current Institution: University of Oxford
Date of Award: 2017
Availability of Full Text: Full text unavailable from EThOS.
Abstract: To simplify data integration and exchange, modern applications often represent their data using the Resource Description Framework (RDF). As the amount of available data keeps increasing, many RDF datasets cannot be processed using centralised RDF stores. A common solution is to distribute RDF data in a cluster of shared-nothing servers, and to query the data using a distributed query algorithm. Existing approaches typically use a variant of the data exchange operator to shuffle partial query answers between servers and thus ensure that every query answer is produced. Decisions as to when and where to shuffle the data are usually made statically, that is, at query compile time. In this thesis, we argue that such approaches can miss opportunities for local computation and thus incur considerable overheads. Moreover, we present a novel distributed query evaluation algorithm for RDF based on dynamic data exchange, where all computation that can be done locally is guaranteed to be performed on a single server. Our approach can successfully process any query even if the memory available at each server is bounded, and we argue that this is critical in distributed systems where intermediate results can easily exceed the capacity of each server. We also present a new query planning approach that balances the cost of communication against the cost of local processing at each server, as well as a new approach to partitioning RDF data that aims to increase locality in each server. We have implemented our approach in the well-known RDFox data store, and our empirical evaluation suggests that our techniques can outperform the state of the art by orders of magnitude in terms of query evaluation times, network communication, and memory use.
Supervisor: Horrocks, Ian; Motik, Boris
Sponsor: Engineering and Physical Sciences Research Council
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available