Data integration and query decomposition in distributed databases
Preci* is a generalised distributed database management system capable of supporting heterogeneous, pre-existing databases as nodes. The system is fully decentralised and supports both retrieval and update of data. Varying degrees of location transparency can be provided, according to user requirements. The work presented here is concerned with data integration and query decomposition. An extended relational algebra (PAL) is developed, which serves both as a query language and as a mapping language for data integration. The suitability of PAL for data integration is demonstrated by a number of examples and by comparison with existing proposals. A major attraction of PAL is that queries and integration mappings are expressed in the same language, which makes query decomposition much easier. The relational algebraic approach is shown to be particularly appropriate for query decomposition, since queries can be easily parsed and represented in tree form. Such parse trees are readily transformed to yield equivalent expressions that execute more efficiently. An algorithm is given for decomposing global PAL queries into nodal subqueries and for coordinating their execution. The general problem of allocating subqueries to execution nodes is not tackled, though it is shown that the algorithm performs this allocation under specific implementation conditions. A prototype of Preci* has been implemented in C.
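As an illustration of the tree-based approach, the following is a minimal sketch in Python of how an algebraic query tree might be rewritten and split into per-node subqueries. It does not use PAL syntax or the Preci* algorithm, neither of which is given here; it assumes a toy algebra with only selection and union, and every name in it (`Rel`, `Select`, `UnionOp`, `push_selection`, `nodal_subqueries`, the relations `EMP_A` and `EMP_B`, the nodes `node1` and `node2`) is hypothetical. The rewrite applied is the standard equivalence that a selection over a union equals the union of the selections.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rel:
    """Leaf: a relation stored at one node (names are hypothetical)."""
    name: str
    node: str


@dataclass(frozen=True)
class Select:
    """Selection with a predicate kept as plain text for illustration."""
    pred: str
    child: object


@dataclass(frozen=True)
class UnionOp:
    """Union of two subexpressions."""
    left: object
    right: object


def push_selection(expr):
    """Rewrite select(p, A union B) into select(p, A) union select(p, B)."""
    if isinstance(expr, Select):
        child = push_selection(expr.child)
        if isinstance(child, UnionOp):
            return UnionOp(
                push_selection(Select(expr.pred, child.left)),
                push_selection(Select(expr.pred, child.right)),
            )
        return Select(expr.pred, child)
    if isinstance(expr, UnionOp):
        return UnionOp(push_selection(expr.left), push_selection(expr.right))
    return expr


def nodes_of(expr):
    """Return the set of nodes whose relations appear in the expression."""
    if isinstance(expr, Rel):
        return {expr.node}
    if isinstance(expr, Select):
        return nodes_of(expr.child)
    return nodes_of(expr.left) | nodes_of(expr.right)


def nodal_subqueries(expr):
    """Collect the largest subtrees that refer to exactly one node."""
    nodes = nodes_of(expr)
    if len(nodes) == 1:
        return [(next(iter(nodes)), expr)]
    if isinstance(expr, UnionOp):
        return nodal_subqueries(expr.left) + nodal_subqueries(expr.right)
    # A selection spanning several nodes: only its operand is split here.
    return nodal_subqueries(expr.child)


def show(expr):
    """Render an expression tree as text."""
    if isinstance(expr, Rel):
        return f"{expr.name}@{expr.node}"
    if isinstance(expr, Select):
        return f"select[{expr.pred}]({show(expr.child)})"
    return f"({show(expr.left)} union {show(expr.right)})"


# A global relation defined as the union of two nodal relations.
global_emp = UnionOp(Rel("EMP_A", "node1"), Rel("EMP_B", "node2"))
query = Select("salary > 1000", global_emp)
rewritten = push_selection(query)

for node, sub in nodal_subqueries(rewritten):
    print(node, "executes", show(sub))
```

In this sketch the selection ends up inside each nodal subquery, so each node filters its own data before anything is shipped to the coordinating site. Coordinating the execution and allocating subqueries to nodes are not modelled.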