Multidimensional aggregation in OLAP systems
On-line analytical processing (OLAP) provides multidimensional data analysis to support decision making. OLAP queries require extensive computation, because they aggregate data along many dimensions and hierarchies. The time required to process these queries has traditionally prevented the interactive analysis of large databases. To reduce query-response time, precomputed results are often stored as materialised views for later retrieval. Applied to the whole set of aggregates, known as the data cube, this adds a prohibitive storage overhead. Partial computation can significantly reduce both storage space and computation time. The challenge in implementing the data cube has been to select the minimum number of views for materialisation while retaining fast query-response time. This thesis contributes to this area by introducing the Low Redundancy (L-R) approach, which provides the means for the selection, computation and storage of non-redundant aggregates. First, a novel technique identifies redundant aggregates, so that only distinct aggregates are computed and stored. Second, a further novel technique eliminates remaining redundancy by storing these distinct aggregates in a compact differential form. Novel algorithms are introduced to implement these techniques and provide a solution that is both scalable and low in complexity. Both techniques have been evaluated experimentally on real and synthetic datasets, and have achieved significant savings in computation time and storage space compared to the conventional approach. These savings have been shown to increase as dimensionality increases. Existing techniques for implementing the data cube differ from the L-R approach, but they can be integrated with it to achieve faster query-response time.
Finally, the implications of this work reach beyond the area of OLAP to the fields of decision support systems, user interfaces and data mining.
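To make the storage overhead of the full data cube concrete, the following minimal sketch computes every group-by of a tiny fact table. It illustrates the conventional full-cube computation only, not the L-R approach itself; the table contents, the column names and the function full_cube are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict
from itertools import combinations


def full_cube(rows, dims, measure):
    """Compute SUM(measure) for every subset of dims (the full data cube).

    Returns a dict mapping each group-by (a tuple of dimension names)
    to a view: a dict from key values to the summed measure.
    """
    cube = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            view = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in group)
                view[key] += row[measure]
            cube[group] = dict(view)
    return cube


# Hypothetical fact table: three base tuples over three dimensions.
rows = [
    {"product": "pen", "region": "north", "year": 2001, "sales": 10},
    {"product": "pen", "region": "south", "year": 2001, "sales": 5},
    {"product": "ink", "region": "north", "year": 2002, "sales": 7},
]
dims = ["product", "region", "year"]

cube = full_cube(rows, dims, "sales")
n_views = len(cube)                                  # 2**3 = 8 group-bys
n_cells = sum(len(view) for view in cube.values())   # 18 aggregate cells
print(n_views, n_cells)
```

With d dimensions the cube contains 2^d group-bys, so here three base tuples expand into 8 views holding 18 aggregate cells. Many of those cells are computed from a single base tuple and repeat its value, which hints at the kind of redundancy that partial or non-redundant materialisation aims to avoid.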