Tree algorithms for mining association rules
With the increasing reliability of digital communication, the falling cost of hardware
and increased computational power, the gathering and storage of data has become
easier than at any other time in history. Commercial and public agencies are able to
hold extensive records about all aspects of their operations. Witness the proliferation
of point-of-sale (POS) transaction recording within retailing, digital storage of
census data and computerised hospital records. Whilst the gathering of such data
is useful for answering specific queries and allowing visualisation of certain
trends, the volumes of data can hide significant patterns that would be impossible to
locate manually. These patterns, once found, could provide an insight into customer
behaviour, demographic shifts and patient diagnosis hitherto unseen and unexpected.
Remaining competitive in a modern business environment, or delivering public
services in a timely and cost-effective manner, is a crucial part of modern
economics. Analysis of the data held by an organisation by a system that "learns"
can allow predictions to be made based on historical evidence. Users may guide the
process, but essentially the software is exploring the data unaided.
The research described within this thesis develops current ideas regarding the exploration
of large data volumes. Particular areas of research are the reduction of
the search space within the dataset and the generation of rules which are deduced
from the patterns within the data. These issues are discussed within an experimental
framework which extracts information from binary data.