Title:
|
Local pattern mining in multi-relational data
|
Multi-relational data mining has so far been synonymous with methods based on Inductive Logic
Programming (ILP), which discover frequent first-order logic rules in the data. This is due to
the fact that ILP conveniently captures the multi-relational structure, while there has not been
a suitable pattern syntax extension of an itemset for the case of multi-relational data. Local
pattern mining methods have mostly focused on mining a single relation. A common strategy
for mining multi-relational data (MRD) has been to apply frequent itemset mining on the join
of all database relations. However, when flattening the data in this way, important structural
information is lost and itemsets do not capture all the associations in the data.
This thesis describes our research that led to a new approach for local pattern mining in
multi-relational data. The final result of this research is summarised as follows. We define the
new pattern syntax of Maximal Complete Connected Subsets (MCCSs) for MRD with binary
relations, which captures the structure of the original data well. We additionally propose the
generalisation of MCCSs, called N-MCCSs, for MRD containing relations of any arity. We
demonstrate how N-MCCSs contain tiles [27] and n-sets [16] as special cases. Furthermore, we
propose RMiner, an efficient algorithm to mine MCCSs, and N-RMiner, an efficient algorithm to
mine N-MCCSs. We show experimentally that N-RMiner, while applicable to MRD in general,
when applied to a single n-ary relation, considerably outperforms the state-of-the-art algorithm
for mining n-sets [16] on real-world datasets.
Finally, this work is incorporated into a general data mining framework for quantifying the
subjective interestingness of patterns based on the prior information of the user.
|