Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.553490
Title: Bayesian mixture models for frequent itemset mining
Author: He, Ruofei
Awarding Body: University of Manchester
Current Institution: University of Manchester
Date of Award: 2012
Availability of Full Text:
Access through EThOS:
Access through Institution:
Abstract:
In binary-transaction data-mining, traditional frequent itemset mining often produces results which are not straightforward to interpret. To overcome this problem, probability models are often used to produce more compact and conclusive results, albeit with some loss of accuracy. Bayesian statistics have been widely used in the development of probability models in machine learning in recent years and these methods have many advantages, including their abilities to avoid overfitting. In this thesis, we develop two Bayesian mixture models with the Dirichlet distribution prior and the Dirichlet process (DP) prior to improve the previous non-Bayesian mixture model developed for transaction dataset mining. First, we develop a finite Bayesian mixture model by introducing conjugate priors to the model. Then, we extend this model to an infinite Bayesian mixture using a Dirichlet process prior. The Dirichlet process mixture model is a nonparametric Bayesian model which allows for the automatic determination of an appropriate number of mixture components. We implement the inference of both mixture models using two methods: a collapsed Gibbs sampling scheme and a variational approximation algorithm. Experiments in several benchmark problems have shown that both mixture models achieve better performance than a non-Bayesian mixture model. The variational algorithm is the faster of the two approaches while the Gibbs sampling method achieves a more accurate result. The Dirichlet process mixture model can automatically grow to a proper complexity for a better approximation. However, these approaches also show that mixture models underestimate the probabilities of frequent itemsets. Consequently, these models have a higher sensitivity but a lower specificity.
Supervisor: Shapiro, Jonathan Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.553490  DOI: Not available
Keywords: Frequent itemset mining ; Mixture model ; Bayesian statistics ; Dirichlet Process ; EM algorithm ; Gibbs sampling
Share: