Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.618550
Title: Maximum entropy modelling for quantifying unexpectedness of data mining results
Author: Kontonasios, Kleanthis-Nikolaos
Awarding Body: University of Bristol
Current Institution: University of Bristol
Date of Award: 2013
Abstract:
This thesis is concerned with the problem of finding subjectively interesting patterns in data. The focus is restricted to the most prominent notion of subjective interestingness, namely the unexpectedness of a pattern. A pattern is considered unexpected if it contradicts the user's prior knowledge or beliefs about the data. Recently, a general information-theoretic framework for data mining that naturally incorporates unexpectedness was devised. The proposed approach relies on three components: (1) the Maximum Entropy principle, for encoding the user's prior knowledge about the data or the patterns; (2) the InfRatio measure, an information-theoretic measure for evaluating the unexpectedness of a pattern; and (3) a set-covering algorithm for finding the most interesting set of patterns. However, this framework is intentionally phrased in abstract terms and has been formally applied only to a limited range of data mining tasks. This thesis is meant to fill this gap: its main contribution is the instantiation of this general framework for specific data mining tasks, in order to demonstrate the wide applicability of the framework in practice. In particular, we instantiate the three main components of the framework in order to evaluate frequent itemsets, clusterings, and patterns found in real-valued data such as biclusters and subgroups. Additionally, we provide the first literature review of interestingness measures based on unexpectedness and propose a novel classification of the methods into two classes, namely the "syntactical" and "probabilistic" approaches. We show that exploiting the framework for finding subjectively interesting sets of patterns in data is a highly efficient practice in theoretical, algorithmic and computational terms.
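The three components named in the abstract can be illustrated on the simplest case, binary data with "tiles" (all-ones submatrices) as patterns. The following is a minimal sketch under stated assumptions, not the thesis's implementation. It assumes that the user's prior knowledge consists only of the row and column sums. Under that assumption the Maximum Entropy model makes the cells independent Bernoulli variables with p_ij = sigmoid(lam[i] + mu[j]). InfRatio is taken as the information content -log Pr(tile) divided by a description length. The description-length constants `alpha` and `beta`, all function names, and the example data are hypothetical. The greedy step is a deliberate simplification: after a tile is chosen, its cells are set to probability 1 instead of refitting the MaxEnt model with the tile as an extra constraint.

```python
import math

def sigmoid(t):
    # numerically stable logistic function
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def fit_maxent_margins(data, iters=500, tol=1e-10):
    """MaxEnt model of a binary matrix constrained by its expected row and
    column sums: independent cells with p_ij = sigmoid(lam[i] + mu[j]).
    Assumes no row or column is entirely 0s or entirely 1s."""
    m, n = len(data), len(data[0])
    r = [sum(row) for row in data]
    c = [sum(data[i][j] for i in range(m)) for j in range(n)]
    lam, mu = [0.0] * m, [0.0] * n
    for _ in range(iters):
        worst = 0.0
        # one clipped Newton step per row multiplier, columns held fixed
        for i in range(m):
            ps = [sigmoid(lam[i] + mu[j]) for j in range(n)]
            g, h = sum(ps) - r[i], sum(p * (1 - p) for p in ps)
            lam[i] -= max(-1.0, min(1.0, g / h))
            worst = max(worst, abs(g))
        # the same for each column multiplier, rows held fixed
        for j in range(n):
            ps = [sigmoid(lam[i] + mu[j]) for i in range(m)]
            g, h = sum(ps) - c[j], sum(p * (1 - p) for p in ps)
            mu[j] -= max(-1.0, min(1.0, g / h))
            worst = max(worst, abs(g))
        if worst < tol:
            break
    return [[sigmoid(lam[i] + mu[j]) for j in range(n)] for i in range(m)]

def info_content(P, tile):
    # -log Pr(every cell of the tile is 1) under the independent-cell model
    rows, cols = tile
    return -sum(math.log(P[i][j]) for i in rows for j in cols)

def description_length(tile, alpha=1.0, beta=1.0):
    # hypothetical cost: fixed overhead plus a cost per row/column index
    rows, cols = tile
    return alpha + beta * (len(rows) + len(cols))

def inf_ratio(P, tile):
    return info_content(P, tile) / description_length(tile)

def greedy_select(P, candidates, k):
    """Repeatedly pick the tile with the highest InfRatio (simplified update)."""
    P = [row[:] for row in P]  # work on a copy of the model
    pool, chosen = dict(candidates), []
    for _ in range(min(k, len(pool))):
        name = max(pool, key=lambda t: inf_ratio(P, pool[t]))
        rows, cols = pool.pop(name)
        chosen.append(name)
        for i in rows:  # cells of a chosen tile are now treated as known
            for j in cols:
                P[i][j] = 1.0
    return chosen

D = [[1, 1, 1, 0, 0, 1],
     [1, 1, 1, 0, 1, 0],
     [1, 1, 1, 1, 0, 0],
     [0, 0, 1, 0, 1, 0],
     [0, 1, 0, 1, 0, 0],
     [1, 0, 0, 0, 0, 1]]
tiles = {"A": ([0, 1, 2], [0, 1, 2]), "B": ([0, 1], [0, 1]),
         "C": ([3], [2, 4]), "D": ([0, 5], [0, 5])}
P = fit_maxent_margins(D)
print(greedy_select(P, tiles, 3))  # → ['D', 'C', 'A']
```

The large 3x3 block "A" is ranked below the small tiles "D" and "C". Given only the margins, ones are already likely in the dense rows and columns (p = 5/6 there), so that block is not very surprising. Cells in sparse rows and columns have low probability, so a tile of ones that covers them carries more information per unit of description length.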
Supervisor: Not available Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.618550  DOI: Not available