Title: Identification of data structure with machine learning : from Fisher to Bayesian networks
Author: Casana Eslava, R.
ISNI: 0000 0004 7964 2953
Awarding Body: Liverpool John Moores University
Current Institution: Liverpool John Moores University
Date of Award: 2019
This thesis proposes a theoretical framework for a thorough analysis of the structure of a dataset in terms of a) metric, b) density and c) feature associations. For the first aspect, Fisher's metric learning algorithms provide the foundations of a novel manifold based on the information and complexity of a classification model. For the density aspect, Probabilistic Quantum Clustering, a Bayesian version of the original Quantum Clustering, is proposed. Its clustering results depend on local density variations, which is a desirable property when dealing with heteroscedastic data. For the third aspect, the starting point is the constraint-based PC algorithm, which underlies many structure learning algorithms and finds feature associations by means of conditional independence tests. Its output is then used to select Bayesian networks based on a regularized likelihood score. All three topics of data structure analysis were tested on synthetic examples and real cases, which allowed us to unravel and discuss the advantages and limitations of these algorithms. One of the biggest challenges was applying these methods to a Big Data dataset analysed in a collaboration with a large UK retailer, where the aim was to identify the data structure underlying customer shopping baskets.
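As a hedged illustration of a metric derived from a classification model, the sketch below computes the Fisher information matrix in input space, J(x) = sum over classes c of p(c|x) grad log p(c|x) grad log p(c|x)^T, for a binary logistic regression classifier (a simplifying assumption), and uses it to measure the length of a small displacement dx as sqrt(dx^T J(x) dx). It is not the manifold construction developed in the thesis; the function names and the example weights are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fisher_metric_logistic(x, w, b):
    """Fisher information matrix J(x) in input space for a binary logistic model.

    J(x) = sum over classes c of p(c|x) * g_c g_c^T, with g_c = grad_x log p(c|x).
    """
    p = sigmoid(w @ x + b)   # p(c=1 | x)
    g1 = (1.0 - p) * w       # grad_x log p(c=1 | x)
    g0 = -p * w              # grad_x log p(c=0 | x)
    return p * np.outer(g1, g1) + (1.0 - p) * np.outer(g0, g0)


def local_fisher_distance(x, dx, w, b):
    """Length of a small displacement dx at x under the metric: sqrt(dx^T J(x) dx)."""
    J = fisher_metric_logistic(x, w, b)
    return float(np.sqrt(dx @ J @ dx))


# Hypothetical classifier whose weights involve only the first input feature.
w, b = np.array([2.0, 0.0]), 0.0
x0 = np.zeros(2)
print(local_fisher_distance(x0, np.array([1.0, 0.0]), w, b))  # 1.0: direction that changes p(c|x)
print(local_fisher_distance(x0, np.array([0.0, 1.0]), w, b))  # 0.0: p(c|x) does not change
```

For this model J(x) reduces to p(1 - p) w w^T, so displacements that leave the predicted class probability unchanged have zero length: the metric reflects what the classifier has learned rather than raw Euclidean geometry.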
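The density aspect builds on Quantum Clustering. The sketch below assumes the original, non-probabilistic formulation (Horn and Gottlieb): a Gaussian Parzen estimate psi(x) = sum_i exp(-||x - x_i||^2 / (2 sigma^2)) with a single global width sigma defines the potential V(x) = E - d/2 + (1 / (2 sigma^2 psi(x))) sum_i ||x - x_i||^2 exp(-||x - x_i||^2 / (2 sigma^2)), whose minima act as cluster centres; each point is moved downhill on V and points reaching the same minimum share a label. The Bayesian, density-adaptive version proposed in the thesis is not reproduced here, and the step size, iteration count, merge tolerance and all names are illustrative choices.

```python
import numpy as np


def qc_potential(x, data, sigma):
    """Quantum Clustering potential at query points x (m x d), omitting the constant E - d/2."""
    d2 = ((x[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)  # (m, n) squared distances
    # Gaussian Parzen terms of psi; shifting by the row minimum avoids underflow and cancels in the ratio.
    w = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2 * sigma**2))
    return (d2 * w).sum(axis=1) / (2 * sigma**2 * w.sum(axis=1))


def quantum_clustering(data, sigma, steps=200, lr=0.5, eps=1e-5):
    """Move every point downhill on V, then group points that reach the same minimum."""
    x = data.astype(float).copy()
    dim = data.shape[1]
    for _ in range(steps):
        grad = np.empty_like(x)
        for k in range(dim):  # central finite differences, one coordinate at a time
            e = np.zeros(dim)
            e[k] = eps
            grad[:, k] = (qc_potential(x + e, data, sigma) - qc_potential(x - e, data, sigma)) / (2 * eps)
        x -= lr * sigma**2 * grad  # step scaled by sigma^2 so lr is roughly scale-free
    # Greedy grouping: a converged point joins the first centre closer than sigma / 2.
    labels = -np.ones(len(x), dtype=int)
    centres = []
    for i, xi in enumerate(x):
        for c, ci in enumerate(centres):
            if np.linalg.norm(xi - ci) < sigma / 2:
                labels[i] = c
                break
        else:
            centres.append(xi)
            labels[i] = len(centres) - 1
    return labels, np.array(centres)


# Two synthetic, well-separated blobs.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0.0, 0.2, size=(20, 2)),
                  rng.normal(0.0, 0.2, size=(20, 2)) + 5.0])
labels, centres = quantum_clustering(data, sigma=0.5)
print(len(centres), labels)
```

Because sigma is fixed for the whole dataset, regions with very different local densities are all smoothed at the same scale; this is the limitation that a density-adaptive formulation targets when data are heteroscedastic.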
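The first (skeleton) phase of the PC algorithm can be sketched as follows, assuming continuous, roughly Gaussian variables so that conditional independence can be tested with Fisher's z-transform of the partial correlation. Starting from a complete undirected graph, the edge i-j is removed as soon as some subset of i's current neighbours, of size 0, 1, 2, and so on, makes i and j conditionally independent at level alpha; the separating set is recorded for the later orientation phase, which is omitted here. Discrete data such as shopping baskets would call for a different test (for example chi-square or G^2). The function names are hypothetical.

```python
import math
from itertools import combinations

import numpy as np


def fisher_z_pvalue(corr, n, i, j, cond):
    """Two-sided p-value for H0: X_i independent of X_j given X_cond (Gaussian data)."""
    idx = [i, j, *cond]
    prec = np.linalg.inv(corr[np.ix_(idx, idx)])
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])  # partial correlation
    r = min(max(r, -0.999999), 0.999999)                  # keep atanh finite
    z = math.sqrt(n - len(cond) - 3) * math.atanh(r)
    return math.erfc(abs(z) / math.sqrt(2))


def pc_skeleton(data, alpha=0.01):
    """Return the undirected edges and separating sets found by the PC skeleton search."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    adj = {i: set(range(p)) - {i} for i in range(p)}
    sepsets = {}
    level = 0  # size of the conditioning sets tested in this round
    while any(len(adj[i]) - 1 >= level for i in range(p)):
        for i in range(p):
            for j in sorted(adj[i]):
                for cond in combinations(sorted(adj[i] - {j}), level):
                    if fisher_z_pvalue(corr, n, i, j, cond) > alpha:
                        adj[i].discard(j)
                        adj[j].discard(i)
                        sepsets[frozenset((i, j))] = cond
                        break
        level += 1
    edges = {(i, j) for i in range(p) for j in adj[i] if i < j}
    return edges, sepsets


# Synthetic chain X -> Y -> Z: X and Z are dependent, but independent given Y.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
y = x + 0.5 * rng.normal(size=n)
z = y + 0.5 * rng.normal(size=n)
edges, sepsets = pc_skeleton(np.column_stack([x, y, z]), alpha=0.001)
print(edges, sepsets)
```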
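A regularized likelihood score trades goodness of fit against model size when selecting among candidate Bayesian networks. The abstract does not name the score, so the sketch below assumes the Bayesian Information Criterion for a linear-Gaussian network: each node is regressed on its parents by least squares, and the maximized log-likelihood is penalized by (k/2) log n for k free parameters. The `parents` dictionary format and the function name are hypothetical.

```python
import math

import numpy as np


def gaussian_bic(data, parents):
    """BIC of a linear-Gaussian DAG; parents maps node index -> list of parent indices."""
    n, p = data.shape
    score = 0.0
    for node in range(p):
        pa = parents.get(node, [])
        X = np.column_stack([np.ones(n)] + [data[:, k] for k in pa])  # intercept + parents
        y = data[:, node]
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid_var = np.mean((y - X @ beta) ** 2)                      # ML variance estimate
        loglik = -0.5 * n * (math.log(2 * math.pi * resid_var) + 1)
        n_params = len(pa) + 2                                        # coefficients + intercept + variance
        score += loglik - 0.5 * n_params * math.log(n)
    return score


# Synthetic chain X -> Y -> Z (columns 0, 1, 2).
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
y = x + 0.5 * rng.normal(size=n)
z = y + 0.5 * rng.normal(size=n)
data = np.column_stack([x, y, z])

candidates = {
    "chain X->Y->Z": {1: [0], 2: [1]},
    "reversed Z->Y->X": {0: [1], 1: [2]},
    "wrong X->Z->Y": {2: [0], 1: [2]},
    "empty": {},
}
for name, pa in candidates.items():
    print(f"{name:18s} {gaussian_bic(data, pa):.1f}")
```

BIC is score-equivalent for linear-Gaussian networks, so Markov-equivalent DAGs such as X->Y->Z and Z->Y->X receive the same score; the data alone cannot separate them, which is why constraint-based and score-based outputs are usually reported as equivalence classes.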
Supervisor: Jarman, I. H.; Lisboa, P. J.; Ortega-Martorell, S.
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
Keywords: QA75 Electronic computers. Computer science