Use this URL to cite or link to this record in EThOS:
Title: Bayesian methods for hierarchical clustering and community discovery
Author: Blundell, C.
ISNI:       0000 0004 5359 3946
Awarding Body: University College London (University of London)
Current Institution: University College London (University of London)
Date of Award: 2015
Availability of Full Text:
Access from EThOS:
Full text unavailable from EThOS. Please try the link below.
Access from Institution:
Discovering clusters in data is a common goal of statistical data analysis. Two kinds of clustering, hierarchical clustering and community discovery, are considered here, as well as their composition-discovering hierarchies of communities. Hierarchical clustering discovers hierarchies of clusters of data, represented as a tree whose leaves are the data. Community discovery finds clusters of people, most commonly from the adjacency matrix of a graph of the relationships between the people. We shall leverage Bayesian statistics to construct several models and corresponding efficient learning algorithms for discovering hierarchies, communities and hierarchies of communities. This thesis has three main contributions, each being a model and a learning algorithm for tackling one of these clustering problems. First we develop an efficient model-based hierarchical clustering algorithm using greedy model selection. Unlike many other hierarchical clustering algorithms, our model is not necessarily a binary tree, but can be any tree where each internal node may have any number of children. This can lead to simpler hierarchies that we find are just as predictive of the data, but are more interpretable as the hierarchies are less visually cluttered and the underlying model has fewer parameters than a binary tree-based model. We then adapt this hierarchical clustering model and algorithm to discovering communities in social networks based upon their adjacency matrix, where the leaves of the discovered tree correspond to people. This adaptation is not straightforward as a naive adaptation leads to an inefficient learning algorithm. We develop a dynamic programming scheme and number of approximations that yield several fast algorithms. We then show empirically that these approximations are faster than the Infinite Relational Model, producing similar or better predictions in less time for the task of predicting unobserved edges in a graph. Finally we tackle the problem of discovering communities directly from interactions among individuals, rather than from the adjacency matrix of a graph. We develop a model that uses a statistical notion of reciprocity to discover communities from time-series interaction data. We then develop a Markov Chain Monte Carlo method for inference and show empirically that this model is much better at predicting future interactions among individuals than several alternate models.
Supervisor: Not available Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID:  DOI: Not available