Use this URL to cite or link to this record in EThOS: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.754965
Title: Some approaches to graph fragmentation with application to clustering geo-tagged data
Author: Vu, Ngoc Tuan
ISNI: 0000 0004 7427 9809
Awarding Body: King's College London
Current Institution: King's College London (University of London)
Date of Award: 2018
Abstract:
The basic topic of this study is a class of graph algorithms that decompose graphs, which we call graph fragmentation algorithms. The aim is to derive fast algorithms that break an input graph into connected subgraphs, which we call fragments. The original motivation for the research came from data mining geo-tagged photographs from Flickr. Plotting these geo-tags sometimes reveals recognisable patterns of cities, countries or continents made purely of dots. Joining dots that lie within a limited distance of each other yields a graph, many of whose components are densely connected clusters such as cities and attractions. A question is then: how can one break the graph into fragments in such a way that its component structure is preserved?

We give a simple example of a graph fragmentation algorithm. An active vertex v is selected. Next, v selects an active neighbour u, if any, and v absorbs u by making u inactive and orienting the edge uv from u to v. If v has no active neighbours, then v points to itself and becomes an inactive root. The process continues until all vertices become inactive. In the end, the fragments, which are the objects of interest, are formed by directed paths pointing to the root vertices of the components.

Fragmentation algorithms can be varied by altering the selection operation. A major difference between the algorithms is how vertices are selected, i.e. probabilistically, deterministically or heuristically. For cycle graphs, we make a formal analysis of the various fragmentation algorithms. We also study a variation in which edges are selected instead of vertices, and extend our analysis to circulant graphs. Many fragmentation algorithms are based on assigning every vertex a unique oriented out-edge, in which case the subgraph obtained consists of unicyclic components. This generalises the subgraph formed by the well-known random mapping graph model, with which we draw some comparisons. We next introduce another fragmentation model, the permutation subgraph model.
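The simple fragmentation algorithm given earlier in the abstract (an active vertex absorbs an active neighbour, otherwise it becomes a root) can be sketched in Python as follows. This is a minimal sketch, assuming an undirected simple graph stored as an adjacency dictionary and assuming the probabilistic variant, in which both the active vertex v and the absorbed neighbour u are chosen uniformly at random. The function names `fragment` and `group_fragments` are hypothetical. Each vertex receives exactly one out-edge, recorded as `parent[v]`, and a root is a vertex with `parent[v] == v`.

```python
import random
from typing import Dict, Hashable, Optional, Set

Graph = Dict[Hashable, Set[Hashable]]


def fragment(graph: Graph, seed: Optional[int] = None) -> Dict[Hashable, Hashable]:
    """Random-selection fragmentation; parent[v] is the head of v's unique out-edge."""
    rng = random.Random(seed)
    active = set(graph)
    parent: Dict[Hashable, Hashable] = {}
    while active:
        # Assumption: the active vertex v is chosen uniformly at random.
        v = rng.choice(list(active))
        candidates = [u for u in graph[v] if u in active and u != v]
        if candidates:
            # v absorbs a random active neighbour u: u becomes inactive, edge oriented u -> v.
            u = rng.choice(candidates)
            parent[u] = v
            active.discard(u)
        else:
            # No active neighbour: v points to itself and becomes an inactive root.
            parent[v] = v
            active.discard(v)
    return parent


def group_fragments(parent: Dict[Hashable, Hashable]) -> Dict[Hashable, Set[Hashable]]:
    """Follow out-edges to the root; each fragment is keyed by its root vertex."""
    root_of: Dict[Hashable, Hashable] = {}
    for v in parent:
        path = []
        x = v
        while x not in root_of and parent[x] != x:
            path.append(x)
            x = parent[x]
        r = root_of.get(x, x)  # x is either already resolved or is itself a root
        for y in path + [x]:
            root_of[y] = r
    frags: Dict[Hashable, Set[Hashable]] = {}
    for v, r in root_of.items():
        frags.setdefault(r, set()).add(v)
    return frags


if __name__ == "__main__":
    n = 10
    cycle = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
    print(group_fragments(fragment(cycle, seed=42)))
```

Each loop iteration deactivates exactly one vertex, so the loop runs n times. The `list(active)` copy makes each step O(n); a faster version would keep the active vertices in an indexable structure. Replacing the two `rng.choice` calls with a deterministic or heuristic rule gives the other selection variants mentioned above.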
In the permutation subgraph model, the vertices of the graph are permuted and examined in permutation order. Starting from the beginning of the permutation, each vertex points to its first neighbour to the right of it in the permutation, or to itself if no such neighbour exists. Permutation subgraphs are studied in more detail for a wider class of graph models, including r-regular graphs, random graphs and infinite random graphs on the integers.

Inspired by the interest in triangles in social networks, we also investigate another variation called triangle-fragmentation, in which every vertex points to the neighbour with which it has the highest number of common neighbours. Although it is not a linear-time algorithm in general, it may be suitable for decomposing dense graphs. The algorithm is analysed experimentally on the planted l-partition model and on random geometric graphs. It is also evaluated as a clustering algorithm on a number of real-world graphs, including social networks and graphs formed by geo-tagged photographs taken from Flickr.
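The permutation subgraph model and triangle-fragmentation described above can both be sketched in the same parent-pointer style. This is a minimal sketch, assuming an undirected simple graph with integer vertex labels stored as an adjacency dictionary. The function names are hypothetical, and the tie-breaking rule for triangle-fragmentation (smallest label among the neighbours with the most common neighbours) is an assumption, since the abstract does not specify one. Because triangle-fragmentation can produce mutual pointers (2-cycles), fragments are collected here as the weakly connected components of the out-edge subgraph using union-find.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]


def permutation_subgraph(graph: Graph, perm: List[int]) -> Dict[int, int]:
    """Each vertex points to its first neighbour to its right in `perm`, else to itself."""
    pos = {v: i for i, v in enumerate(perm)}
    parent: Dict[int, int] = {}
    for v in perm:
        later = [u for u in graph[v] if pos[u] > pos[v]]
        parent[v] = min(later, key=pos.__getitem__) if later else v
    return parent


def triangle_fragmentation(graph: Graph) -> Dict[int, int]:
    """Each vertex points to the neighbour sharing the most common neighbours with it."""
    parent: Dict[int, int] = {}
    for v in graph:
        nbrs = sorted(graph[v])
        if not nbrs:
            parent[v] = v  # isolated vertex: a root of its own
            continue
        # max() returns the first maximal item, so ties go to the smallest label (assumption).
        parent[v] = max(nbrs, key=lambda u: len(graph[v] & graph[u]))
    return parent


def fragments(parent: Dict[int, int]) -> List[Set[int]]:
    """Weakly connected components of the out-edge subgraph, via union-find."""
    uf = {v: v for v in parent}

    def find(x: int) -> int:
        while uf[x] != x:
            uf[x] = uf[uf[x]]  # path halving
            x = uf[x]
        return x

    for v, p in parent.items():
        rv, rp = find(v), find(p)
        if rv != rp:
            uf[rv] = rp
    groups: Dict[int, Set[int]] = {}
    for v in parent:
        groups.setdefault(find(v), set()).add(v)
    return sorted(groups.values(), key=min)


if __name__ == "__main__":
    # Two triangles {0,1,2} and {3,4,5} joined by the bridge edge 2-3.
    barbell = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
    print(fragments(triangle_fragmentation(barbell)))
    print(fragments(permutation_subgraph(barbell, [0, 1, 2, 3, 4, 5])))
```

On this small graph, triangle-fragmentation returns the two triangles as separate fragments, because the bridge endpoints 2 and 3 share no common neighbours, whereas the identity permutation chains all six vertices into a single fragment. The permutation subgraph scans each adjacency list once, while each set intersection in triangle-fragmentation costs time proportional to the smaller neighbourhood, which is one reason it is not linear-time in general.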