Title:

Some approaches to graph fragmentation with application to clustering geotagged data

The topic of this study is a class of graph algorithms that decompose graphs, which we call graph fragmentation algorithms. The aim is to derive fast algorithms that break an input graph into connected subgraphs, which we call fragments. The original motivation for the research came from data mining geotagged photographs from Flickr. Plotting these geotags sometimes reveals recognisable patterns of cities, countries or continents made purely of dots. Joining those dots that lie within a limited distance of each other produces a graph many of whose components are densely connected and correspond to cities, attractions, etc. A question is then: how can one break the graph into fragments in such a way that its component structure is preserved? We give a simple example of a graph fragmentation algorithm. An active vertex v is selected. Next, v selects an active neighbour u, if any, and v absorbs u by making u inactive and orienting the edge uv from u to v. If v has no active neighbours then v points to itself and becomes an inactive root. The process continues until all vertices become inactive. In the end, the fragments, the objects of interest, are formed by directed paths pointing to the root vertices of the components. Fragmentation algorithms can be varied by altering the selection operation. A major difference between the algorithms is how vertices are selected, i.e. probabilistically, deterministically or heuristically. For cycle graphs, we make a formal analysis of the various fragmentation algorithms. We also study a variation in which edges are selected instead of vertices, and extend our analysis to circulant graphs. Many fragmentation algorithms are based on assigning every vertex a unique oriented out-edge, in which case the subgraph obtained consists of unicyclic components. This generalises the subgraph formed by the well-known random mapping graph model, to which we draw some comparisons. We next introduce another fragmentation model, the permutation subgraph model.
The vertices of the graph are permuted and examined in permutation order. Starting from the beginning of the permutation, each vertex points to its first neighbour to the right of it in the permutation, or to itself if no such neighbour exists. Permutation subgraphs are studied in more detail for a wider class of graph models, including r-regular graphs, random graphs and infinite random graphs on the integers. Inspired by the interest in triangles in social networks, we also investigate another variation called triangle fragmentation, in which every vertex points to the neighbour with which it has the highest number of common neighbours. Although it is not a linear-time algorithm in general, it seems suitable for decomposing dense graphs. The algorithm is analysed experimentally on the planted l-partition model and on random geometric graphs. It is also evaluated as a clustering algorithm on a number of real-world graphs, including social networks and graphs formed by geotagged photographs taken from Flickr.
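
As an illustration only, the three pointer rules described above (vertex absorption, permutation subgraph and triangle fragmentation) might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in the study: the function names, the representation of a simple graph as a dictionary of neighbour sets, the uniform random choices in the absorption rule, the smallest-label tie-breaking in the triangle rule and the self-pointer for isolated vertices are all hypothetical choices. Fragments are recovered as the weakly connected components of the pointer graph, since the triangle rule can produce a pair of vertices pointing at each other.

```python
import random
from collections import defaultdict


def absorb_pointers(adj, rng):
    """Vertex absorption: an active v absorbs one of its active neighbours u."""
    active = set(adj)
    pointer = {}
    while active:
        v = rng.choice(sorted(active))  # assumption: v chosen uniformly at random
        nbrs = sorted(u for u in adj[v] if u in active)
        if nbrs:
            u = rng.choice(nbrs)  # assumption: u chosen uniformly at random
            pointer[u] = v  # orient the edge uv from u to v
            active.remove(u)  # u becomes inactive
        else:
            pointer[v] = v  # v points to itself: an inactive root
            active.remove(v)
    return pointer


def permutation_pointers(adj, order):
    """Each vertex points to its first neighbour to its right in `order`."""
    pos = {v: i for i, v in enumerate(order)}
    pointer = {}
    for v in order:
        later = [u for u in adj[v] if pos[u] > pos[v]]
        pointer[v] = min(later, key=pos.__getitem__) if later else v
    return pointer


def triangle_pointers(adj):
    """Each vertex points to the neighbour sharing most common neighbours."""
    pointer = {}
    for v in adj:
        if not adj[v]:
            pointer[v] = v  # assumption: an isolated vertex points to itself
        else:
            # assumption: ties are broken by the smallest vertex label
            pointer[v] = min(adj[v], key=lambda u: (-len(adj[v] & adj[u]), u))
    return pointer


def fragments(pointer):
    """Weakly connected components of the pointer graph, via union-find."""
    parent = {v: v for v in pointer}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in pointer.items():
        parent[find(u)] = find(v)
    groups = defaultdict(set)
    for v in pointer:
        groups[find(v)].add(v)
    return sorted(groups.values(), key=min)


# Hypothetical example graph: a path 0-1-2-3 and a separate triangle 4-5-6.
example = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2},
           4: {5, 6}, 5: {4, 6}, 6: {4, 5}}

print(fragments(absorb_pointers(example, random.Random(0))))
print(fragments(permutation_pointers(example, [1, 0, 2, 3, 4, 5, 6])))
print(fragments(triangle_pointers(example)))
```

In this sketch every rule returns a dictionary mapping each vertex to the vertex it points to, so the same `fragments` function serves all three. Every pointer follows an edge of the input graph or is a self-loop, so no fragment can span two components of the input graph.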
