Title: Practical challenges of learning and representation for large graphs
Author: Chamberlain, Benjamin Paul
ISNI:       0000 0004 7655 6507
Awarding Body: University of London
Current Institution: Imperial College London
Date of Award: 2018
An ever-increasing amount of humanity's information is being stored in large graphs. The world wide web, digital social networks, e-commerce platforms and chat networks now contain digital traces of the majority of living humans. Many of the most valuable companies ever created are dedicated to organising, managing and extracting useful information from large digital graphs. Machine learning has been shown to be an important tool for automating this task. Discovering scalable machine learning systems that extract useful information from graphs is a problem of great practical significance. Interesting graphs, such as the web graph, often contain more information than can be stored on a single computer, and so working with the raw data presents considerable challenges. Graph representations are often employed that encapsulate key properties of the underlying data and enable certain tasks to be performed efficiently, at the expense of others. Representations are chosen to balance time complexity, space complexity and predictive performance on a downstream task, such as labelling vertices with attributes. We are concerned with problems of extracting and inferring information from large graphs and applying the results in deployed commercial systems. The research revolves around two large-scale machine learning projects: (1) a system for searching and organising data from social media graphs, and (2) a system to profile customers through their interactions with products on an e-commerce platform. We use representations of graphs to allow algorithms to be run faster, more cheaply and more accurately. Doing so allows us to satisfy systems constraints that could not be achieved by operating directly on the raw data. We demonstrate how careful choices of representation can be used to improve machine learning performance on several real-world tasks. We do this under challenging industrial constraints such as real-time serving, runtime costs or maintainability.
Supervisor: Deisenroth, Marc; Faisal, Aldo
Sponsor: Royal Commission for the Exhibition of 1851
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral