Title:

Learning curves for Gaussian process regression on random graphs

Gaussian processes are a nonparametric method that can be used to learn both regression and classification rules from examples for arbitrary input spaces using the 'kernel trick'. They are well understood for inputs from Euclidean spaces; much less research, however, has focused on other spaces. In this thesis I aim to at least partially close this gap. In particular I focus on the case where inputs are defined on the vertices of a graph and the task is to learn a function defined on the vertices from noisy examples, i.e. a regression problem. A challenging problem in the area of nonparametric learning is to predict the generalisation error as a function of the number of examples, i.e. the learning curve. In the Euclidean case, predictions are either quantitatively accurate for a few specific cases or only qualitatively accurate for a broader range of situations. I show that, in contrast, I am able to derive accurate learning curves for Gaussian processes on graphs for a wide range of input spaces given by ensembles of random graphs. I focus on the random walk kernel, but my results generalise to any kernel that can be written as a truncated sum of powers of the normalised graph Laplacian. I begin with a discussion of the properties of the random walk kernel, which can be viewed as an approximation of the ubiquitous squared exponential kernel in continuous spaces. I show that, compared to the squared exponential kernel, the random walk kernel has some surprising properties, including a nontrivial limiting form for some types of graphs. After investigating the limiting form of the kernel I then study its use as a prior. I propose a solution to this in the form of a local normalisation, where the prior scale at each vertex is normalised locally as desired. To drive home the point about kernel normalisation I then examine the differences between the two kernels when they are used as a Gaussian process prior over functions defined on the vertices of a graph.
I show using numerical simulations that the locally normalised kernel leads to a probabilistically more plausible Gaussian process prior. After investigating the properties of the random walk kernel I then discuss the learning curves of a Gaussian process with a random walk kernel for both kernel normalisations in a matched scenario (where student and teacher are both Gaussian processes with matching hyperparameters). I show that by using the cavity method I can derive accurate predictions along the whole length of the learning curve, and that these predictions dramatically improve upon previously derived approximations for continuous spaces suitably extended to the discrete graph case. The derivation of the learning curve for the locally normalised kernel required an additional approximation in the resulting cavity equations. I therefore investigate this approximation in more detail using the replica method. I show that the locally normalised kernel leads to a highly nontrivial replica calculation, which eventually shows that the approximation used in the cavity analysis amounts to ignoring some consistency requirements between incoming cavity distributions. I focus in particular on a teacher distribution that is given by a Gaussian process with a random walk kernel but different hyperparameters. I show that in this case, by applying the cavity method, I am able once more to calculate accurate predictions of the learning curve. The resulting equations resemble those of the matched case, but over an inflated number of variables. To finish this thesis I examine the learning curves for varying degrees of model mismatch.
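
As an illustration only, the quantities discussed above can be sketched numerically. The abstract does not state the kernel formula, so the sketch assumes the commonly used random walk form (I - L/a)^p, with L the normalised graph Laplacian, and a and p as hypothetical hyperparameter names. The graph generator (a ring plus random chords), the function names and the two normalisation helpers are likewise assumptions made for this example, not the constructions used in the thesis. The last function evaluates the average posterior variance, which equals the Bayes generalisation error in the matched scenario for one fixed set of training vertices.

```python
import numpy as np

def random_graph(n, p_edge, rng):
    """Symmetric 0/1 adjacency matrix: a ring (so no vertex is isolated)
    plus random chords. Illustrative only; requires n >= 3."""
    upper = np.triu(rng.random((n, n)) < p_edge, k=1)
    adj = (upper | upper.T).astype(float)
    idx = np.arange(n)
    adj[idx, (idx + 1) % n] = 1.0
    adj[(idx + 1) % n, idx] = 1.0
    return adj

def random_walk_kernel(adj, a=2.0, p=10):
    """Assumed form: (I - L/a)^p with L = I - D^{-1/2} A D^{-1/2}, a >= 2."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    identity = np.eye(len(adj))
    lap = identity - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    return np.linalg.matrix_power(identity - lap / a, p)

def normalise_globally(kernel):
    """Rescale so that the average prior variance over vertices is 1."""
    return kernel / np.mean(np.diag(kernel))

def normalise_locally(kernel):
    """Rescale so that the prior variance at every vertex is exactly 1."""
    scale = 1.0 / np.sqrt(np.diag(kernel))
    return scale[:, None] * kernel * scale[None, :]

def bayes_error(kernel, train_idx, noise_var):
    """Average posterior variance over all vertices, given noisy examples
    observed at the vertices listed in train_idx (repeats allowed)."""
    k_ss = kernel[np.ix_(train_idx, train_idx)] + noise_var * np.eye(len(train_idx))
    k_xs = kernel[:, train_idx]
    # Diagonal of K_xs (K_ss + noise)^{-1} K_sx: the variance removed by the data.
    reduction = np.einsum("ij,ji->i", k_xs, np.linalg.solve(k_ss, k_xs.T))
    return float(np.mean(np.diag(kernel) - reduction))
```

Averaging bayes_error over many random draws of the graph and of train_idx, for an increasing number of examples, gives a simulated learning curve of the kind that analytical predictions would be compared against.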
