Title: Modelling, inference and optimization in probabilistic machine learning
Author: Lu, Xiaoyu
Awarding Body: University of Oxford
Current Institution: University of Oxford
Date of Award: 2019
Availability of Full Text:
Full text unavailable from EThOS.
Bayesian machine learning has gained tremendous attention in the machine learning community over the past few years. Bayesian methods offer a coherent framework, based on Bayes' rule, for quantifying uncertainty in decision making. One of the core advantages of Bayesian methods is the separation of modelling and inference: the likelihood model is specified independently of how the posterior distribution of the parameters is computed. Many Bayesian models are widely used in the machine learning community. For example, non-parametric models such as Gaussian Processes and Dirichlet Processes are flexible models that can capture and learn the structure of the data. Bayesian deep learning models, which are based on neural networks, are another example of flexible Bayesian models that are rich enough to represent non-linear structure in the data. Inferring the posterior lies at the center of Bayesian inference. When computing the posterior distribution exactly is not feasible, because the posterior is intractable or because of computational or memory constraints, approximate Bayesian inference comes into play. In this PhD thesis, I develop and investigate various Bayesian modelling and inference techniques and apply them to multiple domains and tasks. We begin with Tucker Gaussian Processes (TGP), a class of flexible non-parametric models based on Gaussian Processes (GP). We apply the method to 1) regression problems on structured input data, and 2) collaborative filtering problems, where TGP offers an elegant way of incorporating side information. We demonstrate superior results compared with benchmarks on a number of examples across different domains. A closely related line of research based on GPs is Bayesian Optimization (BO), a black-box optimizer that optimizes an objective function by sequentially querying the input locations to evaluate next.
However, this method does not work well when the input space is non-Euclidean or combinatorial. We alleviate the problem by learning a low-dimensional Euclidean representation of the combinatorial input space with variational inference, using a Variational Auto-encoder (VAE). The optimization can then be conducted on the low-dimensional embedding instead. We apply our method to the Automatic Statistician and to natural scene understanding, with promising results. For approximate Bayesian inference, we first propose an algorithm called Relativistic Hamiltonian Monte Carlo (RHMC), a variant of Markov chain Monte Carlo (MCMC). In particular, we replace Newton's kinetic energy in the Hamiltonian with Einstein's relativistic kinetic energy, which makes the algorithm more robust. There are several extensions to RHMC, including a stochastic gradient version for scalability, a thermostat version based on the temperature of the physical system, and a resulting optimization algorithm whose performance is comparable to the state of the art. Finally, we propose another sampling-based inference method called Adaptive Importance Sampling with Exploration and Exploitation (Daisee), where we study the exploration-exploitation trade-off in adaptive importance sampling by establishing a natural connection between importance sampling and the multi-armed bandit problem. In particular, through a finite-time regret analysis we show that the regret of the proposed algorithm grows sublinearly with time. Further, we propose a hierarchical extension of Daisee to encourage exploration in regions of high uncertainty. The new models proposed in this thesis allow for more flexible Bayesian modelling, and the inference techniques introduced open new research directions for efficient and accurate posterior inference. Together, these contribute to Bayesian inference and probabilistic machine learning.
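As an illustration of the kinetic energy swap mentioned in the abstract, the following is a minimal sketch rather than the thesis's implementation. It assumes the standard forms K(p) = |p|^2 / (2m) for the Newtonian case and K(p) = m c^2 sqrt(|p|^2 / (m^2 c^2) + 1) for the relativistic case. The function names and the parameters `m` (mass) and `c` (speed limit) are chosen here for illustration only. The sketch compares the velocity dK/dp under each choice: the Newtonian speed grows without bound as the momentum grows, while the relativistic speed stays below `c`, which is one plausible reading of why the relativistic variant is described as more robust.

```python
import math


def newtonian_velocity(p, m=1.0):
    """Velocity dK/dp for the Newtonian kinetic energy K(p) = |p|^2 / (2m)."""
    return [pi / m for pi in p]


def relativistic_velocity(p, m=1.0, c=1.0):
    """Velocity dK/dp for the relativistic kinetic energy
    K(p) = m c^2 sqrt(|p|^2 / (m^2 c^2) + 1).

    Differentiating gives p / (m sqrt(|p|^2 / (m^2 c^2) + 1)),
    whose norm is always below c.
    """
    norm_sq = sum(pi * pi for pi in p)
    scale = m * math.sqrt(norm_sq / (m * m * c * c) + 1.0)
    return [pi / scale for pi in p]


def speed(v):
    """Euclidean norm of a velocity vector."""
    return math.sqrt(sum(vi * vi for vi in v))


# Compare the two speeds as the momentum magnitude grows (illustrative values).
for magnitude in (0.1, 1.0, 10.0, 1000.0):
    p = [magnitude, 0.0]
    print(magnitude, speed(newtonian_velocity(p)), speed(relativistic_velocity(p)))
```

For small momenta the two velocities nearly coincide, so under these assumptions the relativistic form behaves like the Newtonian one near equilibrium and only caps the step size when the momentum becomes large.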
Supervisor: Teh, Yee Whye
Sponsor: Clarendon PAG Oxford Scholarship
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available