Title: Enabling feature-level interpretability in non-linear latent variable models : a synthesis of statistical and machine learning techniques
Author: Märtens, Kaspar
ISNI: 0000 0005 0291 3724
Awarding Body: University of Oxford
Current Institution: University of Oxford
Date of Award: 2019
Availability of Full Text:
Access from EThOS: Full text unavailable from EThOS.
Access from Institution:
Gaining insights into complex high-dimensional data is challenging and typically requires the use of dimensionality reduction methods. These methods let us identify low-dimensional structures embedded within the data that may reveal patterns of interest. In probabilistic models, such low-dimensional structures are captured via latent variables. In biomedical applications, e.g. in computational biology, it is common to use *linear* dimensionality reduction approaches such as probabilistic PCA because of their interpretability. In machine learning, however, there has recently been substantial interest in non-linear "black box" latent variable models because of their improved predictive capabilities. In this thesis, we build upon and propose extensions to two non-linear dimensionality reduction frameworks: the Gaussian Process Latent Variable Model (GPLVM) and the Variational Autoencoder (VAE). The former is based on Gaussian Processes, whereas the latter utilises neural networks. It would be desirable to combine the interpretable aspects of linear models with the flexibility of modern non-linear probabilistic methods. The goal of our work is to enable feature-level interpretability within non-linear latent variable models, while also incorporating covariate information. Rather than relying on post-hoc explainability, we aim to directly construct decomposable non-linear models. By combining ideas from classical statistics and embedding these within latent variable models, we investigate two different perspectives on how to aid feature-level transparency. First, we propose approaches to characterise and quantify the contribution of a particular latent variable (or a covariate) to every feature. We achieve this by building upon the notion of functional ANOVA decompositions from classical statistics and embedding these decompositions within the GPLVM and VAE frameworks.
Second, we investigate another approach to improving feature-level interpretability: providing a mechanism to answer the question of which features exhibit similar patterns within the VAE framework. We propose to do this by introducing a probabilistic clustering structure as part of the decoder network. We demonstrate the utility of our proposed methods on a variety of toy and high-dimensional genomics data sets.
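To make the functional ANOVA idea mentioned in the abstract concrete, the sketch below decomposes a single feature's (toy) decoder output into an intercept, a latent main effect, a covariate main effect, and an interaction, and quantifies each component's contribution by its variance. All function names and forms here are illustrative assumptions, not the thesis's actual model; a real GPLVM/VAE-based decomposition would learn these components rather than fix them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-feature decomposition in the spirit of functional ANOVA:
#   f_j(z, x) = f0 + f_z(z) + f_x(x) + f_zx(z, x),
# where z is a latent variable and x an observed covariate.
f0 = 1.0                                  # intercept

def f_z(z):                               # latent main effect (toy non-linearity)
    return np.sin(z)

def f_x(x):                               # covariate main effect
    return 0.5 * x

def f_zx(z, x):                           # interaction term
    return 0.2 * z * x

def decoder_feature(z, x):
    """Toy decoder output for one feature as a sum of ANOVA components."""
    return f0 + f_z(z) + f_x(x) + f_zx(z, x)

# Quantify each component's contribution to this feature by its variance
# over the data, in the style of a classical ANOVA decomposition.
z = rng.normal(size=1000)
x = rng.normal(size=1000)
contributions = {
    "latent z": np.var(f_z(z)),
    "covariate x": np.var(f_x(x)),
    "interaction": np.var(f_zx(z, x)),
}
```

In a proper functional ANOVA the components are constrained to be zero-mean (and orthogonal) so the decomposition is identifiable; this toy omits those constraints for brevity.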
Supervisor: Holmes, Chris ; Yau, Christopher
Sponsor: Engineering and Physical Sciences Research Council ; Medical Research Council
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: Statistical Machine Learning