Use this URL to cite or link to this record in EThOS:
Title: Decision trees and forests : a probabilistic perspective
Author: Lakshminarayanan, B.
ISNI:       0000 0004 8504 0780
Awarding Body: UCL (University College London)
Current Institution: University College London (University of London)
Date of Award: 2016
Availability of Full Text:
Access from EThOS:
Full text unavailable from EThOS. Please try the link below.
Access from Institution:
Decision trees and ensembles of decision trees are very popular in machine learning and often achieve state-of-the-art performance on black-box prediction tasks. However, popular variants such as C4.5, CART, boosted trees and random forests lack a probabilistic interpretation since they usually just specify an algorithm for training a model. We take a probabilistic approach where we cast the decision tree structures and the parameters associated with the nodes of a decision tree as a probabilistic model; given labeled examples, we can train the probabilistic model using a variety of approaches (Bayesian learning, maximum likelihood, etc). The probabilistic approach allows us to encode prior assumptions about tree structures and share statistical strength between node parameters; furthermore, it offers a principled mechanism to obtain probabilistic predictions which is crucial for applications where uncertainty quantification is important. Existing work on Bayesian decision trees relies on Markov chain Monte Carlo which can be computationally slow and suffer from poor mixing. We propose a novel sequential Monte Carlo algorithm that computes a particle approximation to the posterior over trees in a top-down fashion. We also propose a novel sampler for Bayesian additive regression trees by combining the above top-down particle filtering algorithm with the Particle Gibbs (Andrieu et al., 2010) framework. Finally, we propose Mondrian forests (MFs), a computationally efficient hybrid solution that is competitive with non-probabilistic counterparts in terms of speed and accuracy, but additionally produces well-calibrated uncertainty estimates. MFs use the Mondrian process (Roy and Teh, 2009) as the randomization mechanism and hierarchically smooth the node parameters within each tree (using a hierarchical probabilistic model and approximate Bayesian updates), but combine the trees in a non-Bayesian fashion. MFs can be grown in an incremental/online fashion and remarkably, the distribution of online MFs is the same as that of batch MFs.
Supervisor: Not available Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID:  DOI: Not available