Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.639711
Title: Sample-based search methods for Bayes-adaptive planning
Author: Guez, A. G.
ISNI:       0000 0004 5365 0373
Awarding Body: UCL (University College London)
Current Institution: University College London (University of London)
Date of Award: 2015
Availability of Full Text:
Access from EThOS:
Full text unavailable from EThOS. Please try the link below.
Access from Institution:
Abstract:
A fundamental issue for control is acting in the face of uncertainty about the environment. Amongst other things, this induces a trade-off between exploration and exploitation. A model-based Bayesian agent optimizes its return by maintaining a posterior distribution over possible environments, and considering all possible future paths. This optimization is equivalent to solving a Markov Decision Process (MDP) whose hyperstate comprises the agent's beliefs about the environment, as well as its current state in that environment. This corresponding process is called a Bayes-Adaptive MDP (BAMDP). Even for MDPs with only a few states, it is generally intractable to solve the corresponding BAMDP exactly. Various heuristics have been devised, but those that are computationally tractable often perform indifferently, whereas those that perform well are typically so expensive as to be applicable only in small domains with limited structure. Here, we develop new tractable methods for planning in BAMDPs based on recent advances in the solution to large MDPs and general partially observable MDPs. Our algorithms are sample-based, plan online in a way that is focused on the current belief, and, critically, avoid expensive belief updates during simulations. In discrete domains, we use Monte-Carlo tree search to search forward in an aggressive manner. The derived algorithm can scale to large MDPs and provably converges to the Bayes-optimal solution asymptotically. We then consider a more general class of simulation-based methods in which approximation methods can be employed to allow value function estimates to generalize between hyperstates during search. This allows us to tackle continuous domains. We validate our approach empirically in standard domains by comparison with existing approximations. Finally, we explore Bayes-adaptive planning in environments that are modelled by rich, non-parametric probabilistic models. We demonstrate that a fully Bayesian agent can be advantageous in the exploration of complex and even infinite, structured domains.
Supervisor: Not available Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.639711  DOI: Not available
Share: