Use this URL to cite or link to this record in EThOS: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.788210
Title: Learning to make decisions with unforeseen possibilities
Author: Innes, Craig
ISNI: 0000 0004 8497 5576
Awarding Body: University of Edinburgh
Current Institution: University of Edinburgh
Date of Award: 2019
Availability of Full Text:
Access from EThOS: Full text unavailable from EThOS. Please try the link below.
Access from Institution:
Abstract:
Methods for learning optimal policies often assume that the way the domain is conceptualised (the possible states and relevant actions needed to solve one's decision problem) is known in advance and does not change during learning. This is an unrealistic assumption in many scenarios. Often, new evidence can reveal important information about what is possible, not just what is likely or unlikely; a learner may have been completely unaware that such possibilities even existed prior to learning. This thesis presents a model of an agent that discovers and exploits unforeseen possibilities from two sources of evidence: domain exploration and communication with an expert. The model combines probabilistic and symbolic reasoning to estimate all components of the decision problem, including the set of belief variables, the possible actions, and the probabilistic dependencies between variables. Unlike prior work on solving decision problems by discovering and learning to exploit unforeseen possibilities (e.g., Rong (2016); McCallum and Ballard (1996)), our model supports discovering and learning to exploit unforeseen factors, as opposed to a single additional atomic state. Becoming aware of an unforeseen factor presents computational challenges compared with becoming aware of an additional atomic state, because even a boolean factor doubles the size of the decision problem's hypothesis space, rather than increasing it by just one more state. We show via experiments that one can meet those challenges by adopting (defeasible) reasoning principles that are familiar from the literature on belief revision: roughly, default to simple models over more complex ones, and default to conserving what you have learned from prior evidence. For one-step decision problems, our agent learns the components of a Decision Network; for sequential problems, it learns a Factored Markov Decision Process. We prove convergence theorems for our models, given the learner's and the expert's strategies for gathering evidence. Furthermore, our experiments show that the agent converges on optimal behaviour even when it starts out completely unaware of factors that are critical to success.
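As an illustration of the hypothesis-space point in the abstract, the following minimal Python sketch (not taken from the thesis; the factor names are hypothetical) contrasts how the state space grows when awareness expands by one atomic state versus one boolean factor:

    from itertools import product

    def num_joint_states(factors):
        # Number of joint states in a factored representation:
        # the product of the factors' domain sizes.
        n = 1
        for domain in factors.values():
            n *= len(domain)
        return n

    # Two known boolean factors give 2 * 2 = 4 joint states.
    factors = {"wet": [False, True], "locked": [False, True]}
    print(num_joint_states(factors))   # 4

    # Becoming aware of one more *atomic* state adds exactly one: 4 + 1 = 5.
    atomic = list(product(*factors.values())) + ["new_state"]
    print(len(atomic))                 # 5

    # Becoming aware of one more *boolean factor* doubles the space: 4 * 2 = 8.
    factors["powered"] = [False, True]
    print(num_joint_states(factors))   # 8

The same multiplicative growth carries over to the space of candidate models defined on those states, which is why the defeasible preferences the abstract describes (favouring simpler models and conserving prior conclusions) matter computationally.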
Supervisor: Lascarides, Alex
Sponsor: Engineering and Physical Sciences Research Council (EPSRC)
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.788210
DOI: Not available
Keywords: artificial intelligence ; informed expert ; domain exploration ; expert communication ; probabilistic and symbolic reasoning ; Factored Markov Decision Process