Use this URL to cite or link to this record in EThOS: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.769694
Title: Towards better data efficiency in deep reinforcement learning
Author: Dilokthanakul, Nat
ISNI: 0000 0004 7659 0131
Awarding Body: Imperial College London
Current Institution: Imperial College London
Date of Award: 2019
Abstract:
Deep Reinforcement Learning (DRL) is a machine learning paradigm which uses deep neural networks as one of its main components to search for reward-directed behaviours. Although DRL has been successful in many high-dimensional and difficult tasks, several challenges remain in bridging the gap between human-level learning ability and DRL. One of its weaknesses is its data-hungry nature, which makes it impractical in real-world scenarios. In this thesis, three main causes of data inefficiency in DRL are explored: (i) the sparse reward problem, (ii) the exploration problem and (iii) the representation problem. Towards solving these problems, a suite of proposed algorithms and models is studied: (i) The first proposed method is a hierarchical model with two types of intrinsic motivation: feature-control and pixel-control. Models with these intrinsic motivations have been shown to be effective in sparse reward tasks. An empirical study has also suggested that the successes in the sparse reward problem come from extra training signals that originate from the intrinsic rewards. (ii) Next, an exploration strategy based on the optimism in the face of uncertainty (OFU) principle is proposed. In this method, the uncertainty of interest is the uncertainty on the return, which is relatively easy to measure. Experiments have shown that the method works well in Montezuma's Revenge, a notoriously difficult exploration game. Weaknesses of the method, such as potentially sub-optimal behaviour in stochastic environments, are also discussed. (iii) Deep neural networks are known to exhibit the forgetting problem during learning, which demonstrates their inefficiency as representational models. This study aims at understanding the relationship between neural network architectures and their forgetting behaviours, which lead to poor generalisability and data inefficiency. It has been found that specific weight sharing structures can be used to moderately alleviate the forgetting problem. (iv) In order to move towards more generalisable representations in DRL, disentangled representation learning models present themselves as promising candidates. A deep generative model, namely GMVAE, that represents data with both discrete and continuous variables has been proposed as a potential method to achieve generalisable representations. A study of the model on a digit dataset has revealed that it successfully learns interpretable categorical groupings and meaningful continuous variables. Major problems associated with the training of such a model are also discussed. (v) Additionally, a framework for adding inductive biases to a generative model is proposed. This framework has been shown to create latent variable models that are able to disentangle local and global information in image datasets, and it provides an additional method for creating a latent variable model with explicit information placement in the latent variables. Finally, the thesis concludes with a review of related work in the field and suggests future directions that could help refine a solution to the data-inefficiency problem in DRL.
Supervisor: Shanahan, Murray ; Deisenroth, Marc
Sponsor: Royal Thai Scholarship
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.769694