Use this URL to cite or link to this record in EThOS: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.814237
Title: An intelligent decision-making scheme in a dynamic multi-objective environment using deep reinforcement learning
Author: Hasan, Md Mahmudul
ISNI:       0000 0004 9353 1037
Awarding Body: Anglia Ruskin University
Current Institution: Anglia Ruskin University
Date of Award: 2020
Availability of Full Text:
Access from EThOS:
Full text unavailable from EThOS. Please try the link below.
Access from Institution:
Abstract:
Real-life problems are dynamic and involve decision-making processes with multiple options. Optimisation is needed to solve some of these dynamic decision-making problems, which are particularly challenging when a trade-off between multiple parameters is required, especially in a dynamic environment. However, with the help of artificial intelligence (AI), we may solve these problems effectively. This research investigates the development of an intelligent decision-making scheme for a dynamic multi-objective environment using a deep reinforcement learning (DRL) algorithm. This includes developing a benchmark in the area of dynamic multi-objective optimisation in reinforcement learning (RL) settings, which stimulated the development of an improved testbed based on the conventional deep-sea treasure (DST) benchmark. The proposed testbed is created by changing the optimal Pareto front (PF) and Pareto set (PS). To the best of my knowledge, this is the first dynamic multi-objective testbed for RL settings. Moreover, a framework is proposed to handle multiple objectives in a dynamic environment; it fundamentally maintains an equilibrium between the different objectives to provide a compromise solution that is close to the true PF. As a proof of concept, the proposed model has been applied to a real-world scenario: predicting vulnerable zones based on water quality resilience in São Paulo, Brazil. The proposed algorithm, namely the parity-Q deep Q-network (PQDQN), is successfully implemented and tested, and the agent outperforms its counterparts in terms of achieving the goal (i.e. obtained rewards). Although the agent requires more elapsed time (i.e. a greater number of steps) to be trained than the multi-objective Monte Carlo tree search (MO-MCTS) agent in a particular event, its accuracy in finding Pareto-optimal solutions is significantly higher than that of the multi-policy DQN (MPDQN) and multi-Pareto Q-learning (MPQ) algorithms.
The outcome reveals that the proposed algorithm can find the optimal solution in a dynamic environment. It allows a new objective to be accommodated without any retraining or behaviour tuning of the agent, and it governs which policy should be selected. As far as the dynamic DST testbed is concerned, it will provide researchers with a new dimension for their work and enable them to test their algorithms on problems that are dynamic in nature.
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.814237
DOI: Not available