Title: Learning behaviours in multi-agent systems
Author: Conroy, Ross
Awarding Body: Teesside University
Current Institution: Teesside University
Date of Award: 2017
Availability of Full Text:
Access from EThOS:
Full text unavailable from EThOS. Restricted access.
Interactive Dynamic Influence Diagrams (I-DIDs) are a well-recognised graphical decision model that explicitly models the behaviour of a subject agent alongside the behaviour of other agents. The expected behaviour of other agents can therefore influence the subject agent, which maximises its utility based not only on its belief about the state space but also on its belief about how other agents will behave. I-DIDs require models of other agents to be constructed manually ahead of time. This is a barrier to applying I-DIDs in some settings, such as interaction with a human agent, for whom constructing models is difficult, if not impossible: each human agent encountered may have a unique thought process that changes over time, and human agents may not always execute an optimal policy, instead taking shortcuts in their reasoning. I-DIDs are also difficult to construct for large and complex domains, where building the I-DID and the associated models of other agents may take an unreasonable amount of time. In addition, I-DIDs suffer from exponential growth in the number of candidate models of other agents, and the more candidate models there are, the more computational resources are required to solve the I-DID. This can make I-DIDs impossible to solve for complex domains or long time horizons.
An example of a domain that exhibits these properties is real-time strategy (RTS) games such as StarCraft. StarCraft has been growing in popularity in artificial intelligence (AI) research thanks to the abundance of historical data freely available from online sources, and because the game can be controlled by software, which allows intelligent agents to be developed to play it. StarCraft is a complex game that requires players to account for many aspects, such as combat, resource gathering and scouting, while constrained by the partially observable nature of the game.
This work first provides a software framework for learning the behaviour of agents within an I-DID. The framework allows different behaviour learning techniques to be implemented, such as learning from data sources and from candidate models. It also allows multiple model reduction techniques to be implemented, such as the established Behavioural Equivalence (BE) and Action Equivalence (AE) techniques, along with the new Value Equivalence (VE) technique and its approximation. The goal of the framework is to simplify the process of implementing different learning and reduction techniques for I-DIDs, and it is used throughout this thesis to implement the proposed learning and model reduction methods.
This thesis tackles the problem of constructing models of human players by learning the behaviour of other agents with automatic learning techniques, then applying this learning to the behaviour of human players recorded in StarCraft replay files. The approach can also learn from incomplete behaviour of opposing agents, using a Behavioural Compatibility Test (BCT) to complete partial behaviour from existing learnt behaviour where the two are compatible. It has been evaluated for performance and solution quality in two problem domains, one simulated (the Tiger Problem) and one real-world (StarCraft).
The problem of exponential growth within the model space of I-DIDs is also tackled, building on the ideas of BE and AE with the new concept of VE. VE aims to reduce the model space further than BE and AE by reducing models based not only on their matching behaviour but also on the expected utility for the subject agent given the expected behaviour. The expected utility for each behaviour is learnt from past interaction data, and behaviours are grouped where their expected utility is the same or similar. To apply VE to problems where utility data may not be available, this work also proposes a new framework for determining VE approximately, based on behavioural coverage. The framework is provided with a set of techniques for selecting a subset of candidate models from a larger set. The goal of this behaviour reduction is to reduce the model space within I-DIDs while maintaining sufficient behavioural coverage to give reasonable solutions at reduced computational cost.
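As an illustration of the idea of grouping behaviours by expected utility, the following is a minimal sketch and not the thesis implementation. It assumes that each candidate behaviour already has an expected utility for the subject agent, estimated from past interaction data. The function name `group_by_value`, the behaviour labels, the utility values and the tolerance `epsilon` are all hypothetical.

```python
def group_by_value(expected_utility, epsilon=0.5):
    """Group behaviours whose expected utilities are the same or similar.

    expected_utility: dict mapping a behaviour label to its estimated
    expected utility for the subject agent (hypothetical inputs).
    A behaviour joins the current group when its utility is within
    epsilon of the group's first (lowest-utility) member.
    """
    groups = []
    # Sort by utility so that similar values end up adjacent.
    for name, value in sorted(expected_utility.items(), key=lambda kv: kv[1]):
        if groups and value - groups[-1]["anchor"] <= epsilon:
            groups[-1]["members"].append(name)
        else:
            groups.append({"anchor": value, "members": [name]})
    return [g["members"] for g in groups]


# Hypothetical utilities for four candidate behaviours of the other agent.
utilities = {"rush": -4.0, "expand": 2.0, "turtle": 2.3, "harass": -3.8}
groups = group_by_value(utilities, epsilon=0.5)

# Keep one representative per group to shrink the candidate model set.
representatives = [members[0] for members in groups]
print(groups)
print(representatives)
```

In this toy input, keeping one representative per group reduces four candidate behaviours to two. How the thesis measures similarity and selects representatives is not stated in this record, so the tolerance-based rule here is only an assumption.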
Supervisor: Zeng, Yifeng Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID:  DOI: Not available