Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.702380
Title: Reinforcement learning for trading dialogue agents in non-cooperative negotiations
Author: Efstathiou, Ioannis
ISNI:       0000 0004 6057 5666
Awarding Body: Heriot-Watt University
Current Institution: Heriot-Watt University
Date of Award: 2016
Abstract:
Recent advances in automating Dialogue Management have been made mainly in cooperative environments, where the dialogue system tries to help a human meet their goals. In non-cooperative environments, however, such as competitive trading, much work remains to be done. Such environments are more complex because there is usually imperfect information about the interlocutors’ goals and states. The thesis shows that non-cooperative dialogue agents are capable of learning how to successfully negotiate in a variety of trading-game settings, using Reinforcement Learning, and results are presented from testing the trained dialogue policies with humans. The agents learned when and how to manipulate using dialogue, how to judge the decisions of their rivals, how much information they should expose, and how to effectively model their adversaries’ needs in order to predict and exploit their actions. Initially, the environment was a two-player trading game (“Taikun”). The agent learned how to use explicit linguistic manipulation, even under risk of exposure (detection), where severe penalties apply. A more complex opponent model was then implemented, in which all trading dialogue moves were modelled as implicitly manipulating the adversary’s opponent model, and the work moved to a more complex game (“Catan”). In that multi-agent environment we show that agents can learn to be legitimately persuasive or deceitful. Agents which learned how to manipulate opponents using dialogue are more successful than ones which do not manipulate. We also demonstrate that trading dialogues are more successful when the learning agent builds an estimate of the adversary’s hidden goals and preferences. Furthermore, the thesis shows that policies trained in bilateral negotiations can be very effective in multilateral ones (i.e. the 4-player version of Catan).
The findings suggest that it is possible to train non-cooperative dialogue agents which successfully trade using linguistic manipulation. Such non-cooperative agents may have important future applications, such as automated debating, police investigation, games, and education.
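The core learning method named in the abstract can be illustrated with a minimal, hypothetical sketch of tabular Q-learning for a toy trading game. The game, actions ("trade", "hoard"), state space, and probabilities below are invented for illustration; they are not the thesis's actual "Taikun" or "Catan" environments or dialogue-move sets.

```python
import random

random.seed(0)

# Hypothetical toy game: the agent starts with 0 resources and wins at GOAL.
# "trade" (negotiating) gains a resource more often than "hoard" (not trading),
# so a Q-learner should come to prefer trading.
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
GOAL = 5
ACTIONS = ["trade", "hoard"]          # invented stand-ins for dialogue moves
P_GAIN = {"trade": 0.8, "hoard": 0.3}  # invented success probabilities

# Q-table: one row per resource count, one column per action
Q = {s: {a: 0.0 for a in ACTIONS} for s in range(GOAL + 1)}

def step(state, action):
    """Advance the toy game: gain one resource with action-dependent odds."""
    gained = random.random() < P_GAIN[action]
    next_state = min(state + 1, GOAL) if gained else state
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def choose(state):
    """Epsilon-greedy action selection over the Q-table."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[state][a])

for episode in range(2000):
    s = 0
    while s < GOAL:
        a = choose(s)
        s2, r = step(s, a)
        # Standard one-step Q-learning update
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2].values()) - Q[s][a])
        s = s2

# Greedy policy extracted from the learned Q-values
policy = {s: max(ACTIONS, key=lambda a: Q[s][a]) for s in range(GOAL)}
print(policy)
```

In this sketch the agent learns from reward alone which move is worth making in each state; the thesis applies the same principle at far larger scale, with dialogue moves as actions and opponent models folded into the state.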
Supervisor: Lemon, Oliver ; Corne, David
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.702380
DOI: Not available