Use this URL to cite or link to this record in EThOS: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.633277
Title: Generation of referring expressions for an unknown audience
Author: Kutlák, Roman
ISNI:       0000 0004 5365 2686
Awarding Body: University of Aberdeen
Current Institution: University of Aberdeen
Date of Award: 2014
Availability of Full Text:
Access from EThOS:
Full text unavailable from EThOS. Please try the link below.
Access from Institution:
Abstract:
When computers generate text, they have to consider how to describe the entities mentioned in the text. This situation becomes more difficult when the audience is unknown, as it is not clear what information is available to the addressees. This thesis investigates generation of descriptions in situations when an algorithm does not have a precise model of addressee's knowledge. This thesis starts with the collection and analysis of a corpus of descriptions of famous people. The analysis of the corpus revealed a number of useful patterns, which informed the remainder of this thesis. One of the difficult questions is how to choose information that helps addressees identify the described person. This thesis introduces a corpus-based method for determining which properties are more likely to be known by the addressees, and a probability-based method to identify properties that are distinguishing. One of the patterns observed in the collected corpus is the inclusion of multiple properties each of which uniquely identifies the referent. This thesis introduces a novel corpus-based method for determining how many properties to include in a description. Finally, a number of algorithms that leverage the findings of the corpus analysis and their computational implementation are proposed and tested in an evaluation involving human participants. The proposed algorithms outperformed the Incremental Algorithm in terms of numbers of correctly identified referents and in terms of providing a better mental image of the referent. The main contributions of this thesis are: (1) a corpus-based analysis of descriptions produced for an unknown audience; (2) a computational heuristic for estimating what information is likely to be known to addressees; and (3) algorithms that can generate referring expressions that benefit addressees without having an explicit model of addressee's knowledge.
Supervisor: Not available Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.633277  DOI: Not available
Keywords: Computer algorithms ; Natural language processing (Computer science)
Share: