Title: A fully statistical approach to natural language generation
Author: Xiao, Li
Awarding Body: University of Aberdeen
Current Institution: University of Aberdeen
Date of Award: 2019
Availability of Full Text: Full text unavailable from EThOS.
Abstract: This thesis proposes a novel, fully statistical approach to Natural Language Generation (NLG) that aims to generate syntactically correct sentences without any handcrafted rules. Instead, the approach learns how to generate texts for given data by extracting generative rules from a training corpus. A highlight of the approach is that the extracted rules are human-understandable: after training, the texts the approach generates can still be modified by manually revising the rules. This ability is important for commercialisation, and the thesis also argues that deep-learning-based NLG approaches lack it. The overall generation strategy is based on the idea of reusing the existing words in the training corpus. During training, the approach learns through statistical analysis which words express which data. Then, to generate a new sentence, it replaces words in a corpus sentence with words from other corpus sentences so that the resulting sentence conveys the messages of the given data. The thesis presents the details of how the strategy works and discusses why it can produce syntactically correct sentences. The approach is also adapted to perform Referring Expression Generation (REG) successfully. It not only selects attributes and values for referring to a target referent, but also performs Linguistic Realisation, generating an actual Noun Phrase. Our evaluations suggest that the attribute selection aspect of the algorithm outperforms classic REG algorithms, while the Noun Phrases it generates are as similar to those in a previously developed corpus as were Noun Phrases produced by a new set of human speakers.
Supervisor: van Deemter, Kees ; Lin, Chenghua
Sponsor: University of Aberdeen (Elphinstone Scholarship)
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: Natural language generation (Computer science) ; Computational linguistics