Title: Neural generation of textual summaries from knowledge base triples
Author: Vougiouklis, Pavlos
ISNI: 0000 0004 7656 6342
Awarding Body: University of Southampton
Current Institution: University of Southampton
Date of Award: 2019
Availability of Full Text:
Access from EThOS: Full text unavailable from EThOS. Please try the link below.
Access from Institution:
Abstract:
Most people need textual or visual interfaces in order to make sense of Semantic Web data. In this thesis, we investigate the problem of generating natural language summaries for structured data encoded as triples using neural networks. We propose an end-to-end trainable architecture that encodes the information from a set of triples into a vector of fixed dimensionality and generates a textual summary by conditioning the output on this encoded vector. In order to both train and evaluate our approach, we explore different methodologies for building the required data-to-text corpora.

We initially focus on the generation of biographies. Using both automatic and human evaluation, we demonstrate that our technique scales to domains with challenging vocabulary sizes of over 400k words. Given these promising results, we explore the applicability of our approach to the generation of open-domain Wikipedia summaries in two under-resourced languages, Arabic and Esperanto. We propose an adaptation of our original encoder-decoder architecture that outperforms a diverse set of strong baselines. Furthermore, we conduct a set of community studies to measure the usability of the generated content for Wikipedia readers and editors. The targeted communities rank our generated text close to the expected standards of Wikipedia. In addition, we find that editors are likely to reuse a large portion of the generated summaries, underlining the usefulness of our approach to the involved communities.

Finally, we extend the original model with a pointer mechanism that enables it to jointly learn to verbalise the content of the triples in a number of different ways, while retaining the ability to generate regular words from a fixed target vocabulary. We evaluate performance on a dataset encompassing the entirety of English Wikipedia.
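The encoding step described in the abstract can be illustrated with a heavily simplified sketch: each token of a (subject, predicate, object) triple is embedded and the embeddings are pooled into one fixed-size vector. The tiny vocabulary, random embeddings, and mean pooling below are illustrative assumptions only, not the thesis implementation, which learns the encoding end to end.

```python
import numpy as np

rng = np.random.default_rng(0)
EMBED_DIM = 8  # illustrative; real systems use far larger dimensions

# hypothetical tiny vocabulary of triple tokens
vocab = {tok: i for i, tok in enumerate(
    ["Douglas_Adams", "occupation", "writer", "birthPlace", "Cambridge"])}
embeddings = rng.normal(size=(len(vocab), EMBED_DIM))

def encode_triples(triples):
    """Embed every token of every triple and pool the embeddings into a
    single fixed-dimensionality vector (mean pooling here as a stand-in
    for the learned encoder)."""
    vecs = [embeddings[vocab[t]] for triple in triples for t in triple]
    return np.mean(vecs, axis=0)

# two input triples are condensed into one vector of fixed size,
# on which a decoder could then condition its output
fixed_vec = encode_triples([
    ("Douglas_Adams", "occupation", "writer"),
    ("Douglas_Adams", "birthPlace", "Cambridge"),
])
```

Regardless of how many triples are supplied, the encoder output has the same dimensionality, which is what allows the decoder to condition on it uniformly.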
Results from both automatic and human evaluation highlight the superiority of this extended model over our original encoder-decoder architecture and a set of competitive baselines.
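The pointer mechanism mentioned above lets a model either generate a word from its fixed vocabulary or copy a token from the input triples. A minimal numpy sketch of the mixing step, assuming a scalar generation gate `p_gen` and pre-computed logits (the function name and interface are illustrative, not the thesis code):

```python
import numpy as np

def pointer_mix(vocab_logits, copy_scores, p_gen, copy_token_ids):
    """Mix a fixed-vocabulary distribution with a copy distribution over
    source-triple tokens, weighted by the generation probability p_gen."""
    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    p_vocab = softmax(np.asarray(vocab_logits, dtype=float))
    p_copy = softmax(np.asarray(copy_scores, dtype=float))
    final = p_gen * p_vocab
    # route the copy probability mass to the vocabulary ids of the source tokens
    for prob, tok in zip(p_copy, copy_token_ids):
        final[tok] += (1.0 - p_gen) * prob
    return final

# a rare entity name (vocab id 3) receives most of its probability via copying
dist = pointer_mix(vocab_logits=np.zeros(5),
                   copy_scores=[2.0, 0.0],
                   p_gen=0.6,
                   copy_token_ids=[3, 1])
```

This mixing is why such models can verbalise entity names far outside the fixed target vocabulary while still generating regular words normally.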
Supervisor: Simperl, Elena
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available