Title: Entity type modeling for multi-document summarization : generating descriptive summaries of geo-located entities
Author: Aker, Ahmet
ISNI: 0000 0004 5348 5806
Awarding Body: University of Sheffield
Current Institution: University of Sheffield
Date of Award: 2014
In this work we investigate the application of entity type models in extractive multi-document summarization, using automatic caption generation for images of geo-located entities (e.g. Westminster Abbey, Loch Ness, Eiffel Tower) as an application scenario. Entity type models contain sets of patterns that aim to capture the ways in which geo-located entities are described in natural language. They are automatically derived from texts about geo-located entities of the same type (e.g. churches, lakes, towers). We collect texts about geo-located entities from Wikipedia because our investigation shows that the information humans associate with entity types correlates positively with the information contained in Wikipedia articles about the same entity types. We integrate entity type models into a multi-document summarizer and use them to address the two major tasks in extractive multi-document summarization: sentence scoring and summary composition. We experiment with three different representation methods for entity type models: signature words, n-gram language models and dependency patterns. We first propose that entity type models will improve sentence scoring, i.e. that they will help to assign higher scores to sentences that are relevant to the output summary than to those that are not. Secondly, we claim that summary composition can be improved using entity type models. We follow two different approaches to integrating the entity type models into our multi-document summarizer. In the first approach we use the entity type models in combination with existing standard summarization features to score the sentences. We also manually categorize the set of patterns by the information types they describe and use these categories to reduce redundancy and to improve the flow of the summary. The second approach aims to eliminate the need for manual intervention and to fully automate the process of summary generation.
As in the first approach, the sentences are scored using standard summarization features and entity type models. However, unlike in the first approach, we fully automate summary composition by addressing the redundancy and flow of the summary simultaneously. We evaluate the summarizer with integrated entity type models relative to (1) a summarizer using standard text-related features commonly used in summarization and (2) the Wikipedia location descriptions. The latter constitute a strong baseline against which automated summaries can be evaluated. Following common practice in automatic summarization, the automated summaries are evaluated against human reference summaries using ROUGE and are also assessed for readability by human judges. Our results show that entity type models significantly improve the quality of the output summaries over both the summaries generated using standard summarization features and the Wikipedia baseline summaries. The representation of entity type models using dependency patterns is superior to the representations using signature words and n-gram language models.
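To make the signature-word representation and the combined sentence scoring more concrete, the sketch below derives a small signature-word set from texts about one entity type and mixes it with a standard position feature. It is a minimal illustration under stated assumptions, not the thesis's method: the smoothed log frequency ratio used to pick signature words, the stopword list, the 0.7/0.3 feature weights, and the names `signature_words` and `score_sentence` are all hypothetical. The n-gram language model and dependency pattern representations are not covered here.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "in", "of", "to", "is", "was", "has", "it", "with"}


def tokenize(text):
    """Lowercase and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def signature_words(type_texts, background_texts, top_k=20):
    """Pick the top_k content words most over-represented in texts about
    one entity type (e.g. churches) relative to a background corpus."""
    type_counts = Counter(w for t in type_texts for w in tokenize(t))
    bg_counts = Counter(w for t in background_texts for w in tokenize(t))
    type_total = sum(type_counts.values())
    bg_total = sum(bg_counts.values())
    vocab = len(set(type_counts) | set(bg_counts))

    def log_ratio(w):
        # Add-one smoothed log ratio of relative frequencies (an assumption).
        p_type = (type_counts[w] + 1) / (type_total + vocab)
        p_bg = (bg_counts[w] + 1) / (bg_total + vocab)
        return math.log(p_type / p_bg)

    candidates = [w for w in type_counts if w not in STOPWORDS]
    # Descending ratio; ties broken alphabetically for deterministic output.
    candidates.sort(key=lambda w: (-log_ratio(w), w))
    return set(candidates[:top_k])


def score_sentence(sentence, position, signatures, w_model=0.7, w_pos=0.3):
    """Hypothetical linear combination of an entity type model feature
    (share of tokens that are signature words) and a position feature."""
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    model_score = sum(1 for w in tokens if w in signatures) / len(tokens)
    position_score = 1.0 / (position + 1)  # earlier sentences score higher
    return w_model * model_score + w_pos * position_score


churches = [
    "The church was built in the 12th century and has a gothic nave.",
    "The cathedral has a gothic tower and a large nave built in 1200.",
]
background = [
    "The city has a population of two million people.",
    "The river flows through the city to the sea.",
]
sig = signature_words(churches, background, top_k=5)
sentences = [
    "Westminster Abbey is a gothic church with a tall nave.",
    "It is close to the river in the city.",
]
scores = [score_sentence(s, i, sig) for i, s in enumerate(sentences)]
```

With this toy data the church-type signature set picks up words such as "gothic" and "nave", so the sentence describing the abbey's architecture outscores the one about its surroundings. A linear mix was chosen only because it is the simplest way to combine a model feature with an existing feature; the actual combination used in the thesis is not specified in this record.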
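ROUGE, mentioned above as the automatic evaluation measure, scores a candidate summary by its n-gram overlap with human reference summaries. The following is a simplified sketch of ROUGE-N recall only, assuming plain lowercase tokenization; it omits options of the official ROUGE toolkit such as stemming and stopword removal, and this record does not state which ROUGE variants the thesis reports. The helper names `ngrams` and `rouge_n_recall` are hypothetical.

```python
import re
from collections import Counter


def ngrams(text, n):
    """Count the word n-grams of a text after lowercasing."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n=2):
    """Fraction of reference n-grams that also occur in the candidate,
    with each match clipped to the n-gram's count in the candidate."""
    cand = ngrams(candidate, n)
    matched = total = 0
    for ref in references:
        ref_counts = ngrams(ref, n)
        matched += sum(min(c, cand[g]) for g, c in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0


score = rouge_n_recall(
    "Westminster Abbey is a gothic church.",
    ["Westminster Abbey is a large gothic church."],
    n=2,
)
```

Here the reference contains six bigrams, four of which ("westminster abbey", "abbey is", "is a", "gothic church") also appear in the candidate, giving a ROUGE-2 recall of 4/6.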
Supervisor: Gaizauskas, Robert
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available