Title: Cross-document coreference between different types of collateral texts for films
Author: Tomadaki, Eleftheria
ISNI: 0000 0001 3534 539X
Awarding Body: University of Surrey
Current Institution: University of Surrey
Date of Award: 2006
Recent systems merge information from texts describing video content for video annotation by employing cross-document coreference techniques, mostly realised between texts of the same genre or in texts covering restricted sets of events. We introduce a new and challenging scenario: film and the variety of collateral text genres narrating its content, which cover unrestricted sets of events. In particular, cross-document coreference between plot summaries and audio description is challenging, as these two text types differ significantly. The resulting cross-referencing can potentially enrich video annotation.

We address three questions: how plot summaries and audio description refer to events depicted in films; whether the same events are expressed by lexical regularities in both texts; and how solutions to the cross-document coreference task can be extended to deal with different text genres and unconstrained sets of events. This thesis introduces a new research domain for information extraction and cross-document coreference; reports on a corpus-based analysis of the language used in plot summaries and audio description, focusing on how events are expressed; proposes and evaluates solutions to the cross-document coreference task for an unconstrained set of events in different text types; and provides two data sets for information extraction related research.

We make three claims. First, plot summaries and audio description use lexical regularities, such as frequent open class words occurring more frequently than in general language, to describe film content. Second, these two texts use similar terms in referring to entities but different terms in referring to events, i.e. different frequent verbs. Frequent plot summary events are referred to by very few lexical regularities in audio description. Third, the task of cross-document coreference between plot summary and audio description can be automated, achieving at least 50% Precision and 33% Recall, by matching nouns, functional roles and some verbs, and taking into account the temporal aspect of events. Recall may be improved mostly by resolving all references to entities, while Precision may be increased when treating a restricted set of events.
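The Precision and Recall figures in the third claim can be read as scores over sets of coreference links. The following is a minimal sketch of how such scores are conventionally computed, not the evaluation procedure used in the thesis. The function name `precision_recall`, the event identifiers and the toy link sets are all hypothetical and chosen only for illustration.

```python
def precision_recall(predicted, gold):
    """Score predicted coreference links against gold-standard links.

    Each link is assumed to be a pair
    (plot_summary_event_id, audio_description_event_id).
    """
    predicted, gold = set(predicted), set(gold)
    correct = predicted & gold
    # Precision: share of predicted links that are correct.
    precision = len(correct) / len(predicted) if predicted else 0.0
    # Recall: share of gold links that were found.
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall


# Toy data (hypothetical, not taken from the thesis data sets).
gold_links = {("ps1", "ad4"), ("ps2", "ad9"), ("ps3", "ad15")}
predicted_links = {("ps1", "ad4"), ("ps2", "ad7")}

precision, recall = precision_recall(predicted_links, gold_links)
print(f"Precision: {precision:.2f}, Recall: {recall:.2f}")
```

In this toy case one of the two predicted links is correct and one of the three gold links is recovered. Under this reading, a system can raise Recall by proposing more correct links and raise Precision by proposing fewer incorrect ones.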
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available