Temporal information in newswire articles: an annotation scheme and corpus study
Many natural language processing applications, such as information extraction, question answering, and topic detection and tracking, would benefit significantly from the ability to accurately position reported events in time, either relative to other events or absolutely with respect to calendrical time. However, relatively little work has been done to date on the automatic extraction of temporal information from text. Before we can automatically position reported events in time, we must understand the mechanisms that language uses to do so. This understanding can be promoted through the development of an annotation scheme that allows us to identify the textual expressions conveying events, times and temporal relations in a corpus of 'real' text. This thesis describes a fine-grained annotation scheme with which we can capture all events, times and temporal relations reported in a text. To aid the application of the scheme to text, a graphical annotation tool has been developed. This tool not only allows easy markup of sophisticated temporal annotations but also contains an interactive, inference-based component that supports the gathering of temporal relations. The annotation scheme and the tool have been evaluated through the construction of a trial corpus during a pilot study, in which a group of annotators was given a description of the annotation scheme and asked to apply it to the trial corpus. The pilot study showed that the annotation scheme was difficult to apply, but that applying it would be feasible with improvements to the definition of the scheme and to the tool. Analysis of the resulting trial corpus also provides preliminary results on the relative extent to which different linguistic mechanisms, explicit and implicit, are used to convey temporal relational information in text.