Extracting pragmatic content from Email
This research presents results on the large-scale automatic extraction of pragmatic content from Email, using a system that combines a phrase-matching approach to Speech Act detection with the empirical detection of Speech Act patterns in corpora. The investigation is supported by the analysis of a corpus of 1000 Emails, and the results show that most Speech Acts occurring in such a corpus can be recognised by this approach.
We describe experimental work to sort a substantial sample of Emails by function, that is, by whether they contain a statement of fact, a request for the recipient to do something, or a question. This could be highly desirable functionality for the overburdened Email user, especially if combined with other, more traditional, measures of content relevance and with filters based on desirable and undesirable mail sources. We have attempted to apply an IE engine to the extraction of message content, in part by using speech-act detection criteria (e.g. for what it is to be a request for action, under the many possible surface forms that can express one in English), so as to locate the action requested as well as the fact that it is a request. The work may have practical uses, but here we describe it as the challenge of adapting an IE engine to a somewhat different task: that of message function detection.
The major contributions are twofold. The first is the definition of Request Speech Act types. The Request Speech Act is one of the most important functions of an utterance to recognise in order to find the gist of a message, and the present work concentrates on three sub-types of Request: Requests for Information, Requests for Action, and Requests for Permission. The second is an algorithm to recognise Speech Acts: patterns found frequently in a domain, together with linguistic rules, make it possible to recognise most of the examples of Requests in the corpus.
The results of the evaluation are encouraging and suggest that a fast, user-friendly system is the right approach to implement, since it avoids long response times.
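To make the phrase-matching idea concrete, the following is a minimal sketch of how surface patterns might be used to label sentences with one of the three Request sub-types. It is an illustration only: the patterns in `REQUEST_PATTERNS`, the label names, and the helper functions are all hypothetical and are not the patterns or rules used by the system described here, which derives its patterns empirically from the corpus and combines them with linguistic rules.

```python
import re

# Hypothetical surface patterns, checked in order. These are assumptions made
# for illustration; they are not taken from the system described above.
REQUEST_PATTERNS = [
    ("request_permission",
     re.compile(r"^(?:may|can|could) i\b|\bis it ok(?:ay)? if i\b", re.I)),
    ("request_action",
     re.compile(r"^(?:please\b|(?:can|could|would|will) you\b)"
                r"|\bi would like you to\b", re.I)),
    ("request_information",
     re.compile(r"^(?:who|what|when|where|why|how|which|do|does|did|is|are)\b.*\?$",
                re.I)),
]


def split_sentences(text):
    """Naive sentence splitter: break after '.', '?' or '!' followed by whitespace."""
    parts = re.split(r"(?<=[.?!])\s+", text)
    # Collapse internal whitespace so each sentence is a single line.
    return [" ".join(p.split()) for p in parts if p.strip()]


def classify_sentence(sentence):
    """Return the label of the first matching pattern, or 'statement' if none match."""
    for label, pattern in REQUEST_PATTERNS:
        if pattern.search(sentence):
            return label
    return "statement"


def classify_email(body):
    """Label every sentence in an Email body with its (assumed) function."""
    return [(s, classify_sentence(s)) for s in split_sentences(body)]


if __name__ == "__main__":
    body = "The meeting is at noon. Could you book a room? May I bring a guest?"
    for sentence, label in classify_email(body):
        print(f"{label:20s} {sentence}")
```

A fixed, ordered pattern list keeps each lookup cheap, which is in keeping with the preference for a fast system, but a hand-written list like this one would miss many of the surface forms that a corpus-derived pattern set is meant to capture.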