Title: Natural language techniques for error correction
Author: Bowden, T. G.
Awarding Body: University of Cambridge
Current Institution: University of Cambridge
Date of Award: 1997
Availability of Full Text:
Full text unavailable from EThOS.
Please contact the current institution’s library for further details.
Dealing with human errors such as spelling or grammar mistakes is a necessary part of natural language processing. The aim of this project was to investigate how far error detection and correction could proceed when the system's purview was restricted to a sub-sentential stretch of text. This restriction comes from cooperative error handling: detecting and correcting errors just after user entry, while the user is entering further text. Short-context, or shallow, processing is also interesting because it is potentially cheaper and faster than a full-scale parse and because sentential constraints become less reliable when the 'sentence' is ill-formed. There has been no previous report on the effectiveness of local syntactic constraints on general (English) ill-formedness. Additionally, all error processing programs, other than some working in very restricted domains, have been post-processors rather than cooperative. Being post-processors, previous programs have been concerned with errors left undetected after some degree of proofreading. Cooperative processing is also aimed at the errors people spend time backtracking to catch. In the absence of suitable existing data, a corpus of keystrokes made by subjects entering a piece of text was collated; errors were classified as caught or uncaught and various interesting analyses emerged. For context-less processing, a method based on morphological error rules and another based on binary positional trigrams were devised and compared. Then, to incorporate context, local syntactic constraints based on tag information were implemented, using bigram and trigram co-occurrence checks with a Markov tagging procedure. The tag-based constraints were compared with a partial parsing method. These error handlers were evaluated on data from the Keystroke Corpus and on other data manufactured and collected. The morphological error rules and tag-based checks using very short context were the most promising.
As far as current comparison allows, there being a scarcity of reported results in this area, the short context techniques implemented here compared well against full-parsing error handlers. Ideas outlined for future work include a method for further identifying detected word scope errors and a practical, usable cooperative corrector based on an extension of an existing commercial application.
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available