Title: Grammatical error prediction
Author: Anderson, O. E.
Awarding Body: University of Cambridge
Current Institution: University of Cambridge
Date of Award: 2010
Availability of Full Text:
Full text unavailable from EThOS.
Please contact the current institution’s library for further details.
In this thesis, we investigate methods for automatic detection, and to some extent correction, of grammatical errors. The evaluation is based on manual error annotation in the Cambridge Learner Corpus (CLC). Automatic or semi-automatic annotation of error corpora is one possible application, but the methods are also applicable in other settings, for instance to give learners feedback on their writing or in a proofreading tool used to prepare texts for publication. Apart from the CLC, we use the British National Corpus (BNC) to get a better model of correct usage, WordNet for semantic relations, other machine-readable dictionaries for orthography/morphology, and the Robust Accurate Statistical Parsing (RASP) system to parse both the CLC and the BNC and thereby identify syntactic relations within the sentence. Different techniques are investigated, including: sentence-level binary classification based on machine learning over n-grams of words, n-grams of part-of-speech tags and grammatical relations; automatic identification of features which are highly indicative of individual errors; and development of classifiers aimed more specifically at given error types, for instance concord errors based on syntactic structure and collocation errors based on co-occurrence statistics from the BNC, using clustering to deal with data sparseness. We show that such techniques can detect, and sometimes even correct, at least certain error types as well as or better than human annotators. Finally, we present an annotation experiment in which a human annotator corrects and supplements the automatic annotation. The experiment confirms the high detection/correction accuracy of our system and shows that such a hybrid set-up gives higher-quality annotation with considerably less time and effort than fully manual annotation.
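As a rough illustration of the first technique named in the abstract, sentence-level binary classification over word n-grams, the following minimal sketch may help. It is not the thesis's implementation: the abstract does not say which learning algorithm was used, so a naive Bayes classifier with add-one smoothing is swapped in here purely as an assumption, only word n-grams are used (no part-of-speech tags or grammatical relations), and the six training sentences and the names `word_ngrams` and `NgramNaiveBayes` are invented for the example.

```python
import math
from collections import Counter


def word_ngrams(tokens, n_max=2):
    """Return all word n-grams of length 1..n_max, with sentence-boundary padding."""
    padded = ["<s>"] + tokens + ["</s>"]
    return [
        " ".join(padded[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


class NgramNaiveBayes:
    """Toy sentence-level classifier (hypothetical; not the thesis's model)."""

    def __init__(self, n_max=2):
        self.n_max = n_max
        self.feature_counts = {}       # label -> Counter of n-gram counts
        self.label_counts = Counter()  # label -> number of training sentences
        self.vocab = set()             # all n-grams seen in training

    def fit(self, sentences, labels):
        for sentence, label in zip(sentences, labels):
            feats = word_ngrams(sentence.lower().split(), self.n_max)
            self.feature_counts.setdefault(label, Counter()).update(feats)
            self.label_counts[label] += 1
            self.vocab.update(feats)
        return self

    def log_score(self, sentence, label):
        counts = self.feature_counts[label]
        total = sum(counts.values())
        # Log prior plus add-one smoothed log likelihood of each n-gram.
        score = math.log(self.label_counts[label] / sum(self.label_counts.values()))
        for feat in word_ngrams(sentence.lower().split(), self.n_max):
            score += math.log((counts[feat] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, sentence):
        return max(self.label_counts, key=lambda label: self.log_score(sentence, label))


# Invented toy data: three correct sentences and three with concord errors.
train = [
    ("she goes to school", "correct"),
    ("they go to school", "correct"),
    ("he walks to work", "correct"),
    ("she go to school", "error"),
    ("they goes to school", "error"),
    ("he walk to work", "error"),
]
model = NgramNaiveBayes(n_max=2).fit([s for s, _ in train], [y for _, y in train])
print(model.predict("she go to school"))
print(model.predict("she goes to school"))
```

On this toy data the two classes differ only in bigrams such as "she go" versus "she goes", which is why bigram features are needed in addition to unigrams. A realistic system would be trained on an annotated learner corpus rather than a handful of hand-written sentences.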
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available