An intelligent spelling error correction system based on the results of an analysis which has established a set of phonological and sequential rules obeyed by misspellings
This thesis describes the analysis of over 1300 spelling and typing errors. It introduces many empirical rules that these errors obey and shows that the vast majority of errors are variations on some 3000 basic forms. It also describes and tests an intelligent, knowledge-based spelling error correction algorithm built on this analysis. Using the Shorter Oxford English Dictionary, the algorithm correctly identifies the intended word for over 90% of typical spelling errors and over 80% of all spelling errors, provided that the correct word is in the dictionary.

The method works as follows. An error form is compared with each word in the small portion of the dictionary likely to contain the intended word, but heuristic rules quickly abandon the examination of improbable words. Any differences between a dictionary word and the error form are compared with the basic forms, and any dictionary word that differs from the error form by only one or two basic forms is transferred to a separate list. The program then acts as an expert system in which each basic form is a production rule with a subjective Bayesian probability. The correction is chosen from the separate list by calculating the Bayesian probability of each word on it.

An interactive spelling error corrector using the concepts and methods developed here runs on the Bradford University Cyber 170/720 computer and was used to correct this thesis. The corrector also runs on VAX and Prime computers.
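The two-stage idea (keep only dictionary words within one or two basic forms of the error, then rank that shortlist by Bayesian probability) can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and several parts are assumptions. The "basic forms" here are just four generic single-letter edits (insertion, deletion, substitution, transposition) with made-up probabilities in a hypothetical `RULE_PROB` table, whereas the thesis derives some 3000 richer forms from its error analysis. The "small portion of the dictionary" is assumed to be the words sharing the error's first letter. The word priors are hypothetical relative frequencies, and the ranking uses a simple prior × product-of-rule-probabilities score scaled to sum to one. This is a plain noisy-channel style simplification, not the expert system's subjective Bayesian updating.

```python
"""Minimal sketch: basic-form candidate filtering plus Bayesian ranking.

Hypothetical throughout: the rule set, probabilities and dictionary-portion
heuristic are illustrative assumptions, not the thesis's rules or data.
"""

LETTERS = "abcdefghijklmnopqrstuvwxyz"

# Hypothetical "basic forms": four generic single-letter edits, each with a
# made-up subjective probability. The thesis uses ~3000 richer forms.
RULE_PROB = {"insert": 0.2, "delete": 0.3, "substitute": 0.25, "transpose": 0.25}


def edits1(word):
    """Yield (variant, rule) for every single basic-form edit of word."""
    for i in range(len(word) + 1):
        left, right = word[:i], word[i:]
        for c in LETTERS:
            yield left + c + right, "insert"
        if right:
            yield left + right[1:], "delete"
            for c in LETTERS:
                if c != right[0]:
                    yield left + c + right[1:], "substitute"
        if len(right) > 1 and right[0] != right[1]:
            yield left + right[1] + right[0] + right[2:], "transpose"


def rule_score(word, error):
    """Best product of rule probabilities over paths of one or two basic
    forms that turn word into error; 0.0 if no such path exists."""
    best = 0.0
    for v1, r1 in edits1(word):
        p1 = RULE_PROB[r1]
        if v1 == error:
            best = max(best, p1)
        # One more edit changes the length by at most 1, so prune here.
        if abs(len(v1) - len(error)) > 1:
            continue
        for v2, r2 in edits1(v1):
            if v2 == error:
                best = max(best, p1 * RULE_PROB[r2])
    return best


def candidates(error, dictionary):
    """Assumed 'small portion': words with the same first letter. A cheap
    heuristic rejects words whose length differs by more than 2, since two
    basic forms of this kind cannot bridge a larger gap."""
    return [w for w in dictionary
            if w[:1] == error[:1] and abs(len(w) - len(error)) <= 2]


def correct(error, dictionary, prior):
    """Return [(word, posterior)] sorted best first.

    posterior(w) is proportional to prior(w) * rule_score(w, error); words
    needing more than two basic forms never reach the separate list.
    """
    shortlist = {}
    for w in candidates(error, dictionary):
        score = rule_score(w, error)
        if score > 0.0:
            shortlist[w] = prior.get(w, 1e-6) * score
    total = sum(shortlist.values())
    if total == 0.0:
        return []
    return sorted(((w, p / total) for w, p in shortlist.items()),
                  key=lambda item: -item[1])
```

For example, with the dictionary `["receive", "recipe", "relieve", "deceive", "reserve"]` and hypothetical priors `{"receive": 50, "recipe": 30, "relieve": 20, "reserve": 25, "deceive": 10}`, `correct("recieve", ...)` ranks receive (about 0.66, one transposition), relieve (about 0.26, one substitution) and recipe (about 0.08, an insertion plus a substitution). reserve needs three basic forms and is never shortlisted, and deceive lies outside the assumed dictionary portion.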