Title: Form analysis using colour and context
Author: Wong, Wing Seong
ISNI:       0000 0001 3571 671X
Awarding Body: Nottingham Trent University
Current Institution: Nottingham Trent University
Date of Award: 2001
Availability of Full Text: Full text unavailable from EThOS.
Despite advances in computer technologies, the automatic processing of form documents, especially those filled in with cursive writing, remains an unresolved problem. In many cases, incoming forms must still be entered manually into the document management system before the data can be processed electronically. This work aims to contribute ideas that improve the form data extraction and recognition processes, in order to realize a more reliable automatic form processing system. It investigates the use of colour to improve the data extraction process, and the use of Optical Character Recognition (OCR) to retrieve contextual knowledge that improves Cursive Script Recognition (CSR). An innovative colour reduction technique is proposed that can successfully reduce the colour content of form documents based on a direct comparison of the pixels' RGB values. Using these quantised forms, the use of colour to aid the extraction of the filled data is then investigated. Three experiments are conducted to assess the effectiveness of such a method. Experimental results show that an extraction system that utilizes colour information improves the recall rate from 96.5% to 99% and the accuracy rate from 97.5% to 99%, with an extraction speed up to 3 times faster than that of a black and white extraction system. The advantage of the new extraction method over a black and white technique is reflected in a significant improvement in the CSR rate (up from 49% to 58%), while also reducing the need for the commonly used text repair algorithms. The novel concept of using OCR to aid CSR by extracting contextual knowledge has also been demonstrated. OCR-generated cues are used to reduce the CSR search space by limiting the lexicon size for a given field. The experimental results show that, using current OCR technology, cues can be successfully located 99% of the time, resulting in an improvement of the CSR rate by an average of 12 percentage points (from 43% to 55%).
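The idea of using an OCR-read cue to limit the lexicon for a field can be illustrated with a minimal sketch. The record does not describe the thesis implementation, so everything here is an assumption made for illustration: the field names, the word lists, and the helper `lexicon_for_cue` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical per-field word lists; the thesis's actual lexicons are not given.
FIELD_LEXICONS: Dict[str, List[str]] = {
    "title": ["mr", "mrs", "ms", "dr"],
    "country": ["france", "germany", "malaysia", "united kingdom"],
}

# Fallback search space when no cue is recognised: every known word.
FULL_LEXICON: List[str] = sorted(
    {word for words in FIELD_LEXICONS.values() for word in words}
)


def lexicon_for_cue(ocr_cue: str) -> List[str]:
    """Return the reduced lexicon for a printed field label read by OCR.

    The cue is normalised (whitespace and a trailing colon removed,
    lower-cased) and looked up; unknown cues fall back to the full lexicon.
    """
    key = ocr_cue.strip().rstrip(":").lower()
    return FIELD_LEXICONS.get(key, FULL_LEXICON)


if __name__ == "__main__":
    print(lexicon_for_cue("Title:"))
    print(len(lexicon_for_cue("Signature")))
```

With a recognised cue, a cursive script recogniser would only need to rank the handful of candidates for that field rather than the whole vocabulary, which is the search-space reduction the abstract refers to.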
Finally, a further study has been conducted to investigate the feasibility of using the developed methods to process a filled form without the equivalent blank form image. The experimental results show that, although the extraction rate drops from 94.1% to 82.7% when the blank form is not available, most of this decrease is the result of mis-retrieved OCR text rather than the filled-in words. The CSR rate itself drops by only around 1.8%.
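The abstract mentions a colour reduction step based on a direct comparison of the pixels' RGB values but gives no details. The following is only a rough sketch of one way such a comparison could work, not the technique proposed in the thesis: each pixel is assigned to the first palette colour that lies within an assumed per-channel `tolerance`, and a new palette entry is created otherwise. The function name `reduce_colours` and the tolerance value are hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def reduce_colours(pixels: List[RGB], tolerance: int = 40) -> Tuple[List[int], List[RGB]]:
    """Quantise pixels by direct RGB comparison against a growing palette.

    Returns (labels, palette), where labels[i] is the palette index
    assigned to pixels[i]. `tolerance` is an assumed per-channel limit.
    """
    palette: List[RGB] = []
    labels: List[int] = []
    for r, g, b in pixels:
        for index, (pr, pg, pb) in enumerate(palette):
            # A pixel matches when every channel is within the tolerance.
            if (abs(r - pr) <= tolerance
                    and abs(g - pg) <= tolerance
                    and abs(b - pb) <= tolerance):
                labels.append(index)
                break
        else:
            # No existing colour is close enough: start a new palette entry.
            palette.append((r, g, b))
            labels.append(len(palette) - 1)
    return labels, palette


if __name__ == "__main__":
    # Paper white, blue ink, and black print, with slight scanner noise.
    sample = [(250, 250, 250), (255, 255, 255), (10, 10, 200), (0, 0, 0), (20, 15, 190)]
    labels, palette = reduce_colours(sample)
    print(labels, palette)
```

On a quantised form of this kind, filled-in ink and pre-printed text that differ in colour end up with different labels, which is what would let an extraction stage separate them without relying on shape alone.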
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: Handwriting; Cursive writing