Use this URL to cite or link to this record in EThOS: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.793150
Title: Analysing and correcting dyslexic Arabic texts
Author: Alamri, Maha
Awarding Body: Bangor University
Current Institution: Bangor University
Date of Award: 2019
Abstract:
Dyslexia is a disorder that involves difficulties with literacy and language-related skills. It is related to an inability to master the use of written language and affects a significant number of people. This thesis describes the development of the Bangor Dyslexia Arabic Corpus (BDAC), built to facilitate the analysis and automatic correction of dyslexic Arabic text. The thesis also develops a new classification of errors made in Arabic by people with dyslexia, which was used in the annotation of the BDAC. The dyslexic error classification scheme for Arabic texts (DECA) comprises a list of dyslexia spelling errors classified into 37 types and grouped into nine categories. The thesis also investigates a new type of classification (dyslexia text classification) that identifies whether or not a text has been written by a person with dyslexia. The text compression scheme known as prediction by partial matching (PPM) is applied to the problem of distinguishing dyslexic text from non-dyslexic text. Experimental results show that PPM-based classification achieved an F₁ score of 0.99 and outperformed other classifiers such as Multinomial Naïve Bayes and Support Vector Machines. A new system called Sahah is also proposed for the automatic detection and correction of dyslexia errors in Arabic text. The system uses a language model based on the PPM text compression scheme in addition to edit operations (omission, addition, substitution and transposition). The correct alternative for each error word is chosen on the basis of the compression codelength. Two experiments were carried out to evaluate the usefulness of the Sahah system. Firstly, its accuracy was evaluated using the BDAC, which contains errors made by people with dyslexia. Secondly, the results of Sahah were compared with the results obtained when using word processing software and the Farasa tool. The results show that the Sahah system significantly outperforms Microsoft Word, Ayaspell and the Farasa tool, with an F₁ score of 0.83 for detection and an F₁ score of 0.58 for correction.
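The idea of choosing a correction by compression codelength can be illustrated with a minimal sketch. It is not the Sahah implementation: the class `CharModel`, the functions `single_edits` and `correct`, the add-one smoothing, the fixed context order and the toy English corpus are all assumptions made for illustration, and the simple fixed-order character model is only a stand-in for PPM. Candidates are produced by the four edit operations named in the abstract, and the candidate whose surrounding text costs the fewest bits is kept.

```python
import math
from collections import defaultdict


class CharModel:
    """Fixed-order character model with add-one smoothing.

    A simple stand-in for PPM (assumption): it has no escape mechanism
    and does not blend context orders.
    """

    def __init__(self, text, order=2):
        self.order = order
        self.alphabet = sorted(set(text))
        self.counts = defaultdict(lambda: defaultdict(int))
        padded = " " * order + text
        for i in range(order, len(padded)):
            self.counts[padded[i - order:i]][padded[i]] += 1

    def codelength(self, s):
        """Return the bits needed to encode s under the model."""
        padded = " " * self.order + s
        size = len(self.alphabet) + 1  # one extra slot for unseen symbols
        bits = 0.0
        for i in range(self.order, len(padded)):
            ctx = self.counts.get(padded[i - self.order:i], {})
            total = sum(ctx.values())
            bits -= math.log2((ctx.get(padded[i], 0) + 1) / (total + size))
        return bits


def single_edits(word, letters):
    """All strings one edit away from word, plus word itself."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    removed = {a + b[1:] for a, b in splits if b}
    inserted = {a + c + b for a, b in splits for c in letters}
    replaced = {a + c + b[1:] for a, b in splits if b for c in letters}
    swapped = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    return removed | inserted | replaced | swapped | {word}


def correct(word, left, right, model):
    """Pick the candidate that gives the shortest codelength in context."""
    letters = [c for c in model.alphabet if c != " "]
    candidates = sorted(single_edits(word, letters))  # sorted: stable ties
    return min(candidates, key=lambda c: model.codelength(left + c + right))


# Toy training text (assumption); a real system would train on a large corpus.
corpus = "the cat sat on the mat " * 20
model = CharModel(corpus, order=2)
print(correct("teh", "the cat sat on ", " mat", model))
```

In this sketch raw codelength tends to favour shorter candidates, because fewer symbols cost fewer bits, so a practical system would need additional constraints on the candidate set. How Sahah handles this is not stated in the abstract.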
Supervisor: Teahan, William
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.793150
DOI: Not available