Title: Using data-driven resources for optimising rule-based syntactic analysis for modern standard Arabic
Author: Elbey, Mohamed
ISNI: 0000 0004 6494 393X
Awarding Body: University of Manchester
Current Institution: University of Manchester
Date of Award: 2014
This thesis is about optimising a rule-based parser for Modern Standard Arabic (MSA). Ambiguity is a major problem in NLP systems in general, and it is even worse in a language such as MSA, because written MSA omits short vowels and for other reasons that are discussed in Chapter 1. Analysis of the original rule-based parser showed that much of the parsing work was unnecessary, because many edges were produced but not used in the final analysis. The first part of the thesis investigates whether integrating a part-of-speech (POS) tagger helps to speed up parsing. This is a well-known technique for Romance and Germanic languages, but its effectiveness has not been widely explored for MSA. The second part applies statistics and machine learning techniques and investigates their effect on the parser. This thesis is not about the accuracy of the parser; it is about finding ways to improve its speed. It discusses a new approach that has not been explored in statistical parsing before: collecting statistics while parsing and using them to learn strategies to be applied during the parsing process. The learning process involves all the moves made during parsing: moves that lead to the final analysis (good moves) and moves that lead away from it (bad moves). The idea is to learn not only from positive data but also from negative data. The questions to be asked are: • Why is this move good, so that we can encourage it? • Why is this move bad, so that we can discourage it? In the final part of the thesis, the two techniques (integrating a POS tagger and using the learning approach) are combined, and the effect of the combination on the parser is examined.
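The idea of labelling parser moves as good or bad can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: a toy English grammar and lexicon, a simple CKY-style chart parser standing in for the rule-based MSA parser, and the hypothetical function names `parse`, `label_moves` and `move_stats`. An edge counts as a good move if it is reachable from the final analysis and as a bad move otherwise; the per-label counts are the kind of statistic a learner could then use to encourage or discourage moves.

```python
from collections import defaultdict

# Toy grammar and lexicon (hypothetical; not the thesis's MSA grammar).
BINARY = {
    ("NP", "VP"): ["S"],
    ("Det", "N"): ["NP"],
    ("V", "NP"): ["VP"],
    ("N", "N"): ["NP"],
}
LEXICON = {"the": ["Det"], "dog": ["N"], "saw": ["V", "N"], "cat": ["N"]}


def parse(words):
    """CKY-style bottom-up parse that keeps every edge it builds.

    Returns a chart mapping (start, end) -> {label: backpointer}, where a
    backpointer is None for lexical edges or a pair of child edges.
    """
    n = len(words)
    chart = defaultdict(dict)
    for i, word in enumerate(words):
        for tag in LEXICON[word]:
            chart[(i, i + 1)][tag] = None
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            k = i + span
            for j in range(i + 1, k):
                for left in list(chart[(i, j)]):
                    for right in list(chart[(j, k)]):
                        for parent in BINARY.get((left, right), []):
                            # Keep the first derivation found for each label.
                            chart[(i, k)].setdefault(
                                parent, ((i, j, left), (j, k, right))
                            )
    return chart


def label_moves(chart, n, root="S"):
    """Split edges into good (used in the final analysis) and bad (unused)."""
    good = set()
    stack = [(0, n, root)] if root in chart[(0, n)] else []
    while stack:
        i, k, label = stack.pop()
        good.add((i, k, label))
        back = chart[(i, k)][label]
        if back:
            stack.extend(back)
    all_edges = {(i, k, lab) for (i, k), cell in chart.items() for lab in cell}
    return good, all_edges - good


def move_stats(good, bad):
    """Count (good, bad) edges per label: positive and negative evidence."""
    counts = defaultdict(lambda: [0, 0])
    for _, _, label in good:
        counts[label][0] += 1
    for _, _, label in bad:
        counts[label][1] += 1
    return {label: tuple(c) for label, c in counts.items()}


if __name__ == "__main__":
    sentence = "the dog saw the cat".split()
    chart = parse(sentence)
    good, bad = label_moves(chart, len(sentence))
    print(move_stats(good, bad))
```

In this toy example the noun reading of "saw" and the noun-noun compound built from it are produced but never used, which mirrors the wasted edges the abstract describes. A POS tagger that committed to the verb reading up front would prevent both edges from being created at all.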
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:  DOI: Not available
Keywords: NLP ; ARABIC