Use this URL to cite or link to this record in EThOS:
Title: Minimally-supervised methods for Arabic Named Entity Recognition
Author: Althobaiti, Maha
ISNI:       0000 0004 5921 7298
Awarding Body: University of Essex
Current Institution: University of Essex
Date of Award: 2016
Availability of Full Text:
Access from EThOS:
Full text unavailable from EThOS. Please try the link below.
Access from Institution:
Named Entity Recognition (NER) has attracted much attention over the past twenty years, as a main task of Information Extraction. The current dominant techniques for addressing NER are supervised methods that can achieve high performance, but require new manually annotated data for every new domain and/or genre change. Our work focuses on approaches that make it possible to tackle new domains with minimal human intervention to identify Named Entities (NEs) in Arabic text. Specifically, we investigate two minimally-supervised methods: semi-supervised learning and distant learning. Our semi-supervised algorithm for identifying NEs does not require annotated training data or gazetteers. It only requires, for each NE type, a seed list of a few instances to initiate the learning process. Novel aspects of our algorithm include (i) a new way to produce and generalise the extraction patterns (ii) a new filtering criterion to remove noisy patterns (iii) a comparison of two ranking measures for determining the most reliable candidate NEs. Next, we present our methodology to exploit Wikipedia structure to automatically develop an Arabic NE annotated corpus. A novel mechanism is introduced, based on the high coverage of Wikipedia, in order to address two challenges particular to tagging NEs in Arabic text: rich morphology and the absence of capitalisation. Neither technique has yet achieved performance levels comparable to those of supervised methods. Semi-supervised algorithms tend to have high precision but comparatively low recall, whereas distant learning tends to achieve higher recall but lower precision. Therefore, we present a novel approach to Arabic NER using a combination of semi-supervised and distant learning techniques. We used a variety of classifier combination schemes, including the Bayesian Classifier Combination (BCC) procedure, recently proposed for sentiment analysis. According to our results, the BCC model leads to an increase in performance of 8 percentage points over the best minimally-supervised classifier.
Supervisor: Not available Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID:  DOI: Not available
Keywords: QA75 Electronic computers. Computer science