Title: Named Entity Recognition : a local grammar-based approach
Author: Traboulsi, Hayssam N.
ISNI: 0000 0001 3535 8895
Awarding Body: University of Surrey
Current Institution: University of Surrey
Date of Award: 2006
Named Entity Recognition (NER) is an information extraction subtask that seeks to identify proper nouns in a document and classify them as a person, organisation, place, date, time, monetary value, or percentage. This thesis investigates the recognition of the most open types of named entities: person and organisation names. This task has proved significant for information retrieval, machine translation, and document indexing, and is a necessary prerequisite to more complex information extraction and question-answering tasks. Two of the most difficult problems facing developers of NER systems are portability and system performance: a practical NER system is expected to recognise named entities correctly in new text domains or new languages at minimal cost. The main contributor to these problems is the manual effort that has always been needed to develop symbolic or statistical recognition rules for these systems from large tagged text corpora. In this research we introduce a prototype called LG-Finder that automatically acquires linguistic recognition rules (local grammars) for person and organisation names from untagged text corpora using corpus-linguistic techniques, including frequency, collocation and concordance analyses. So far, LG-Finder has been successfully tested on English news texts, but it can be applied straightforwardly to other European languages. In addition, we present a local grammar-based NER prototype (NExtract) which incorporates finite state transducers implementing the local grammars acquired by LG-Finder. The success rates scored by NExtract when evaluated against data sets from Reuters and the Wall Street Journal are promising and comparable with those achieved in the DARPA-sponsored MUC-7 named entity evaluation.
Finally, we present a question-answering prototype, which makes use of the local grammars to answer a small set of questions seeking information on people and organisations in the financial domain. The evaluation results of this prototype are encouraging and motivate further investigations using the local grammar approach in this domain.
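As a rough illustration of how recognition cues might be acquired from untagged text by frequency and concordance-style analysis, the following Python sketch counts the words that immediately precede capitalised word sequences and then reuses the frequent ones as a simple pattern. It is not LG-Finder or NExtract and does not reproduce the method of the thesis: the toy corpus, the function names `learn_cues` and `find_names`, the frequency threshold, and the use of regular expressions in place of finite state transducers are all assumptions made for illustration.

```python
import re
from collections import Counter

# Toy untagged corpus, invented for illustration (not thesis data).
CORPUS = (
    "Shares rose after chairman John Smith said profits were up. "
    "Analysts expect chairman Mary Jones to resign. "
    "Yesterday chairman Alan Brown met investors."
)

# Candidate name: a run of two or more capitalised words.
NAME = r"((?:[A-Z][a-z]+\s+)+[A-Z][a-z]+)"

# One word of left context followed by a candidate name,
# similar to reading a one-word concordance window.
CONTEXT = re.compile(r"\b(\w+)\s+" + NAME)


def learn_cues(text, min_freq=2):
    """Return left-context words seen before candidate names at least min_freq times."""
    counts = Counter(match.group(1) for match in CONTEXT.finditer(text))
    return {word for word, freq in counts.items() if freq >= min_freq}


def find_names(text, cues):
    """Return capitalised sequences that directly follow one of the learned cue words."""
    if not cues:
        return []
    alternatives = "|".join(re.escape(cue) for cue in sorted(cues))
    pattern = re.compile(r"\b(?:" + alternatives + r")\s+" + NAME)
    return [match.group(1) for match in pattern.finditer(text)]


cues = learn_cues(CORPUS)
names = find_names("The board thanked chairman Peter White for his work.", cues)
```

In this sketch the only cue learned from the toy corpus is `chairman`, so `names` contains `Peter White`. A realistic system would need far larger corpora, longer context windows, and collocation statistics rather than a raw frequency threshold.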
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available