A syntax analyser and case-marker generator for selected speech acts in Arabic.
This thesis describes the design and
implementation of a syntax analyser and case-marker
generator for selected speech acts in Arabic, named
Rameses II. It is intended as a contribution to the
field of natural language processing (NLP).
The original motivation for this research was
the fact that in one form of the Arabic writing
system there are no diacritics. Diacritics are small
marks placed above or below the main line of
characters. It is hypothesised that literate users of
Arabic supply these diacritics when the input text
lacks them. A particularly important subset of
diacritics are those associated with the final
character of a word, which are called case-markers.
It is these, in association with other grammatical
information, that indicate the grammatical category
of case. Thus, these case-markers are used in Arabic
to determine the semantic roles of words in a
sentence. It is the purpose of the project described
in this thesis to model computationally the process
whereby these case-markers are assigned.
The Rameses II system is implemented in Prolog.
It parses a small but substantial portion of Arabic syntax,
namely twelve of the nineteen classes of speech
act. Arabic sentences are traditionally divided
into declaratives (which are sentences that
accept a true or false evaluation) and speech acts
(which do not). Because there are already in
existence substantial morphological analysers for
Arabic, Rameses II assumes an input that has already
been analysed morphologically. Thus its main roles
are (1) to parse this input string and (2) to
generate case-markers. Such a generator will be a
necessary component in future holistic systems for
Speech acts are a significant and well-defined
area of Arabic grammar, and many aspects of the
treatment suggested here could readily be extended to
other parts of the language.
As its underlying linguistic model of how Arabic
grammar works, the system uses systemic grammar. This
is a semantically motivated model of language which,
as far as I can discover, has not so far been used
for the description of Arabic. However, it has been
widely used for English and many other languages, and
has a rapidly growing use in NLP.
This thesis therefore makes a contribution to
the linguistic description of Arabic, as well as to
the field of NLP. The main body of the thesis is concerned with
(1) current work in the field of natural language
processing, (2) the Arabic language, and (3) an in-depth
discussion of the implemented system, including
its architecture, operation, and development. It
concludes with a brief evaluation and suggestions for
Thus, this research has successfully applied a
new linguistic model to Arabic, resulting in the
first automated system for the syntactic analysis of
speech acts and the generation of case-markers for