Syllable-based morphology for natural language processing.
This thesis addresses the problem of accounting for morphological alternation within
Natural Language Processing. It proposes an approach to morphology which is based
on phonological concepts, in particular the syllable, in contrast to the morpheme-based
approaches that have been standard in both NLP and linguistics.
It is argued that morpheme-based approaches, within both linguistics and NLP, grew
out of the apparently purely affixational morphology of European languages, and especially
English, but are less appropriate for non-affixational languages such as Arabic. Indeed,
it is claimed that even accounts of those European languages miss important linguistic
generalizations by ignoring more phonologically based alternations, such as umlaut in
German and ablaut in English. To justify this approach, we present a wide range of data
from languages as diverse as German and Rotuman.
A formal language, MOLUSC, is described, which allows declarative mappings between
syllable sequences to be defined. To demonstrate the capabilities of the language,
accounts of non-trivial fragments of the inflectional morphology of English, Arabic and
Sanskrit are presented.
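
The idea of a mapping between syllable sequences can be illustrated with a minimal Python
sketch. This is not MOLUSC syntax and is not taken from the thesis: the Syllable class,
the ablaut function and the orthographic onset/nucleus/coda split are all hypothetical,
chosen only to show an alternation stated over syllable structure rather than over
concatenated morphemes.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Syllable:
    """Hypothetical syllable record: onset, nucleus and coda as spelling strings."""
    onset: str
    nucleus: str
    coda: str

    def spell(self) -> str:
        return self.onset + self.nucleus + self.coda


Word = Tuple[Syllable, ...]


def ablaut(word: Word, new_nucleus: str, position: int = -1) -> Word:
    """Map a syllable sequence to a copy in which one syllable's nucleus is replaced.

    By default the final syllable is affected (an assumption made for this sketch).
    """
    syllables = list(word)
    syllables[position] = replace(syllables[position], nucleus=new_nucleus)
    return tuple(syllables)


def spell(word: Word) -> str:
    """Concatenate the spellings of all syllables in the sequence."""
    return "".join(s.spell() for s in word)


# English ablaut stated as a change to the nucleus, not as an affix.
sing: Word = (Syllable("s", "i", "ng"),)
begin: Word = (Syllable("b", "e", ""), Syllable("g", "i", "n"))

sang = spell(ablaut(sing, "a"))
began = spell(ablaut(begin, "a"))
```

The input sequence is left unchanged and a new sequence is returned, which loosely mirrors
the declarative character of the mappings described above; a real account would operate on
phonological rather than orthographic representations.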
A semantics for the language is defined, and the implementation of an interpreter is
described. The thesis discusses theoretical (linguistic) issues, as well as implementational
issues involved in the incorporation of MOLUSC into a larger lexicon system.
The approach is contrasted with previous work in computational morphology, in particular
finite-state morphology, and its relation to other work in the fields of morphology
and phonology is also discussed.