Title: Corpus-consulting probabilistic approach to parsing : the CCPX parser and its complementary components
Author: Day, Michael David
ISNI: 0000 0004 2750 6466
Awarding Body: Cardiff University
Current Institution: Cardiff University
Date of Award: 2007
Corpus linguistics is now a major field in the study of language. In recent years syntactically analysed corpora have become available to researchers, and these clearly have great potential for use in parsing natural language. This thesis describes a project that exploits this possibility. It makes four distinct contributions to these two fields. The first is an updated version of a corpus that is (a) analysed in terms of the rich syntax of Systemic Functional Grammar (SFG), and (b) annotated using the Extensible Markup Language (XML). The second contribution is a native XML corpus database, and the third is a sophisticated corpus query tool for accessing it. The fourth contribution is a new type of parser that is both corpus-consulting and probabilistic. It draws its knowledge of syntactic probabilities from the corpus database and stores its working data within the database, so that it is strongly database-oriented.

SFG has been widely used in natural language generation for nearly two decades, but it has been used far less frequently in parsing (the first stage in natural language understanding). Previous SFG corpus-based parsers have utilised traditional parsing algorithms, but they have experienced problems of efficiency and coverage, due to (a) the richness of the syntax and (b) the challenge of parsing unrestricted spoken and written texts. The present research overcomes these problems by introducing a new type of parsing algorithm that is 'semi-deterministic' (as human readers are) and that utilises its knowledge of the rules of English syntax, including their probabilities.

A language, however, is constantly evolving. New words and uses are added, while others become less frequent and drop out altogether. The new parsing system seeks to replicate this. As new sentences are parsed they are added to the corpus, and this slowly changes the frequencies of the words and the syntactic patterns. The corpus is in this sense dynamic, and so simulates a human's changing knowledge of words and syntax.
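
The idea of a dynamic corpus whose probabilities shift as parsed sentences are added can be illustrated with a minimal sketch. This is not the CCPX implementation, which the record does not describe in detail: the class `DynamicCorpus`, its methods, and the labels `clause`, `subject`, `main_verb` and `adjunct` are all hypothetical, and an in-memory counter stands in for the native XML database.

```python
from collections import Counter, defaultdict


class DynamicCorpus:
    """Hypothetical toy stand-in for a corpus database of pattern frequencies."""

    def __init__(self):
        # counts[parent][child] = times `child` was observed directly under `parent`
        self.counts = defaultdict(Counter)

    def add_parse(self, pairs):
        """Record the (parent, child) pairs taken from one parsed sentence."""
        for parent, child in pairs:
            self.counts[parent][child] += 1

    def probability(self, parent, child):
        """Relative frequency of `child` under `parent` (0.0 if `parent` is unseen)."""
        seen = self.counts.get(parent, Counter())
        total = sum(seen.values())
        return seen[child] / total if total else 0.0


# Assumed, illustrative labels; not taken from the thesis.
corpus = DynamicCorpus()
corpus.add_parse([("clause", "subject"), ("clause", "main_verb")])
before = corpus.probability("clause", "subject")

# Adding a newly parsed sentence changes the stored frequencies,
# so the same query now returns a different estimate.
corpus.add_parse([("clause", "adjunct")])
after = corpus.probability("clause", "subject")
```

In this sketch the estimate for `subject` under `clause` falls once a sentence without that pattern is added, which is the sense in which the frequencies of the syntactic patterns drift over time.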
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available