Use this URL to cite or link to this record in EThOS:
Title: Semantic chunking
Author: Muszynska, Ewa
ISNI:       0000 0004 9353 8618
Awarding Body: University of Cambridge
Current Institution: University of Cambridge
Date of Award: 2020
Availability of Full Text:
Access from EThOS:
Full text unavailable from EThOS. Please try the link below.
Access from Institution:
Long sentences pose a challenge for natural language processing (NLP) applications. They are associated with a complex information structure leading to increased requirements for processing resources. Although the issue is present in many areas of research, there is little uniformity in the solutions used by research communities dedicated to individual NLP applications. Different aspects of the problem are addressed by different tasks, such as sentence simplification or shallow chunking. The main contribution of this thesis is the introduction of the task of semantic chunking as a general approach to reducing the cost of processing long sentences. The goal of semantic chunking is to find semantically contained fragments of a sentence representation that can be processed independently and recombined without loss of information. We anchor its principles in established concepts of semantic theory, in particular event and situation semantics. Most of the experiments in this thesis focus on semantic chunking defined on complex semantic representations in Dependency Minimal Recursion Semantics (DMRS), but we also demonstrate that the task can be performed on sentence strings. We present three chunking models: a) rule-based proof-of-concept DMRS chunking system; b) a semi-supervised sequence labelling neural model for surface semantic chunking; c) a system capable of finding semantic chunk boundaries based on the inherent structure of DMRS graphs, generalisable in the form of descriptive templates. We show how semantic chunking can be applied within a divide-and-conquer processing paradigm, using as an example the task of realization from DMRS. The application of semantic chunking yields noticeable efficiency gains without decreasing the quality of results.
Supervisor: Copestake, Ann Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
Keywords: NLP ; semantic chunking