Title: A compositional vector space model of ellipsis and anaphora
Author: Wijnholds, Gijs Jasper
ISNI: 0000 0004 9355 6883
Awarding Body: Queen Mary University of London
Current Institution: Queen Mary, University of London
Date of Award: 2020
This thesis discusses research in compositional distributional semantics: if words are defined by their use in language and represented as high-dimensional vectors reflecting their co-occurrence behaviour in textual corpora, how should words be composed to produce a similar numerical representation for sentences, paragraphs and documents? Neural methods learn a task-dependent composition by generalising over large datasets, whereas type-driven approaches stipulate that composition is given by a functional view on words, leaving open the question of what those functions should concretely do. We adopt the type-driven approach to compositional distributional semantics and focus on two frameworks: the categorical framework of Coecke, Grefenstette, and Sadrzadeh [CGS13], which uses the language of category theory to interpret syntactic structures as linear maps on vector spaces, and the two-step approach of Muskens and Sadrzadeh [MS16], in which syntactic structures map to lambda logical forms that are then instantiated by a concrete composition model. We develop the theory behind these approaches to cover phenomena not dealt with in previous work, evaluate the models on sentence-level tasks, and implement a tensor learning method that generalises to arbitrary sentences. This thesis makes three main contributions. The first, theoretical in nature, discusses the ability of categorical and lambda-based models of compositional distributional semantics to model ellipsis, anaphora, and parasitic gaps: phenomena that challenge the linearity of previous compositional models. Second, we perform an evaluation study on verb phrase ellipsis in which we introduce three novel sentence evaluation datasets and compare algebraic, neural, and tensor-based composition models, showing that models that resolve ellipsis achieve higher correlation with human judgements.
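To make the contrast between linear composition and ellipsis resolution concrete, here is a minimal sketch in Python with NumPy. It assumes the categorical treatment of a transitive verb as an order-3 tensor in N ⊗ S ⊗ N whose subject and object indices are contracted with noun vectors, writes the verb phrase as a lambda term in the spirit of the two-step approach, and resolves the elided verb phrase in a hypothetical sentence "Alice drinks wine and Bob does too" by reusing that term for the second subject. The random vectors, the dimensions, the helper names `transitive`, `vp` and `conj`, and the choice of vector addition for "and" are all illustrative assumptions, not the models evaluated in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
N_DIM, S_DIM = 4, 3  # hypothetical noun-space and sentence-space dimensions

# Toy word representations (random, for illustration only).
nouns = {w: rng.normal(size=N_DIM) for w in ["alice", "bob", "wine"]}
# Transitive verb as an order-3 tensor in N (x) S (x) N: (subject, sentence, object).
verbs = {"drinks": rng.normal(size=(N_DIM, S_DIM, N_DIM))}

def transitive(subj, verb, obj):
    """Contract the verb's subject and object indices, leaving a sentence vector."""
    return np.einsum("i,ijk,k->j", subj, verb, obj)

# Step 1 (logical form): a verb phrase is a function from subjects to sentences.
def vp(verb, obj):
    return lambda subj: transitive(subj, verb, obj)

# Step 2 (instantiation): "and" interpreted as vector addition (an assumption).
def conj(s1, s2):
    return s1 + s2

# Ellipsis resolution: the same VP term is used (copied) for both subjects.
drinks_wine = vp(verbs["drinks"], nouns["wine"])
resolved = conj(drinks_wine(nouns["alice"]), drinks_wine(nouns["bob"]))
print(resolved.shape)  # (3,)
```

Reusing `drinks_wine` for both subjects is the non-linear step: the verb and its object contribute twice, which a strictly linear (resource-sensitive) composition cannot express without an explicit copying operation. Because the contraction is linear in the subject, additive conjunction here happens to equal applying the verb phrase to `alice + bob`; a non-additive conjunction such as pointwise multiplication would not have this property.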
Finally, we generalise the skipgram model [Mik+13] to a tensor-based setting and implement it for transitive verbs, showing that neural methods for learning tensor representations of words can outperform previous tensor-based methods on compositional tasks.
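The abstract states only that skipgram is generalised to tensors for transitive verbs, so the following is one hypothetical way to picture the idea, not the thesis's objective. It assumes the verb is simplified to a matrix `V` (an order-2 tensor rather than the order-3 tensor of the full categorical model), that noun vectors are pretrained and kept fixed, and that an observed (subject, verb, object) triple is scored as `subj · (V @ obj)` against sampled negative subjects with a skipgram-style negative-sampling loss. The names `sgns_loss_and_grad` and `unit`, the dimensions, and the learning rate are all illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_loss_and_grad(V, subj, obj, negatives):
    """Negative-sampling loss and its gradient w.r.t. the verb matrix V
    for one observed (subject, verb, object) triple.

    A candidate subject c is scored as c . (V @ obj); the observed subject
    should score high and each sampled negative subject low.
    """
    v_obj = V @ obj
    pos = sigmoid(subj @ v_obj)        # probability for the observed subject
    neg = sigmoid(negatives @ v_obj)   # one probability per negative subject
    loss = -np.log(pos) - np.sum(np.log(1.0 - neg))
    # dL/dV = (pos - 1) * subj obj^T + sum_n neg_n * n obj^T
    grad = np.outer((pos - 1.0) * subj + neg @ negatives, obj)
    return loss, grad

def unit(x):
    """Normalise vectors (or rows of a matrix) to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
DIM, K = 5, 3                                # hypothetical embedding size, negatives per triple
subj = unit(rng.normal(size=DIM))            # noun vectors assumed pretrained and fixed
obj = unit(rng.normal(size=DIM))
negatives = unit(rng.normal(size=(K, DIM)))  # stand-ins for subjects from a noise distribution
V = np.zeros((DIM, DIM))                     # verb matrix to be learned

initial_loss, _ = sgns_loss_and_grad(V, subj, obj, negatives)
for _ in range(100):
    _, grad = sgns_loss_and_grad(V, subj, obj, negatives)
    V -= 0.5 * grad                          # plain gradient-descent step
final_loss, _ = sgns_loss_and_grad(V, subj, obj, negatives)
print(final_loss < initial_loss)             # True
```

Only `V` is updated here, which mirrors the idea of learning a functional word representation on top of fixed noun vectors. A realistic version would iterate over many corpus triples and resample negatives at every step, which this toy loop does not do.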
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available