Use this URL to cite or link to this record in EThOS:
Title: Compositional distributional semantics with compact closed categories and Frobenius algebras
Author: Kartsaklis, Dimitrios
ISNI:       0000 0004 5353 3222
Awarding Body: University of Oxford
Current Institution: University of Oxford
Date of Award: 2014
Availability of Full Text:
Access from EThOS:
Full text unavailable from EThOS. Restricted access.
Access from Institution:
The provision of compositionality in distributional models of meaning, where a word is represented as a vector of co-occurrence counts with every other word in the vocabulary, offers a solution to the fact that no text corpus, regardless of its size, is capable of providing reliable co-occurrence statistics for anything but very short text constituents. The purpose of a compositional distributional model is to provide a function that composes the vectors for the words within a sentence, in order to create a vectorial representation that re ects its meaning. Using the abstract mathematical framework of category theory, Coecke, Sadrzadeh and Clark showed that this function can directly depend on the grammatical structure of the sentence, providing an elegant mathematical counterpart of the formal semantics view. The framework is general and compositional but stays abstract to a large extent. This thesis contributes to ongoing research related to the above categorical model in three ways: Firstly, I propose a concrete instantiation of the abstract framework based on Frobenius algebras (joint work with Sadrzadeh). The theory improves shortcomings of previous proposals, extends the coverage of the language, and is supported by experimental work that improves existing results. The proposed framework describes a new class of compositional models thatfind intuitive interpretations for a number of linguistic phenomena. Secondly, I propose and evaluate in practice a new compositional methodology which explicitly deals with the different levels of lexical ambiguity (joint work with Pulman). A concrete algorithm is presented, based on the separation of vector disambiguation from composition in an explicit prior step. Extensive experimental work shows that the proposed methodology indeed results in more accurate composite representations for the framework of Coecke et al. in particular and every other class of compositional models in general. As a last contribution, I formalize the explicit treatment of lexical ambiguity in the context of the categorical framework by resorting to categorical quantum mechanics (joint work with Coecke). In the proposed extension, the concept of a distributional vector is replaced with that of a density matrix, which compactly represents a probability distribution over the potential different meanings of the specific word. Composition takes the form of quantum measurements, leading to interesting analogies between quantum physics and linguistics.
Supervisor: Sadrzadeh, Mehrnoosh; Coecke, Bob; Pulman, Stephen Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID:  DOI: Not available
Keywords: Computational Linguistics ; Natural Language Processing ; Artificial Intelligence ; Computer science (mathematics) ; Quantum theory (mathematics) ; Computing ; Quantum information processing ; Physics and CS ; category theory ; categorical quantum mechanics ; compositionality ; distributional models ; semantics ; vector space models ; mathematics of language