Use this URL to cite or link to this record in EThOS:
Title: A study of model parameters for scaling up word to sentence similarity tasks in distributional semantics
Author: Milajevs, Dmitrijs
ISNI:       0000 0004 7653 6223
Awarding Body: Queen Mary University of London
Current Institution: Queen Mary, University of London
Date of Award: 2018
Availability of Full Text:
Access from EThOS:
Access from Institution:
Representation of sentences that captures semantics is an essential part of natural language processing systems, such as information retrieval or machine translation. The representation of a sentence is commonly built by combining the representations of the words that the sentence consists of. Similarity between words is widely used as a proxy to evaluate semantic representations. Word similarity models are well-studied and are shown to positively correlate with human similarity judgements. Current evaluation of models of sentential similarity builds on the results obtained in lexical experiments. The main focus is how the lexical representations are used, rather than what they should be. It is often assumed that the optimal representations for word similarity are also optimal for sentence similarity. This work discards this assumption and systematically looks for lexical representations that are optimal for similarity measurement between sentences. We find that the best representation for word similarity is not always the best for sentence similarity and vice versa. The best models in word similarity tasks perform best with additive composition. However, the best result on compositional tasks is achieved with Kroneckerbased composition. There are representations that are equally good in both tasks when used with multiplicative composition. The systematic study of the parameters of similarity models reveals that the more information lexical representations contain, the more attention should be paid to noise. In particular, the word vectors in models with the feature size at the magnitude of the vocabulary size should be sparse, but if a small number of context features is used then the vectors should be dense. Given the right lexical representations, compositional operators achieve state-of-the-art performance, improving over models that use neural-word embeddings. To avoid overfitting, either several test datasets should be used or parameter selection should be based on parameters' average behaviours.
Supervisor: Not available Sponsor: EPSRC
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID:  DOI: Not available
Keywords: Semantics ; Language and linguistics ; language processing systems ; Word similarity models ; word similarity measurement ; Kronecker based composition ; computational linguistics