Use this URL to cite or link to this record in EThOS: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.782913
Title: Neural word representations for biomedical NLP
Author: Chiu, Hon Wing
ISNI: 0000 0004 7968 5144
Awarding Body: University of Cambridge
Current Institution: University of Cambridge
Date of Award: 2019
Availability of Full Text:
Access from EThOS:
Full text unavailable from EThOS. Please try the link below.
Access from Institution:
Abstract:
Word representations are mathematical objects which capture the semantic and syntactic properties of words in a way that is interpretable by machines. Recently, the encoding of word properties into a low-dimensional vector space using neural networks has become popular. Neural representations are now used as the main input to Natural Language Processing (NLP) applications and in most areas of NLP, achieving cutting-edge results. Our work extends the usefulness of neural representations, with a particular emphasis on the biomedical domain, which is linguistically highly challenging. We focus on three directions. First, we present a comprehensive study on how the quality of the representation model varies according to its training parameters. For this, we implement a set of well-established models with different training settings regarding the size of input corpora, model architectures and hyper-parameters, and evaluate them thoroughly using the standard methods. Our best model significantly outperforms the baseline, demonstrating the high impact of training parameters and the necessity of their optimization. The study provides an important reference for researchers using neural representations for biomedical NLP. Second, we introduce two novel datasets for evaluating noun and verb representations in biomedicine. These datasets are designed to be consistent with those available for mainstream NLP. They enable, for the first time, evaluation of verb representations in the domain. Last, we propose a neural approach to facilitate the development of a VerbNet-style classification in biomedicine: we start from a small manual classification of biomedical verbs and apply a state-of-the-art neural representation model, developed explicitly for verb optimization, to expand that classification with new members. Evaluation of the resulting resource shows promising results when representation learning is performed using verb-related contexts. Additionally, our human- and task-based evaluations reveal that the automatically created resource is highly accurate, suggesting that our method can be used to facilitate cost-effective development of verb resources in biomedicine.
Supervisor: Korhonen, Anna
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID: uk.bl.ethos.782913
DOI:
Keywords: word embedding ; biomedical natural language processing ; biomedical verb intrinsic evaluation dataset