Title: Using lexical chains to characterise scientific text
Author: Hollingsworth, W. A.
Awarding Body: University of Cambridge
Current Institution: University of Cambridge
Date of Award: 2008
Availability of Full Text:
Full text unavailable from EThOS.
Please contact the current institution’s library for further details.
Abstract:
My aim is to develop a computerised system that lets a user, whether sight-impaired or sighted, skim a scientific paper. Specifically, I use lexical chains (Morris and Hirst, 1991) to represent topics in text, and I adapt existing lexical chain algorithms to scientific text rather than news text. My hypothesis is that terms in lexical chains for scientific text require the inclusion of adjectives. For example, the term "rhetorical structure" is more characteristic of a paper about discourse structure than the word "structure" alone. However, not all adjectives make terms more characteristic: the phrase "different experiment" is no more descriptive than the word "experiment". I present an algorithm for automatically distinguishing between characteristic adjectives (e.g., "rhetorical") and non-characteristic adjectives (e.g., "different"). In my target application, text skimming, the user is presented with lexical chains directly. In my evaluation, a good lexical chain is therefore one that is similar to a lexical chain created by a human, so I collect a corpus of human-generated lexical chains. This gold standard contains 230 lexical chains created by 13 annotators. No ready-made metrics for computing lexical chain similarity exist, so I test and choose appropriate similarity measures for this task. To calibrate them, I formalise my intuition of similarity between lexical chains by creating a gold standard of similarity judgements. My data show that humans often include adjectives in lexical chains, and that lexical chains created using characteristic adjectives are more similar to human-generated lexical chains than lexical chains that contain only nouns.
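As a rough illustration of what comparing a system-generated lexical chain with a human-generated one might involve, the sketch below treats each chain as a set of terms and computes two common set-overlap scores, Jaccard and Dice. These are assumptions chosen for illustration: the abstract does not say which similarity measures the thesis tests or selects, and the function names and example chains here are hypothetical, not taken from the thesis corpus.

```python
# Minimal sketch: set-overlap similarity between two lexical chains.
# Assumption: a chain is modelled as a plain set of terms. The thesis's
# actual similarity measures are not given in this record.

def jaccard(chain_a, chain_b):
    """Shared terms divided by all distinct terms in either chain."""
    a, b = set(chain_a), set(chain_b)
    if not a and not b:
        return 1.0  # two empty chains are treated as identical
    return len(a & b) / len(a | b)


def dice(chain_a, chain_b):
    """Twice the shared terms divided by the summed chain sizes."""
    a, b = set(chain_a), set(chain_b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical chains, invented for this example only.
human = ["rhetorical structure", "discourse structure", "rhetorical relation"]
with_adjectives = ["rhetorical structure", "rhetorical relation", "structure"]
nouns_only = ["structure", "relation"]

print(jaccard(human, with_adjectives))  # 2 shared terms out of 4 distinct
print(jaccard(human, nouns_only))       # no shared terms
print(dice(human, with_adjectives))
```

In this toy case the chain that keeps the adjective "rhetorical" overlaps with the human chain, while the nouns-only chain shares no exact terms with it. Exact string matching is the simplest possible choice; a measure that also credits partial matches, such as "structure" against "rhetorical structure", would behave differently.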
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available