Title: Generation and application of semantic networks from plain text and Wikipedia
Author: Wojtinnek, Pia-Ramona
Awarding Body: University of Oxford
Current Institution: University of Oxford
Date of Award: 2012
Availability of Full Text: Full text unavailable from EThOS.
Natural Language Processing systems crucially depend on the availability of lexical and conceptual knowledge representations. They need to be able to disambiguate word senses and detect synonyms. In order to draw inferences, they require access to hierarchical relations between concepts (dog isA animal) as well as non-hierarchical ones (gasoline fuels car). Knowledge resources such as lexical databases, semantic networks and ontologies explicitly encode such conceptual knowledge. However, these have traditionally been created manually, which is expensive and time-consuming for large resources and cannot provide adequate coverage in specialised domains. To alleviate this acquisition bottleneck, statistical methods have been developed to acquire lexical and conceptual knowledge automatically from text. Unsupervised techniques in particular have the advantage that they can easily be adapted to any domain, given a corpus on the topic. However, because of data sparseness, they often require very large corpora to achieve high-quality results. The spectrum of resources and statistical methods therefore has a crucial gap in situations where manually created resources do not provide the necessary coverage and only limited corpora are available. This is the case for real-world domain applications such as an NLP system for processing technical information based on a limited amount of company documentation. We provide a large-scale demonstration that this gap can be filled through the use of automatically generated networks. The corpus is automatically transformed into a network representing the terms or concepts that occur in the text and their relations, built entirely with linguistic tools. Structurally, the networks lie between the unstructured corpus and the highly structured, manually created resources. We show that they can be useful in situations where neither existing approach is applicable.
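To make the idea of turning a corpus into a network of terms and relations concrete, here is a minimal sketch. It assumes, hypothetically, that a parser or relation extractor has already produced (head, relation, dependent) triples; the record does not say which linguistic tools or relation labels the thesis uses, so `build_network`, the `_inv` suffix and the toy triples are illustrative only.

```python
from collections import defaultdict


def build_network(triples):
    """Map each term to its neighbours, grouped by relation label.

    `triples` is a hypothetical list of (head, relation, dependent) tuples,
    standing in for the output of linguistic tools run over a corpus.
    Each edge is stored in both directions; the reverse edge gets an
    '_inv' suffix so the direction of the original relation is preserved.
    """
    graph = defaultdict(lambda: defaultdict(set))
    for head, rel, dep in triples:
        graph[head][rel].add(dep)
        graph[dep][rel + "_inv"].add(head)
    return graph


# Toy triples echoing the examples in the abstract.
triples = [
    ("dog", "isA", "animal"),
    ("gasoline", "fuels", "car"),
    ("diesel", "fuels", "car"),
]
net = build_network(triples)
print(sorted(net["car"]["fuels_inv"]))  # ['diesel', 'gasoline']
```

Keeping relation labels on the edges, rather than collapsing them into plain co-occurrence links, is what lets such a network sit between an unstructured corpus and a hand-built resource: it records typed relations without anyone curating them.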
In contrast to manually created resources, our networks can be generated quickly and on demand. At the same time, they make it possible to achieve higher-quality representations from less text than corpus-based methods, removing the need for very large corpora. We devise scalable frameworks for building networks from plain text and from Wikipedia with varying levels of expressiveness. This work creates concrete networks from the entire British National Corpus, covering 1.2 million terms and 21 million relations, and a Wikipedia network covering 2.7 million concepts. We develop a network-based semantic space model and evaluate it on the task of measuring semantic relatedness. In addition, we tackle noun compound paraphrasing to demonstrate the quality of indirect paths in the network for describing the relations between concepts. On both evaluations we achieve results competitive with the state of the art. In particular, our network-based methods outperform corpus-based methods, demonstrating the gain created by leveraging the network structure.
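The record does not describe how the network-based semantic space model or the path-based paraphrasing works, so the following is only a minimal sketch under our own assumptions, not the thesis's method. Each term is represented by the bag of (relation, neighbour) features on its edges, relatedness is the cosine similarity of those bags, and a noun compound such as "truck fuel" is described by the shortest relation path from modifier to head, found by breadth-first search. The toy `NETWORK` and the functions `context_vector`, `relatedness` and `shortest_path` are hypothetical.

```python
import math
from collections import Counter, deque

# Hypothetical toy network: term -> list of (relation, neighbour) edges.
NETWORK = {
    "car": [("fuels_inv", "gasoline"), ("has", "wheel"), ("isA", "vehicle")],
    "truck": [("fuels_inv", "diesel"), ("has", "wheel"), ("isA", "vehicle")],
    "gasoline": [("fuels", "car"), ("isA", "fuel")],
    "diesel": [("fuels", "truck"), ("isA", "fuel")],
    "banana": [("isA", "fruit")],
}


def context_vector(term):
    """Represent a term by counts of its (relation, neighbour) features."""
    return Counter(NETWORK.get(term, []))


def relatedness(a, b):
    """Cosine similarity between the feature vectors of two terms."""
    va, vb = context_vector(a), context_vector(b)
    dot = sum(count * vb[feat] for feat, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def shortest_path(src, dst, max_len=3):
    """Breadth-first search for the shortest relation path from src to dst.

    Returns a list of (node, relation, neighbour) edges, or None if no
    path with at most max_len edges exists.
    """
    queue = deque([(src, [])])
    seen = {src}
    while queue:
        node, path = queue.popleft()
        if node == dst:
            return path
        if len(path) == max_len:
            continue
        for rel, nb in NETWORK.get(node, []):
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, path + [(node, rel, nb)]))
    return None


print(round(relatedness("car", "truck"), 3))  # 0.667
print(relatedness("car", "banana"))  # 0.0
# "truck fuel": an indirect path via "diesel" suggests the paraphrase
# "fuel such as diesel, which fuels trucks".
print(shortest_path("truck", "fuel"))
# [('truck', 'fuels_inv', 'diesel'), ('diesel', 'isA', 'fuel')]
```

Here "car" and "truck" score as related because they share typed neighbours, even though neither is linked to the other directly, and "truck" reaches "fuel" only through an intermediate node. This illustrates, in a toy setting, why the network structure can carry information that direct co-occurrence alone would miss.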
Supervisor: Pulman, Stephen
Sponsor: Engineering and Physical Sciences Research Council ; Department of Computer Science, University of Oxford
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: Computing ; Applications and algorithms ; natural language processing ; semantic networks