Title: Automated knowledge extraction from text
Author: Bowden, Paul Richard
ISNI: 0000 0004 2751 5944
Awarding Body: Nottingham Trent University
Current Institution: Nottingham Trent University
Date of Award: 1999
Knowledge Extraction (KE) is the automated extraction of facts from machine-readable text, and is a branch of Natural Language Processing (NLP). Within NLP, processing techniques may be deep or shallow. Deep techniques are the traditional methods of NLP and computational linguistics; they aim at language understanding and are mostly domain-independent. Shallow techniques, currently a focus of interest, may be defined as methods which achieve NLP goals without attempting to understand the input text fully; they are mostly domain-specific. Deep processing approaches are considered with respect to the problems they entail, which can be both theoretical and practical. These and other difficulties are used to justify shallow attempts at NLP tasks. After a review of several existing KE and similar systems, this work describes KEP, the knowledge extraction program developed by the author. KEP aims to be shallow and non-domain-specific, and extracts factual knowledge from explanatory texts. It uses a pattern-matching approach which cuts fact-bearing sentences into fragments so that concepts and the facts relating to them can be extracted. Various conceptual relations are searched for, at present including definitions (definitions of concepts), hypernyms (parent classes of concepts), exemplifications (examples of concepts) and partitions (lists of the component parts of a concept). One motivation for this research was the desire to answer the question: how useful can a specific set of shallow techniques be in a non-domain-specific NLP application? This is an important question at a time when shallow techniques are viewed favourably by the NLP community. To this end, the performance of KEP has been evaluated using the recall and precision measures.
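The pattern-matching idea and the recall and precision measures described above can be illustrated with a minimal Python sketch. It is not KEP itself: the cue patterns, the function names `extract` and `precision_recall`, and the sample sentences are all assumptions invented for illustration, and the record does not state which patterns the thesis actually uses.

```python
import re

# Illustrative cue patterns, one per relation type named in the abstract.
# ASSUMPTION: these regular expressions are invented for this sketch and
# are not taken from the thesis.
PATTERNS = {
    "definition": re.compile(
        r"^(?P<concept>.+?) (?:is|are|may be) defined as (?P<fact>.+)$", re.I),
    "hypernym": re.compile(
        r"^(?P<concept>.+?) is a (?:kind|type|branch) of (?P<fact>.+)$", re.I),
    "exemplification": re.compile(
        r"^(?P<fact>.+?) (?:is an example|are examples) of (?P<concept>.+)$", re.I),
    "partition": re.compile(
        r"^(?P<concept>.+?) (?:consists of|comprises) (?P<fact>.+)$", re.I),
}


def extract(sentence):
    """Cut a sentence at a cue phrase; return (relation, concept, fact) or None."""
    text = sentence.strip().rstrip(".")
    for relation, pattern in PATTERNS.items():
        match = pattern.match(text)
        if match:
            return (relation,
                    match.group("concept").strip(),
                    match.group("fact").strip())
    return None


def precision_recall(extracted, gold):
    """Precision = correct / extracted; recall = correct / gold."""
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    sentences = [
        "KE is a branch of Natural Language Processing.",
        "Shallow techniques may be defined as methods which avoid full understanding.",
        "A glossary consists of terms and explanations.",
        "This sentence carries no extractable fact.",
    ]
    for fact in filter(None, map(extract, sentences)):
        print(fact)
```

Scoring such an extractor means comparing its output with a hand-built set of expected facts: precision is the share of extracted facts that are correct, and recall is the share of expected facts that were found.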
As a final demonstration of the program's abilities, KEP has also been run on a large part of the text from this work to produce a first-cut glossary for that text. This glossary successfully captures the main concepts from the text and provides useful explanations of them in many cases. It is concluded that KEP is a working program which demonstrates the usefulness of shallow, non-domain-specific methods, and which has opened up several new research directions, including automatic index creation, student assignment marking, and information retrieval from the Internet for the automatic construction of semantic-net knowledge bases.
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: Information extraction; Summarisation; Indexing