Use this URL to cite or link to this record in EThOS:
Title: On the use of text classification methods for text summarisation
Author: Garcia Constantino, Matias
Awarding Body: University of Liverpool
Current Institution: University of Liverpool
Date of Award: 2013
Availability of Full Text:
Access from EThOS:
Access from Institution:
This thesis describes research work undertaken in the fields of text and questionnaire mining. More specifically, the research work is directed at the use of text classification techniques for the purpose of summarising the free text part of questionnaires. In this thesis text summarisation is conceived of as a form of text classification in that the classes assigned to text documents can be viewed as an indication (summarisation) of the main ideas of the original free text but in a coherent and reduced form. The reason for considering this type of summary is because summarising unstructured free text, such as that found in questionnaires, is not deemed to be effective using conventional text summarisation techniques. Four approaches are described in the context of the classification summarisation of free text from different sources, focused on the free text part of questionnaires. The first approach considers the use of standard classification techniques for text summarisation and was motivated by the desire to establish a benchmark with which the more specialised summarisation classification techniques presented later in this thesis could be compared. The second approach, called Classifier Generation Using Secondary Data (CGUSD), addresses the case when the available data is not considered sufficient for training purposes (or possibly because no data is available at all). The third approach, called Semi-Automated Rule Summarisation Extraction Tool (SARSET), presents a semi-automated classification technique to support document summarisation classification in which there is more involvement by the domain experts in the classifier generation process, the idea was that this might serve to produce more effective summaries. The fourth is a hierarchical summarisation classification approach which assumes that text summarisation can be achieved using a classification approach whereby several class labels can be associated with documents which then constitute the summarisation. For evaluation purposes three types of text were considered: (i) questionnaire free text, (ii) text from medical abstracts and (iii) text from news stories.
Supervisor: Coenen, Frans; Radford, Alan; Noble, P.-J. Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID:  DOI: Not available
Keywords: QA75 Electronic computers. Computer science