Title:
|
Ontology-based automatic text classification
|
This research investigates the extent to which ontologies can be used to achieve accurate
classification in an automatic text classifier called the Automatic Classification
Engine (ACE). The task of the classifier is to classify Web pages with respect to
the Dewey Decimal Classification (DDC) and Library of Congress Classification (LCC)
schemes. In particular, this research focuses on how to:
1. build a set of ontologies which can provide a mechanism to enable machine reasoning;
2. define the mappings between the ontologies and the two classification schemes;
3. implement an ontology-based classifier.
The design and implementation of the classifier concentrate on developing an
ontology-based classification model. Given a Web page, the classifier applies the model
to reason about which terms within the Web page represent significant concepts. The
classifier then uses the mappings to determine the DDC and LCC classes associated with
those significant concepts, and assigns these classes to the Web page.
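The classification flow described above can be sketched roughly as follows. This is a
minimal illustration, not the ACE implementation: the toy ontology, the concept names,
the placeholder class codes (`DDC-A`, `LCC-A`, ...), and the function `classify_page`
are all hypothetical, and a simple term-frequency threshold stands in for the
ontology-based reasoning step, which the abstract does not specify.

```python
import re
from collections import Counter

# Hypothetical toy ontology: surface term -> concept it denotes.
TERM_TO_CONCEPT = {
    "ontology": "knowledge_representation",
    "ontologies": "knowledge_representation",
    "reasoning": "knowledge_representation",
    "classifier": "text_classification",
}

# Hypothetical mappings from concepts to classes in each scheme.
# The codes are placeholders, not real DDC or LCC notations.
CONCEPT_TO_CLASSES = {
    "knowledge_representation": {"ddc": "DDC-A", "lcc": "LCC-A"},
    "text_classification": {"ddc": "DDC-B", "lcc": "LCC-B"},
}


def classify_page(text, min_count=2):
    """Assign DDC and LCC classes to a page via its significant concepts.

    Assumption: a concept is 'significant' when its terms occur at least
    `min_count` times. This threshold is a stand-in for real reasoning.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    concept_counts = Counter(
        TERM_TO_CONCEPT[t] for t in tokens if t in TERM_TO_CONCEPT
    )
    significant = [c for c, n in concept_counts.items() if n >= min_count]
    return {
        "ddc": sorted({CONCEPT_TO_CLASSES[c]["ddc"] for c in significant}),
        "lcc": sorted({CONCEPT_TO_CLASSES[c]["lcc"] for c in significant}),
    }


page = "Ontologies support reasoning. An ontology defines concepts. The classifier runs."
print(classify_page(page))
```

Keeping the term-to-concept ontology separate from the concept-to-class mappings mirrors
the separation in the research objectives: the same ontology can be mapped to either
scheme without changing how concepts are recognised.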
The research also investigates a number of approaches for extending the coverage of
the ontologies semi-automatically, since constructing ontologies manually is
time-consuming. The investigation leads to the design and implementation
of a semi-automatic ontology construction system which can recognise potential new
terms. Using an ontology editor, those new terms can then be integrated into their
associated ontologies.
An experiment was conducted to validate the effectiveness of the classification model,
in which the classifier classified a set of collections of Web pages. The performance of
the classifier was measured in terms of its coverage and accuracy. The experimental
evidence shows that the ontology-based automatic text classification approach achieved
better performance than the existing approaches.
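One plausible way to compute the two measures is sketched below. The abstract does not
define them, so the definitions here are assumptions: coverage is taken as the fraction
of pages that receive any class, and accuracy as the fraction of those classified pages
whose assigned class matches the expected one. The function name and the sample data
are hypothetical.

```python
def coverage_and_accuracy(predicted, expected):
    """Return (coverage, accuracy) under the assumed definitions.

    predicted: dict of page id -> assigned class, or None if unclassified
    expected:  dict of page id -> reference class
    """
    # Pages for which the classifier produced some class.
    classified = [p for p in expected if predicted.get(p) is not None]
    coverage = len(classified) / len(expected) if expected else 0.0
    # Among classified pages, count those matching the reference class.
    correct = sum(1 for p in classified if predicted[p] == expected[p])
    accuracy = correct / len(classified) if classified else 0.0
    return coverage, accuracy


# Hypothetical sample: four pages, two classified, one of them correctly.
expected = {"a": "X", "b": "Y", "c": "Z", "d": "X"}
predicted = {"a": "X", "b": "Q", "c": None}
print(coverage_and_accuracy(predicted, expected))
```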
|