Theories of information and uncertainty for the modelling of information retrieval: an application of situation theory and Dempster-Shafer's theory of evidence.
Current information retrieval models only offer simplistic and specific representations of information.
Therefore, there is a need for the development of a new formalism able to model information
retrieval systems in a more generic manner. In 1986, Van Rijsbergen suggested that such formalisms
can be both appropriately and powerfully defined within a logic. The resulting formalism should
capture information as it appears in an information retrieval system, and also in any of its inherent
forms. The aim of this thesis is to understand the nature of information in information retrieval,
and to propose a logic-based model of an information retrieval system that reflects this nature.
The first objective of this thesis is to identify essential features of information in an information
retrieval system. These are:
• significance, and
It is shown that the first four features are qualitative, whereas the last two are quantitative, and that
their modelling requires different frameworks: a theory of information and a theory of uncertainty,
respectively.
The second objective of this thesis is to determine the appropriate framework for each type of
feature, and to develop a method to combine them in a consistent fashion. The combination is
based on the Transformation Principle.
Many specific attempts have been made to derive an adequate definition of information. The one
adopted in this thesis is based on that of Dretske, Barwise, and Devlin who claimed that there
is a primitive notion of information in terms of which a logic can be defined, and subsequently
developed a theory of information, namely Situation Theory. Their approach was in accordance
with Van Rijsbergen's suggestion of a logic-based formalism for modelling an information retrieval
system. This thesis shows that Situation Theory is best at representing all the qualitative features.
Regarding the modelling of the quantitative features of information, this thesis shows that the
framework that models them best is the Dempster-Shafer Theory of Evidence, together with the
notion of refinement, later introduced by Shafer.
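
As an illustration of how the Dempster-Shafer Theory of Evidence handles quantitative uncertainty, the following minimal sketch implements the standard rule of combination and the belief function. It is not the implementation developed in this thesis: the function names `combine` and `belief`, the document identifiers, and the mass values are all hypothetical, chosen only to show how two bodies of evidence about which documents are relevant might be merged.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function is a dict mapping a frozenset (a subset of the
    frame of discernment) to a mass in [0, 1]; the masses sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            # Agreeing evidence: the mass goes to the intersection.
            combined[common] = combined.get(common, 0.0) + x * y
        else:
            # Contradictory evidence: the mass is counted as conflict.
            conflict += x * y
    if 1.0 - conflict <= 1e-12:
        raise ValueError("total conflict: the evidence cannot be combined")
    # Renormalise so that the remaining masses sum to 1.
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


def belief(m, hypothesis):
    """Total mass committed to subsets of the hypothesis."""
    return sum(v for s, v in m.items() if s <= hypothesis)


# Hypothetical frame: three candidate documents for a query.
frame = frozenset({"d1", "d2", "d3"})
# Hypothetical evidence from two sources, e.g. two query terms.
m_a = {frozenset({"d1"}): 0.6, frame: 0.4}
m_b = {frozenset({"d1", "d2"}): 0.7, frame: 0.3}

m_ab = combine(m_a, m_b)
bel_d1_d2 = belief(m_ab, frozenset({"d1", "d2"}))
```

Mass left on the whole frame represents ignorance rather than disbelief, which is one reason the theory is attractive when evidence about relevance is partial.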
The third objective of this thesis is to develop a model of an information retrieval system based on
Situation Theory and the Dempster-Shafer Theory of Evidence. This is done in two steps. First,
an unstructured model is defined, in which the structure and the significance of information are
not accounted for. Second, the unstructured model is extended into a structured model, which
incorporates the structure and the significance of information. This strategy is adopted because it
allows the flow of information to be represented carefully first.
The final objective of the thesis is to implement the model and to perform empirical evaluation
to assess its validity. The unstructured and the structured models are implemented based on an
existing on-line thesaurus, known as WordNet. The experiments performed to evaluate the two
models use the National Physical Laboratory standard test collection.
The experimental performance obtained was poor, because it was difficult to extract the flow of
information from the document set. This was mainly because the data used in the experiments
were inappropriate for the test collection. However, this thesis shows that if more appropriate
data, for example, indexing tools and thesauri, were available, better performance would be obtained.
The conclusion of this work is that Situation Theory, combined with the Dempster-Shafer Theory
of Evidence, allows the appropriate and powerful representation of several essential features
of information in an information retrieval system. Although its implementation presents some
difficulties, the model is the first of its kind to capture, in a general manner, these features within
a uniform framework. As a result, it can be easily generalized to many types of information
retrieval systems (e.g., interactive, multimedia systems), or many aspects of the retrieval process
(e.g., user modelling).