Title: Lexical measurements for information retrieval : a quantum approach
Author: Huertas-Rosero, Alvaro Francisco
ISNI: 0000 0004 2705 8149
Awarding Body: University of Glasgow
Current Institution: University of Glasgow
Date of Award: 2011
The problem of determining whether a document is about a loosely defined topic is at the core of text Information Retrieval (IR). An automatic IR system should be able to determine if a document is likely to convey information on a topic. In most cases, it has to do it solely based on measurements of the use of terms in the document (lexical measurements). In this work a novel scheme for measuring and representing lexical information from text documents is proposed. This scheme is inspired by the concept of ideal measurement as is described by Quantum Theory (QT). We apply it to Information Retrieval through formal analogies between text processing and physical measurements. The main contribution of this work is the development of a complete mathematical scheme to describe lexical measurements. These measurements encompass current ways of representing text, but also completely new representation schemes for it. For example, this quantum-like representation includes logical features such as non-Boolean behaviour that has been suggested to be a fundamental issue when extracting information from natural language text. This scheme also provides a formal unification of logical, probabilistic and geometric approaches to the IR problem. From the concepts and structures in this scheme of lexical measurement, and using the principle of uncertain conditional, an “Aboutness Witness” is defined as a transformation that can detect documents that are relevant to a query. Mathematical properties of the Aboutness Witness are described in detail and related to other concepts from Information Retrieval. A practical application of this concept is also developed for ad hoc retrieval tasks, and is evaluated with standard collections.
Even though the introduction of the model instantiated here does not lead to substantial performance improvements, it is shown how it can be extended and improved, as well as how it can generate a whole range of radically new models and methodologies. This work opens a number of research possibilities both theoretical and experimental, like new representations for documents in Hilbert spaces or other forms, methodologies for term weighting to be used either within the proposed framework or independently, ways to extend existing methodologies, and a new range of operator-based methods for several tasks in IR.
Supervisor: Not available
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: QA75 Electronic computers. Computer science ; Z665 Library Science. Information Science