Word sense disambiguation and information retrieval
Previous research that attempted to improve the representation of documents in IR systems is reviewed and reassessed in the light of word sense ambiguity. It is shown that the successes or failures of a number of these attempts were due to ambiguity being noticed or ignored. In the review of disambiguation research, many varied techniques for performing automatic disambiguation are introduced. Research on the disambiguating abilities of people is also presented. It has been found that people are inconsistent when asked to disambiguate words, which causes problems when testing the output of an automatic disambiguator. The first of two sets of experiments investigating the relationship between ambiguity, disambiguation, and IR involves a technique by which ambiguity and disambiguation can be simulated in a document collection. The results of these experiments lead to the conclusion that query size plays an important role in the relationship between ambiguity and IR. Retrievals based on very small queries suffer particularly from ambiguity and benefit most from disambiguation. Longer queries, however, contain a sufficient number of words to provide a form of context that implicitly resolves the query words' ambiguities. In general, ambiguity is found to be not as great a problem to IR systems as might have been thought, and the errors made by a disambiguator can be more of a problem than the ambiguity it is trying to resolve. In the complementary second set of experiments, a disambiguator is built, tested, and applied to a document test collection, and an IR system is adjusted to accommodate the resulting sense information in the collection. The conclusions of these experiments broadly confirm those of the previous set.
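
The abstract does not say how ambiguity and disambiguation are simulated in a document collection. As a purely illustrative sketch, one possible approach (an assumption here, not a description of the experiments themselves) is to conflate groups of unrelated words into a single artificial ambiguous token, and then to model a disambiguator of known accuracy by restoring the original word with a controlled error rate. All function names and parameters below are hypothetical.

```python
import random

def make_pseudo_words(vocabulary, group_size, seed=0):
    """Randomly partition the vocabulary into groups and give each group
    one shared artificial token (hypothetical scheme, for illustration)."""
    words = sorted(vocabulary)
    random.Random(seed).shuffle(words)
    mapping = {}  # original word -> shared ambiguous token
    groups = {}   # ambiguous token -> the words it conflates
    for i in range(0, len(words), group_size):
        group = words[i:i + group_size]
        token = "/".join(group)
        groups[token] = group
        for word in group:
            mapping[word] = token
    return mapping, groups

def add_ambiguity(tokens, mapping):
    """Replace each known word with its ambiguous token; leave others as is."""
    return [mapping.get(t, t) for t in tokens]

def simulate_disambiguation(original_tokens, mapping, groups, error_rate, seed=0):
    """Model a disambiguator that is wrong with probability error_rate:
    a wrong decision returns another word from the same group."""
    rng = random.Random(seed)
    resolved = []
    for t in original_tokens:
        others = [w for w in groups.get(mapping.get(t), []) if w != t]
        if others and rng.random() < error_rate:
            resolved.append(rng.choice(others))  # simulated error
        else:
            resolved.append(t)                   # correct sense restored
    return resolved
```

Under these assumptions, indexing a collection after add_ambiguity and again after simulate_disambiguation at several error rates would allow retrieval effectiveness to be compared across query sizes, which is the kind of relationship the experiments are said to examine.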