Title: Approximating true relevance model in relevance feedback
Author: Zhang, Peng
Awarding Body: Robert Gordon University
Current Institution: Robert Gordon University
Date of Award: 2013
Relevance is an essential concept in information retrieval (IR), and relevance estimation is a fundamental IR task. It involves estimating not only document relevance but also the user's information need. The relevance-based language model aims to estimate a relevance model (i.e., a relevant query term distribution) from relevance feedback documents. The true relevance model should be generated from truly relevant documents. An ideal estimate of the true relevance model is expected to be not only effective in terms of mean retrieval performance (e.g., Mean Average Precision) over all queries, but also stable, in the sense that performance varies little across individual queries. In practice, however, when approximating the true relevance model, improving retrieval effectiveness often sacrifices retrieval stability, and vice versa. In this thesis, we propose to explore and analyze this effectiveness-stability trade-off from a new perspective, i.e., the bias-variance trade-off, a fundamental concept in statistical estimation. We first formulate the bias, the variance and the trade-off between them for retrieval performance as well as for query model estimation. We then analytically and empirically study a number of factors (e.g., query model complexity, query model combination, document weight smoothness and the removal of irrelevant documents) that can affect the bias and variance. Our study shows that the proposed bias-variance trade-off analysis can serve as an analytical framework for query model estimation. We then investigate in depth two key factors in query model estimation, document weight smoothness and the removal of irrelevant documents, by proposing novel methods for document weight smoothing and irrelevance distribution separation, respectively.
Systematic experimental evaluation on TREC collections shows that the proposed methods can improve both the retrieval effectiveness and the retrieval stability of query model estimation. In addition to the above main contributions, we also carry out an initial exploration of two further directions: the formulation of bias-variance in personalization, and an examination of query model estimation from a novel theoretical angle (i.e., quantum theory) that has partially inspired our research.
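The effectiveness-stability framing in the abstract can be illustrated with a small sketch. This is not the thesis's own formulation, which this record does not give. It assumes that each query has an achieved score (e.g., Average Precision) and a hypothetical target score, such as the score obtained with a relevance model built from truly relevant documents. Under that assumption, the mean gap to the target plays the role of bias, the spread of the gap across queries plays the role of variance, and the mean squared gap equals bias squared plus variance. The function name and the example numbers are illustrative only.

```python
from statistics import fmean


def bias_variance(achieved, target):
    """Split the mean squared per-query gap into bias and variance.

    achieved: per-query scores of the estimated query model (assumed input).
    target:   per-query scores of a hypothetical ideal model (assumed input).
    Returns (bias, variance, mean_squared_gap).
    """
    if len(achieved) != len(target) or not achieved:
        raise ValueError("need two non-empty lists of equal length")
    # Gap between the ideal and the achieved score for each query.
    gaps = [t - a for a, t in zip(achieved, target)]
    bias = fmean(gaps)  # average shortfall: effectiveness
    variance = fmean([(g - bias) ** 2 for g in gaps])  # spread: stability
    mean_squared_gap = fmean([g * g for g in gaps])
    return bias, variance, mean_squared_gap


# Illustrative numbers only, not results from the thesis.
bias, variance, msg = bias_variance([0.2, 0.6, 0.4], [0.5, 0.7, 0.9])
print(f"bias={bias:.4f} variance={variance:.4f} mean squared gap={msg:.4f}")
```

A method that lowers the bias term while raising the variance term has improved mean effectiveness at the cost of stability, which is the trade-off the abstract describes.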
Supervisor: Song, Dawei; McCall, John
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available
Keywords: Relevance feedback ; True relevance model ; Bias-variance analysis ; Document weight smoothing ; Distribution separation method ; Personalization ; Quantum