ABSTRACT
In Information Retrieval (IR), the use of sampling, randomized algorithms, or training based on unpredictable user input often leads to non-deterministic outputs. Evaluating the effectiveness of systems that incorporate these methods can be challenging, since each run may produce different effectiveness scores. Current IR evaluation techniques do not address this problem. Using distributed information retrieval as a case study, we propose a solution based on multivariate linear modeling. We show that the approach provides a consistent and reliable method for comparing the effectiveness of non-deterministic IR algorithms, and we explain how statistics can safely be used to show that two IR algorithms have equivalent effectiveness.
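To make the idea of comparing non-deterministic systems with a linear model more concrete, the following is a minimal sketch, not the model used in the paper. It assumes a simple additive model (score = intercept + system effect + topic effect + noise) fitted by ordinary least squares over repeated runs. The function name `fit_system_effects`, the simulated scores, and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_system_effects(scores):
    """Estimate system effects relative to system 0.

    scores has shape (n_systems, n_topics, n_runs): one effectiveness
    score per system, topic, and repeated (non-deterministic) run.
    Assumed model: score = intercept + system effect + topic effect + noise.
    """
    n_sys, n_topics, n_runs = scores.shape
    n_cols = 1 + (n_sys - 1) + (n_topics - 1)
    rows, y = [], []
    for s in range(n_sys):
        for t in range(n_topics):
            for r in range(n_runs):
                x = np.zeros(n_cols)
                x[0] = 1.0                   # intercept
                if s > 0:
                    x[s] = 1.0               # dummy for system s
                if t > 0:
                    x[n_sys - 1 + t] = 1.0   # dummy for topic t
                rows.append(x)
                y.append(scores[s, t, r])
    X = np.array(rows)
    beta, *_ = np.linalg.lstsq(X, np.array(y), rcond=None)
    return beta[1:n_sys]


# Simulated data (hypothetical): 2 systems, 50 topics, 10 runs each.
rng = np.random.default_rng(0)
n_topics, n_runs = 50, 10
topic_base = rng.uniform(0.2, 0.6, size=n_topics)  # per-topic difficulty
true_gain = 0.05                                   # system 1 minus system 0
scores = np.empty((2, n_topics, n_runs))
scores[0] = topic_base[:, None] + rng.normal(0.0, 0.02, (n_topics, n_runs))
scores[1] = topic_base[:, None] + true_gain + rng.normal(0.0, 0.02, (n_topics, n_runs))

effects = fit_system_effects(scores)
print("estimated effect of system 1 over system 0:", effects[0])
```

Because the model includes a topic term, per-topic variation is separated from the system effect, and the repeated runs supply the information about run-to-run noise. A fuller analysis would also report a confidence interval for the system effect, which this sketch omits.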
Index Terms
- Evaluating non-deterministic retrieval systems