Article, Open access

Modelling efficient novelty-based search result diversification in metric spaces

2012; Elsevier BV; Volume: 18; Language: English

10.1016/j.jda.2012.07.004

ISSN

1570-8675

Authors

Verónica Gil-Costa, Rodrygo L. T. Santos, Craig Macdonald, Iadh Ounis

Topic(s)

Information Retrieval and Search Behavior

Abstract

Novelty-based diversification tackles ambiguous queries by re-ranking a set of retrieved documents. Current approaches are typically greedy, requiring O(n²) document–document comparisons to diversify a ranking of n documents. In this article, we introduce a new approach to novelty-based search result diversification that reduces the overhead incurred by these document–document comparisons. To this end, we model novelty promotion as a similarity search in a metric space, exploiting the properties of this space to efficiently identify novel documents. We investigate three approaches: pivoting-based, clustering-based, and permutation-based. In the first two, a novel document is one that lies outside the range of a pivot or outside a cluster, respectively. In the third, a novel document is one whose signature (i.e., the document's relative distances to a distinguished set of fixed objects called permutants) differs from those of previously selected documents. Thorough experiments using two TREC test collections for diversity evaluation, as well as a large sample of the query stream of a commercial search engine, show that our approaches perform at least as effectively as well-known novelty-based diversification approaches in the literature, while dramatically improving their efficiency.
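To make the pivoting-based and permutation-based notions of novelty concrete, the following is a minimal sketch under stated assumptions. It is not the authors' implementation. Documents are assumed to be plain vectors compared with Euclidean distance. The signature is assumed to be the order of the permutants by distance, truncated to a hypothetical `prefix` length. In the pivot variant, each selected document is assumed to act as a pivot with a hypothetical `radius`. All function names are illustrative only. The point of the permutation variant is that each document is compared only with the fixed permutants, plus a hash-set lookup on its signature, rather than with every previously selected document.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Metric distance between two document vectors (assumed Euclidean)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def signature(doc: Vector, permutants: List[Vector], prefix: int) -> Tuple[int, ...]:
    """Permutant indices ordered by distance to doc, truncated to `prefix` (assumed form)."""
    order = sorted(range(len(permutants)), key=lambda i: euclidean(doc, permutants[i]))
    return tuple(order[:prefix])


def permutation_diversify(ranking: List[Vector], permutants: List[Vector],
                          k: int, prefix: int = 2) -> List[int]:
    """Pick up to k documents, in ranking order, whose signature is not yet seen."""
    seen = set()
    selected: List[int] = []
    for idx, doc in enumerate(ranking):   # ranking is assumed sorted by relevance
        sig = signature(doc, permutants, prefix)
        if sig not in seen:               # unseen signature => treated as novel
            seen.add(sig)
            selected.append(idx)
            if len(selected) == k:
                break
    return selected


def pivot_diversify(ranking: List[Vector], k: int, radius: float) -> List[int]:
    """Pick documents lying outside `radius` of every pivot chosen so far."""
    pivots: List[int] = []
    for idx, doc in enumerate(ranking):
        if all(euclidean(doc, ranking[p]) > radius for p in pivots):
            pivots.append(idx)            # assumption: a novel document becomes a pivot
            if len(pivots) == k:
                break
    return pivots
```

For example, with `ranking = [[1, 0], [2, 0], [9, 0], [0, 9], [8, 1]]` and `permutants = [[0, 0], [10, 0], [0, 10]]`, both `permutation_diversify(ranking, permutants, k=3)` and `pivot_diversify(ranking, k=3, radius=3.0)` skip the near-duplicate document at index 1 and return `[0, 2, 3]`.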
