Article, Open access

Term weighting based on document revision history

2011; Wiley; Volume: 62; Issue: 12; Language: English

10.1002/asi.21597

ISSN

1532-2890

Authors

Sérgio Nunes, Cristina Ribeiro, Gabriel David

Topic(s)

Information Retrieval and Search Behavior

Abstract

Journal of the American Society for Information Science and Technology, Volume 62, Issue 12 (December 2011), pp. 2471-2478. Research Article. All three authors: INESC Porto, DEI, Faculdade de Engenharia, Universidade do Porto, Rua Dr. Roberto Frias, s/n. 4200-465 Porto, Portugal. First published: 20 September 2011. https://doi.org/10.1002/asi.21597
In real-world information retrieval systems, the underlying document collection is rarely stable or definitive. This work focuses on signals extracted from the content of documents at different points in time, with the purpose of weighting individual terms in a document. The basic idea behind our proposals is that terms that have existed in a document for a longer time should have a greater weight. We propose four term weighting functions that use each document's history to estimate a current term score. To evaluate this thesis, we conduct three independent experiments using a collection of documents sampled from Wikipedia. In the first experiment, we use data from Wikipedia to judge each set of terms. In the second experiment, we use an external collection of tags from a popular social bookmarking service as a gold standard. In the third experiment, we crowdsource user judgments to collect feedback on term preference. Across all experiments, the results consistently support our thesis. We show that temporally aware measures, specifically the proposed revision term frequency and revision term frequency span, outperform a term-weighting measure based on raw term frequency alone.
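The abstract's core idea, that a term which has survived many revisions deserves more weight than one added recently, can be illustrated with a small sketch. This is not the authors' formulation: the abstract does not define revision term frequency or revision term frequency span, so `revision_tf` below is a hypothetical stand-in (mean term frequency across all revisions) and `revision_presence` is a hypothetical persistence measure (share of revisions containing the term), not the paper's span measure. The tokenizer and all function names are illustrative assumptions; the raw term frequency baseline uses only the latest revision.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplistic tokenizer (assumption): lowercase, split on non-word characters.
    return [t for t in re.split(r"\W+", text.lower()) if t]


def raw_tf(revisions: list[str]) -> Counter:
    # Baseline: term frequency in the latest revision only.
    return Counter(tokenize(revisions[-1]))


def revision_tf(revisions: list[str]) -> dict[str, float]:
    # Hypothetical stand-in, not the paper's definition: mean term frequency
    # over every revision, reported only for terms in the current revision.
    # Terms present since early revisions accumulate more weight.
    current = set(tokenize(revisions[-1]))
    total = Counter()
    for rev in revisions:
        total.update(tokenize(rev))
    n = len(revisions)
    return {term: total[term] / n for term in current}


def revision_presence(revisions: list[str]) -> dict[str, float]:
    # Hypothetical persistence measure: fraction of revisions in which the
    # term appears at least once, for terms in the current revision.
    current = set(tokenize(revisions[-1]))
    seen = Counter()
    for rev in revisions:
        seen.update(set(tokenize(rev)))
    n = len(revisions)
    return {term: seen[term] / n for term in current}


# Toy revision history, oldest first (illustrative data, not from the paper).
history = [
    "porto is a city",
    "porto is a city in portugal",
    "porto is a coastal city in portugal with a famous bridge",
]
print(raw_tf(history)["porto"], raw_tf(history)["bridge"])            # 1 1
print(revision_tf(history)["porto"], revision_tf(history)["bridge"])  # 1.0 0.333...
```

In this toy history, raw term frequency cannot distinguish "porto" from the recently added "bridge" (both occur once in the current text), whereas the history-based stand-in ranks the long-standing term higher, which is the behaviour the thesis predicts.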