Computation of term/document discrimination values by use of the cover coefficient concept
1987; Wiley; Volume: 38; Issue: 3; Language: English
10.1002/(sici)1097-4571(198705)38
ISSN: 1097-4571
Topic(s): Data Management and Algorithms
Journal of the American Society for Information Science, Volume 38, Issue 3, pp. 171-183 (Research article)
Fazli Can (Corresponding Author), Department of Electrical and Electronics Engineering, Middle East Technical University, Ankara, Turkey; Miami University, Oxford, OH 45056
Esen A. Ozkarahan, Department of Computer Science, Arizona State University, Tempe, AZ 85287
First published: May 1987
https://doi.org/10.1002/(SICI)1097-4571(198705)38:3 3.0.CO;2-S
Citations: 7
Abstract
Indexing in information retrieval (IR) is used to obtain a suitable vocabulary of index terms and an optimal assignment of these terms to documents, with the aim of increasing the effectiveness and efficiency of an IR system. The concept of term discrimination value (TDV) is one of the criteria used for index-term selection. In this article, a new concept called the cover coefficient (CC) is used to compute TDVs.
After a brief introduction to the theory of indexing and the CC concept, the article discusses an efficient way of computing TDVs with the CC concept, along with index-term selection and weight modification. It also shows that the computational cost of the CC approach to calculating TDVs compares favorably with that of an alternative approach based on similarity coefficients, and that the TDVs obtained by the CC approach are consistent with those of the similarity-coefficient approach.
© 1987 John Wiley & Sons, Inc.
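The two ingredients named in the abstract can be illustrated with a small sketch; it is an assumption-laden illustration, not the paper's algorithm. It assumes a document-by-term matrix D with m documents (rows) and n terms (columns) and no all-zero rows or columns. For the cover coefficient it assumes the definition from the authors' CC-based clustering work, c_ij = α_i Σ_k d_ik β_k d_jk, where α_i and β_k are the reciprocals of the i-th row sum and k-th column sum of D, so each row of C sums to 1 and the diagonal (decoupling) entries sum to the estimated number of clusters n_c. For the similarity-coefficient approach it assumes the common centroid-based space density (mean cosine similarity of each document to the collection centroid), with the TDV of term k taken as the density after removing k minus the density with all terms. All function names are hypothetical, and the paper's own CC-based TDV formula, which the abstract does not state, is not reproduced.

```python
import math
from typing import List

Matrix = List[List[float]]


def cover_coefficients(D: Matrix) -> Matrix:
    """Assumed CC definition: C[i][j] = alpha_i * sum_k D[i][k] * beta_k * D[j][k]."""
    m, n = len(D), len(D[0])
    alpha = [1.0 / sum(row) for row in D]  # reciprocal row sums
    beta = [1.0 / sum(D[i][k] for i in range(m)) for k in range(n)]  # reciprocal column sums
    return [
        [alpha[i] * sum(D[i][k] * beta[k] * D[j][k] for k in range(n)) for j in range(m)]
        for i in range(m)
    ]


def number_of_clusters(D: Matrix) -> float:
    """n_c: sum of the decoupling coefficients (diagonal of C)."""
    C = cover_coefficients(D)
    return sum(C[i][i] for i in range(len(C)))


def cosine(x: List[float], y: List[float]) -> float:
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return num / den if den else 0.0  # a document emptied by term removal scores 0


def space_density(D: Matrix) -> float:
    """Assumption: centroid-based density = mean cosine(document, centroid)."""
    m, n = len(D), len(D[0])
    centroid = [sum(row[k] for row in D) / m for k in range(n)]
    return sum(cosine(row, centroid) for row in D) / m


def similarity_tdv(D: Matrix, k: int) -> float:
    """TDV of term k: density without term k minus density with all terms."""
    without_k = [row[:k] + row[k + 1:] for row in D]
    return space_density(without_k) - space_density(D)


if __name__ == "__main__":
    D = [[1, 1, 0],
         [1, 0, 1]]
    print(number_of_clusters(D))  # prints 1.5
    print([round(similarity_tdv(D, k), 3) for k in range(3)])  # prints [-0.159, 0.056, 0.056]
```

Under this sign convention a positive TDV marks a good discriminator (removing the term makes the documents look more alike), while a negative TDV marks a poor one, such as term 0 above, which occurs in every document. The similarity-based version recomputes the density once per term, which is the kind of cost the abstract compares against the CC approach.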