Oscar Shen, Jayanth S. Pratap, Xiang Li, Neal C. Chen, Abhiram R. Bhashyam,
... network analysis using term frequency–inverse document frequency (TF-IDF) was used to compare text similarity between responses ... Google Search. In the field of information retrieval, TF-IDF is a weighted statistical measure of the importance ... a document in a collection of documents. Higher TF-IDF scores indicate greater similarity between two sources. TF-IDF scores are most often used to compare and ... can be determined by calculating and summing the TF-IDF for all keywords in a ChatGPT response and ... ChatGPT and Google Search responses by finding the TF-IDF scores of the ChatGPT response and each of ...
Topic(s): Meta-analysis and systematic reviews
2024 - Lippincott Williams & Wilkins | Clinical Orthopaedics and Related Research
Anbuselvan Sangodiah, Tham Jee San, Yong Tien Fui, Lim Ean Heng, Ramesh Kumar Ayyasamy, Norazira A. Jalil,
... the past work is unsupervised namely TF and TF-IDF. There are several variants of TF and TFIDF ... Therefore, this paper aims to study the TF, TF-IDF and normalized TF-IDF variants and identify the optimal variant that can ... and Naïve Bayes. The average accuracies achieved by TF-IDF and normalized TF-IDF variants using SVM classifier were 64.3% and ... using Naïve Bayes classifier the average accuracies for TF-IDF and normalized TF-IDF were 61.9% and 63.0% respectively. Generally, the normalized TF-IDF variants outperformed TF and TF-IDF variants in ...
Topic(s): Online Learning and Analytics
2022 - Brno University of Technology | MENDEL
Ana Tsalitsatun Ni’mah, Agus Zainal Arifin,
... term weighting, namely: Term Frequency Inverse Document Frequency (TF-IDF), Term Frequency Inverse Document Frequency Inverse Class Frequency (TF-IDF-ICF), Term Frequency Inverse Document Frequency Inverse Class Space Density Frequency (TF-IDF-ICSδF), and Term Frequency Inverse Document Frequency Inverse ... Space Density Frequency Inverse Hadith Space Density Frequency (TF-IDF-ICSδF-IHSδF). This study compares the results of term ... show that the classification results using the TF-IDF-ICSδF-IHSδF term weighting method outperform the other term weighting methods, namely obtaining ...
Topic(s): Information Retrieval and Data Mining
2020 - University of Trunojoyo Madura | Rekayasa
Safaa Magdy, Yasmine Abouelseoud, Mervat Mikhail,
... are investigated. Weighted term frequency inverse document frequency (tf-idf) together with the Jaccard similarity measure yield the ... that employs the term frequency-inverse document frequency (tf-idf) weighting method. They considered rare areas in images, ... developed by Lou et al. [9] based on tf-idf with the aid of a BOVW model. More ... from the traditional Jaccard similarity measure and the tf-idf weighted one. Resource constrained devices need to securely ... semi-homomorphic encryption is used to encrypt the tf-idf weighted frequency in [10]. Another encryption technique that ...
Topic(s): Data Management and Algorithms
2019 - Institution of Engineering and Technology | IET Image Processing
Ramin Shokripour, John Anvik, Zarinah Mohd Kasirun, Sima Zamani,
... and information-retrieval techniques. These approaches commonly use tf-idf, a statistical computation technique for weighting terms based on term frequency. However, tf-idf does not consider the metadata, such as the ... assignment approach improves from 12 to 49% over tf-idf for three open-source projects. 1 Introduction Increasing ... second piece of metadata is similar to the tf-idf technique, the distinction is that instead of categorising ... In either case, TNBA is also compared to tf-idf, the most common term-weighting technique that weighs ...
Topic(s): Web Application Security Vulnerabilities
2014 - Institution of Engineering and Technology | IET Software
Zhuohao Wang, Dong Wang, Qing Li,
Keyword extraction by Term frequency-Inverse document frequency (TF-IDF) is used for text information retrieval and mining ... particularly in the scientific research field. The traditional TF-IDF algorithm considers only the word frequency in documents, ... characteristics. Therefore, we propose the Scientific research project TF-IDF (SRP-TF-IDF) model, which combines TF-IDF with a weight balance algorithm designed to recalculate candidate keywords. We have implemented the SRP-TF-IDF model and verified that our method has better ...
Topic(s): Web Data Mining and Analysis
2021 - Institution of Engineering and Technology | Chinese Journal of Electronics
Denis Eka Cahyani, Irene Patasik,
... text. This study compares the performance of the TF-IDF and Word2Vec models to represent features in the ... This study used three scenarios, namely SVM with TF-IDF, SVM with Word2Vec, and MNB with TF-IDF. The SVM with TF-IDF method generates the highest accuracy compared to other ... steps classification, then followed by the MNB with TF-IDF, and the last is SVM with Word2Vec. Then, ... and F1-measure results that the SVM with TF-IDF provides the best overall method. This study shows TF-IDF modeling has better performance than Word2Vec modeling and ...
Topic(s): Educational Technology Systems
2021 - Institute of Advanced Engineering and Science (IAES) | Bulletin of Electrical Engineering and Informatics
Ramin Shokripour, John Anvik, Zarinah Mohd Kasirun, Sima Zamani,
... techniques, such as term frequency-inverse document frequency (tf-idf), to determine the value of terms. However, the ... automatic bug assignment by using time-metadata in tf-idf (Time-tf-idf). In the Time-tf-idf technique, the recency of using the term by ... recommended automatic bug assignment approach that uses Time-tf-idf, called ABA-Time-tf-idf, was conducted on three open-source projects. The ... 94%, respectively, in comparison to the use of tf-idf. Moreover, the ABA-Time-tf-idf approach outperforms ...
Topic(s): Web Data Mining and Analysis
2015 - Elsevier BV | Journal of Systems and Software
... bottlenecks. Based on Term Frequency-Inverse Document Frequency (TF-IDF) and deep learning technologies, this paper proposes an advanced TF-IDF mechanism, namely TF-IDF-COR, to extract text feature representations of public ... as follows: (1) based on the advantages of TF-IDF mechanism, we propose a TF-IDF-COR mechanism, which integrates the correlation coefficient of word embeddings to TF-IDF. (2) To make the extracted feature semantic information more comprehensive, CNN and TF-IDF-COR are combined to form an effective COR- ...
Topic(s): Advanced Text Analysis Techniques
2022 - Frontiers Media | Frontiers in Psychology
Neha Agarwal, Geeta Sikka, Lalit Kumar Awasthi,
... and used the Term Frequency – Inverse Document Frequency (TF-IDF) method for representing web services in vector space. In general, there are various limitations of the TF-IDF approach i.e. (1) Not efficient for large ... scenario, services are represented in short text form. TF-IDF does not work well in web service representation ... second having small number of terms respectively then TF-IDF does not demonstrate the importance of terms in ' ... approach are compared with K-Means clustering on TF-IDF representation method i.e. TF-IDF+K. Results ...
Topic(s): Advanced Clustering Algorithms Research
2020 - Elsevier BV | Expert Systems with Applications
... proposed a modified term frequency-inverse document frequency (TF-IDF) method for extracting features from Arabic questions. Unlike the traditional TF-IDF, the proposed method was designed to take the ... for classifying English questions, i.e., the traditional TF-IDF and a modified TF-IDF method called term frequency part-of-speech-inverse ... significantly higher than those scored by the traditional TF-IDF and TFPOS-IDF methods. The evaluation results indicate ...
Topic(s): Natural Language Processing Techniques
2021 - Institute of Electrical and Electronics Engineers | IEEE Access
Derek J. Paulsen, Yash Govind, AnHai Doan,
... we can tell, blocking using the well-known tf/idf measure has received virtually no attention. Yet, when we experimented with tf/idf blocking using Lucene, we found it did quite well. So in this paper we examine tf/idf blocking in depth. We develop Sparkly, which uses Lucene to perform top-k tf/idf blocking in a distributed share-nothing fashion on ... size and runtime. Our findings suggest that (a) tf/idf blocking needs more attention, (b) Sparkly forms a ...
Topic(s): Advanced Database Systems and Queries
2023 - Association for Computing Machinery | Proceedings of the VLDB Endowment
TF-IDF (term frequency-inverse document frequency) is one of the traditional text similarity calculation methods based on statistics. Because TF-IDF does not consider the semantic information of words, ... advances a hybrid with the semantic understanding and TF-IDF to calculate the similarity of texts. Based on ... the hybrid method is better than the pure TF-IDF and the method of semantic understanding at the ...
Topic(s): Text and Document Classification Technologies
2022 - Hindawi Publishing Corporation | Advances in Multimedia

Amir Jalilifard, Vinícius F. Caridá, Alex Fernandes Mansano, Rogers S. Cristo, Felipe Penhorate Carvalho da Fonseca,
... STF-IDF, a novel semantic method based on TF-IDF, for scoring word importance of informal documents in ... semantic space were utilized to rearrange the original TF-IDF scores through an iterative solution so as to ... chosen documents, our method managed to decrease the TF-IDF mean error rate by a factor of 50% ... as opposed to 27.2% of the original TF-IDF.
Topic(s): Topic Modeling
2021 - Springer Science+Business Media | Lecture notes in electrical engineering
Hyunbin Kim, Ji-Won Baek, Kyungyong Chung,
... optimization method of the associative knowledge graph using TF-IDF based ranking scores. The proposed method calculates TF-IDF weights in all documents and generates term ranking. Based on the terms with high scores from TF-IDF based ranking, optimized transactions are generated. News data ... the generated knowledge graph, the proposed method utilizes TF-IDF based ranking scores to remove terms with low ...
Topic(s): Cognitive Computing and Networks
2020 - Multidisciplinary Digital Publishing Institute | Applied Sciences
Zhang Yan, Yue Zhou, JingTao Yao,
TF-IDF is one of the most commonly used weighting metrics for measuring the relationship of words to ... In many research and applications, the thresholds of TF-IDF for selecting relevant words are only based on ... set model to select the word features given TF-IDF information. Game-theoretic shadowed sets determine the thresholds of TF-IDF using game theory and repetition learning mechanism. Experimental ...
Topic(s): Data Mining Algorithms and Applications
2020 - Springer Science+Business Media | Communications in computer and information science
Ankit Thakkar, Kinjal Chaudhari,
... enhance future returns. Term frequency–inverse document frequency (TF–IDF) is a statistical approach with remarkable applications in ... document of a corpus. However, the application of TF–IDF for the numerical data representation is explored to ... article, we propose to extend the applicability of TF–IDF for the numerical time-series stock market data; ... data and prepare them to be suitable for TF–IDF. We utilize this statistical approach to derive feature ... results show that the proposed integrated approach using TF–IDF-based feature weight matrix and neural networks outperforms ...
Topic(s): Time Series Analysis and Forecasting
2020 - Elsevier BV | Applied Soft Computing
Devid Haryalesmana Wahid, Azhari Sn,
... Indonesian celebrity, Agnes Monica, by combining SentiStrength, Hybrid TF-IDF, and Cosine Similarity. SentiStrength is used to obtain ... can be done by ranking tweets using Hybrid TF-IDF summarization and sentiment strength score as additional weight ... results showed that the combination of SentiStrength, Hybrid TF-IDF, and Cosine Similarity performs better than using Hybrid TF-IDF only, given an average 60% accuracy and 62% ...
Topic(s): Data Mining and Machine Learning Applications
2016 - Gadjah Mada University | IJCCS (Indonesian Journal of Computing and Cybernetics Systems)

Rafael T. Anchiêta, Francisco Assis Ricarte Neto, Rogério Figueiredo de Sousa, Raimundo Santos Moura,
... classification systems. We use stylometry to compare with TF-IDF and Delta TF-IDF baseline methods in sentiment classification. Stylometry is a ... whether the stylometry has higher accuracy than the TF-IDF and Delta TF-IDF methods in sentiment classification. We found the better ... accuracy with stylometry and (72.62%) with Delta TF-IDF and (56.25%) with TF-IDF. The results show that stylometry is quite feasible ...
Topic(s): Sentiment Analysis and Opinion Mining
2015 - Springer Science+Business Media | Lecture notes in computer science
Ugo Erra, Sabrina Senatore, Fernando Minnella, Giuseppe Caggianese,
... relevance measure is Term Frequency–Inverse Document Frequency (TF–IDF), which can evaluate how important a word is ... problem, we propose an approximate version of the TF–IDF measure suitable to work on continuous data stream ( ... present also a parallel implementation of the approximate TF–IDF calculation using Graphical Processing Units (GPUs). This implementation ... results demonstrate that the approximate version of the TF–IDF measure performs at a level that is comparable to the solution of the precise TF–IDF measure.
Topic(s): Data Management and Algorithms
2014 - Elsevier BV | Information Sciences
Fuji Ren, Mohammad Golam Sohrab,
... and comparatively low-dimensional vector space over the TF.IDF and five other different term weighting approaches that ... approaches. First, we implement a class-indexing-based TF.IDF.ICF observational term weighting approach in which the ... In the experiment, we investigate the effects of TF.IDF.ICF over the Reuters-21578, 20 Newsgroups, and ... class space density frequency (ICSδF), and generated the TF.IDF.ICSδF method that provides a positive discrimination on ... results show that the proposed class-indexing-based TF.IDF.ICSδF term weighting approach is promising over the ...
Topic(s): Web Data Mining and Analysis
2013 - Elsevier BV | Information Sciences
Ho Chung Wu, Robert W. P. Luk, Kam‐Fai Wong, K. L. Kwok,
... presented. It forms a basis to interpret the TF-IDF term weights as making relevance decisions. It simulates ... decision for the document. The significance of interpreting TF-IDF in this way is the potential to: (1) ... as relevance decision-making; and (2) develop advanced TF-IDF-related term weights for future elaborate retrieval models. ... basic ranking formula that directly corresponds to the TF-IDF term weights. In general, we show that the ...
Topic(s): Topic Modeling
2008 - Association for Computing Machinery | ACM transactions on office information systems
Jing Li Zhou, Zhanliang Ye, Sheng Zhang, Geng Zhao, Ning Han, Tao Yang,
... the efficacy of Term Frequency-Inverse Document Frequency (TF-IDF) and Word to Vector (Word2vec) in extracting response ... 2012 computer-based problem-solving dataset revealed that TF-IDF effectively extracted key response behaviors, whereas Word2vec captured ... both methods, the random forest model based on TF-IDF performed the best, followed by the SVM model based on Word2vec. Word2vec-based models outperformed TF-IDF-based ones in the F1-score, accuracy, and ...
Topic(s): Advanced Text Analysis Techniques
2024 - Elsevier BV | Heliyon
... readers. In the realm of text classification, the TF-IDF algorithm is a widely used classification algorithm. However, ... possible to improve classification accuracy by using the TF-IDF algorithm in this paper’s application situation by ... are separated to increase the trust in the TF-IDF algorithm’s output. Using the improved TF-IDF method suggested in this research with the random ...
Topic(s): Advanced Text Analysis Techniques
2022 - Hindawi Publishing Corporation | Advances in Multimedia
Ganesha Ihya Ulumudin, Adiwijaya Adiwijaya, Mohamad Syahrul Mubarok,
... there is a comparison between feature extraction, Weighted TF-IDF and TF-IDF. The result of that comparison is that Weighted TF-IDF has better performance compared to normal TF-IDF. The highest result by finding the most optimal ...
Topic(s): Advanced Text Analysis Techniques
2019 - IOP Publishing | Journal of Physics Conference Series
Hankiz Yilahun, Askar Hamdulla,
... of keyword extraction based on methods such as TF-IDF. Thus, this study proposes a Chinese entity extraction ... based on the combination of information entropy and TF-IDF. First, the text is preprocessed, which involves operations ... analysed to determine feature word weight, and the TF-IDF algorithm is used to compare the importance of ... Finally, information entropy is used to improve the TF-IDF algorithm to provide entity knowledge for the construction ...
Topic(s): Advanced Text Analysis Techniques
2023 - Inderscience Publishers | International Journal of Reasoning-based Intelligent Systems
... used algorithms, the term frequency–inverse document frequency (TF–IDF) algorithm and the keyphrase extraction algorithm (KEA), were introduced. Then, an improved TF–IDF algorithm was designed, which improved the calculation of ... analysis experiment. The results showed that the improved TF–IDF algorithm had the shortest running time and took ... between the two algorithms demonstrated that the improved TF–IDF algorithm had the best performance, with a precision ... article. The experimental results show that the improved TF–IDF algorithm is effective in extracting English text keywords, ...
Topic(s): Information Retrieval and Search Behavior
2021 - IlmuKomputer.Com | Journal of Intelligent Systems
Quan Do, Muhammad Ali Moriyani, Chau Le, Tuyen Le,
... cost-weighted term frequency-inverse document frequency (CW-TF-IDF) method that incorporates the cost information of pay items into the traditional TF-IDF word embedding method to measure project similarity. The ... an SHA. The findings showed that the CW-TF-IDF method significantly improves project clustering performance compared to ... recent state-of-the-art method. The CW-TF-IDF method also showed its outperformance in project bundling ...
Topic(s): Infrastructure Maintenance and Monitoring
2023 - American Society of Civil Engineers | Journal of Construction Engineering and Management
Junkai Yi, Guang Yang, Jing Wan,
... in the field of Chinese text classification. The tf-idf algorithm is a classic and widely-used feature selection algorithm based on VSM. But the traditional tf-idf algorithm neglects the feature term’s distribution inside ... This paper makes an improvement to the traditional tf-idf algorithm through the introduction of the concept of ... algorithms. The experimental results show that the improved tf-idf algorithm consistently has a higher precision and recall compared with the traditional tf-idf algorithm, and is superior to other algorithms as ...
Topic(s): Advanced Text Analysis Techniques
2016 - Institute of Information Science | Journal of information science and engineering
Qifeng Wan, Xuanhua Xu, Jing Han,
... scale group decision-making scenarios. This method combines TF-IDF feature similarity and information loss entropy to address ... capture the semantics of the text as a TF-IDF feature matrix, which facilitates subsequent calculations. Secondly, a cluster process integrating the TF-IDF feature similarity is operated to divide the large- ...
Topic(s): Rough Sets and Fuzzy Logic
2023 - Elsevier BV | Applied Soft Computing