5,065 results

Article, Peer-reviewed

Oscar Shen, Jayanth S. Pratap, Xiang Li, Neal C. Chen, Abhiram R. Bhashyam,

... network analysis using term frequency–inverse document frequency (TF-IDF) was used to compare text similarity between responses ... Google Search. In the field of information retrieval, TF-IDF is a weighted statistical measure of the importance ... a document in a collection of documents. Higher TF-IDF scores indicate greater similarity between two sources. TF-IDF scores are most often used to compare and ... can be determined by calculating and summing the TF-IDF for all keywords in a ChatGPT response and ... ChatGPT and Google Search responses by finding the TF-IDF scores of the ChatGPT response and each of ...

Topic(s): Meta-analysis and systematic reviews

2024 - Lippincott Williams & Wilkins | Clinical Orthopaedics and Related Research

Article, Open access, Peer-reviewed

Anbuselvan Sangodiah, Tham Jee San, Yong Tien Fui, Lim Ean Heng, Ramesh Kumar Ayyasamy, Norazira A. Jalil,

... the past work is unsupervised namely TF and TF-IDF. There are several variants of TF and TFIDF ... Therefore, this paper aims to study the TF, TF-IDF and normalized TF-IDF variants and identify the optimal variant that can ... and Naïve Bayes. The average accuracies achieved by TF-IDF and normalized TF-IDF variants using SVM classifier were 64.3% and ... using Naïve Bayes classifier the average accuracies for TF-IDF and normalized TF-IDF were 61.9% and 63.0% respectively. Generally, the normalized TF-IDF variants outperformed TF and TF-IDF variants in ...

Topic(s): Online Learning and Analytics

2022 - Brno University of Technology | MENDEL

Article, Open access, Peer-reviewed

Ana Tsalitsatun Ni’mah, Agus Zainal Arifin,

... term weighting, namely: Term Frequency Inverse Document Frequency (TF-IDF), Term Frequency Inverse Document Frequency Inverse Class Frequency (TF-IDF-ICF), Term Frequency Inverse Document Frequency Inverse Class Space Density Frequency (TF-IDF-ICSδF), and Term Frequency Inverse Document Frequency Inverse ... Space Density Frequency Inverse Hadith Space Density Frequency (TF-IDF-ICSδF-IHSδF). This study compares the results of term ... show that the classification results using the TF-IDF-ICSδF-IHSδF term weighting method outperform the other term weighting methods, namely obtaining ...

Topic(s): Information Retrieval and Data Mining

2020 - University of Trunojoyo Madura | Rekayasa

Article, Open access, Peer-reviewed

Safaa Magdy, Yasmine Abouelseoud, Mervat Mikhail,

... are investigated. Weighted term frequency inverse document frequency (tf-idf) together with the Jaccard similarity measure yield the ... that employs the term frequency-inverse document frequency (tf-idf) weighting method. They considered rare areas in images, ... developed by Lou et al. [9] based on tf-idf with the aid of a BOVW model. More ... from the traditional Jaccard similarity measure and the tf-idf weighted one. Resource constrained devices need to securely ... semi-homomorphic encryption is used to encrypt the tf-idf weighted frequency in [10]. Another encryption technique that ...

Topic(s): Data Management and Algorithms

2019 - Institution of Engineering and Technology | IET Image Processing

Article, Open access, Peer-reviewed

Ramin Shokripour, John Anvik, Zarinah Mohd Kasirun, Sima Zamani,

... and information-retrieval techniques. These approaches commonly use tf-idf, a statistical computation technique for weighting terms based on term frequency. However, tf-idf does not consider the metadata, such as the ... assignment approach improves from 12 to 49% over tf-idf for three open-source projects. 1 Introduction Increasing ... second piece of metadata is similar to the tf-idf technique, the distinction is that instead of categorising ... In either case, TNBA is also compared to tf-idf, the most common term-weighting technique that weighs ...

Topic(s): Web Application Security Vulnerabilities

2014 - Institution of Engineering and Technology | IET Software

Article, Open access, Peer-reviewed

Zhuohao Wang, Dong Wang, Qing Li,

Keyword extraction by Term frequency-Inverse document frequency (TF-IDF) is used for text information retrieval and mining ... particularly in the scientific research field. The traditional TF-IDF algorithm considers only the word frequency in documents, ... characteristics. Therefore, we propose the Scientific research project TF-IDF (SRP-TF-IDF) model, which combines TF-IDF with a weight balance algorithm designed to recalculate candidate keywords. We have implemented the SRP-TF-IDF model and verified that our method has better ...

Topic(s): Web Data Mining and Analysis

2021 - Institution of Engineering and Technology | Chinese Journal of Electronics

Article, Open access

Denis Eka Cahyani, Irene Patasik,

... text. This study compares the performance of the TF-IDF and Word2Vec models to represent features in the ... This study used three scenarios, namely SVM with TF-IDF, SVM with Word2Vec, and MNB with TF-IDF. The SVM with TF-IDF method generates the highest accuracy compared to other ... steps classification, then followed by the MNB with TF-IDF, and the last is SVM with Word2Vec. Then, ... and F1-measure results that the SVM with TF-IDF provides the best overall method. This study shows TF-IDF modeling has better performance than Word2Vec modeling and ...

Topic(s): Educational Technology Systems

2021 - Institute of Advanced Engineering and Science (IAES) | Bulletin of Electrical Engineering and Informatics

Article, Peer-reviewed

Ramin Shokripour, John Anvik, Zarinah Mohd Kasirun, Sima Zamani,

... techniques, such as term frequency-inverse document frequency (tf-idf), to determine the value of terms. However, the ... automatic bug assignment by using time-metadata in tf-idf (Time-tf-idf). In the Time-tf-idf technique, the recency of using the term by ... recommended automatic bug assignment approach that uses Time-tf-idf, called ABA-Time-tf-idf, was conducted on three open-source projects. The ... 94%, respectively, in comparison to the use of tf-idf. Moreover, the ABA-Time-tf-idf approach outperforms ...

Topic(s): Web Data Mining and Analysis

2015 - Elsevier BV | Journal of Systems and Software

Article, Open access, Peer-reviewed

Hangfeng Lin, Naiqing Bu,

... bottlenecks. Based on Term Frequency-Inverse Document Frequency (TF-IDF) and deep learning technologies, this paper proposes an advanced TF-IDF mechanism, namely TF-IDF-COR, to extract text feature representations of public ... as follows: (1) based on the advantages of TF-IDF mechanism, we propose a TF-IDF-COR mechanism, which integrates the correlation coefficient of word embeddings to TF-IDF. (2) To make the extracted feature semantic information more comprehensive, CNN and TF-IDF-COR are combined to form an effective COR- ...

Topic(s): Advanced Text Analysis Techniques

2022 - Frontiers Media | Frontiers in Psychology

Article, Peer-reviewed

Neha Agarwal, Geeta Sikka, Lalit Kumar Awasthi,

... and used the Term Frequency – Inverse Document Frequency (TF-IDF) method for representing web services in vector space. In general, there are various limitations of the TF-IDF approach i.e. (1) Not efficient for large ... scenario, services are represented in short text form. TF-IDF does not work well in web service representation ... second having small number of terms respectively then TF-IDF does not demonstrate the importance of terms in ' ... approach are compared with K-Means clustering on TF-IDF representation method i.e. TF-IDF+K. Results ...

Topic(s): Advanced Clustering Algorithms Research

2020 - Elsevier BV | Expert Systems with Applications

Article, Open access, Peer-reviewed

Ali Alammary,

... proposed a modified term frequency-inverse document frequency (TF-IDF) method for extracting features from Arabic questions. Unlike the traditional TF-IDF, the proposed method was designed to take the ... for classifying English questions, i.e., the traditional TF-IDF and a modified TF-IDF method called term frequency part-of-speech-inverse ... significantly higher than those scored by the traditional TF-IDF and TFPOS-IDF methods. The evaluation results indicate ...

Topic(s): Natural Language Processing Techniques

2021 - Institute of Electrical and Electronics Engineers | IEEE Access

Article, Peer-reviewed

Derek J. Paulsen, Yash Govind, AnHai Doan,

... we can tell, blocking using the well-known tf/idf measure has received virtually no attention. Yet, when we experimented with tf/idf blocking using Lucene, we found it did quite well. So in this paper we examine tf/idf blocking in depth. We develop Sparkly, which uses Lucene to perform top-k tf/idf blocking in a distributed share-nothing fashion on ... size and runtime. Our findings suggest that (a) tf/idf blocking needs more attention, (b) Sparkly forms a ...

Topic(s): Advanced Database Systems and Queries

2023 - Association for Computing Machinery | Proceedings of the VLDB Endowment

Article, Open access, Peer-reviewed

Fei Lan,

TF-IDF (term frequency-inverse document frequency) is one of the traditional text similarity calculation methods based on statistics. Because TF-IDF does not consider the semantic information of words, ... advances a hybrid with the semantic understanding and TF-IDF to calculate the similarity of texts. Based on ... the hybrid method is better than the pure TF-IDF and the method of semantic understanding at the ...

Topic(s): Text and Document Classification Technologies

2022 - Hindawi Publishing Corporation | Advances in Multimedia

Book chapter, Open access, Brazil, National production, Peer-reviewed

Amir Jalilifard, Vinícius F. Caridá, Alex Fernandes Mansano, Rogers S. Cristo, Felipe Penhorate Carvalho da Fonseca,

... STF-IDF, a novel semantic method based on TF-IDF, for scoring word importance of informal documents in ... semantic space were utilized to rearrange the original TF-IDF scores through an iterative solution so as to ... chosen documents, our method managed to decrease the TF-IDF mean error rate by a factor of 50% ... as opposed to 27.2% of the original TF-IDF.

Topic(s): Topic Modeling

2021 - Springer Science+Business Media | Lecture notes in electrical engineering

Article, Open access, Peer-reviewed

Hyunbin Kim, Ji-Won Baek, Kyungyong Chung,

... optimization method of the associative knowledge graph using TF-IDF based ranking scores. The proposed method calculates TF-IDF weights in all documents and generates term ranking. Based on the terms with high scores from TF-IDF based ranking, optimized transactions are generated. News data ... the generated knowledge graph, the proposed method utilizes TF-IDF based ranking scores to remove terms with low ...

Topic(s): Cognitive Computing and Networks

2020 - Multidisciplinary Digital Publishing Institute | Applied Sciences

Book chapter, Open access, Peer-reviewed

Zhang Yan, Yue Zhou, JingTao Yao,

TF-IDF is one of the most commonly used weighting metrics for measuring the relationship of words to ... In many research and applications, the thresholds of TF-IDF for selecting relevant words are only based on ... set model to select the word features given TF-IDF information. Game-theoretic shadowed sets determine the thresholds of TF-IDF using game theory and repetition learning mechanism. Experimental ...

Topic(s): Data Mining Algorithms and Applications

2020 - Springer Science+Business Media | Communications in computer and information science

Article, Peer-reviewed

Ankit Thakkar, Kinjal Chaudhari,

... enhance future returns. Term frequency–inverse document frequency (TF–IDF) is a statistical approach with remarkable applications in ... document of a corpus. However, the application of TF–IDF for the numerical data representation is explored to ... article, we propose to extend the applicability of TF–IDF for the numerical time-series stock market data; ... data and prepare them to be suitable for TF–IDF. We utilize this statistical approach to derive feature ... results show that the proposed integrated approach using TF–IDF-based feature weight matrix and neural networks outperforms ...

Topic(s): Time Series Analysis and Forecasting

2020 - Elsevier BV | Applied Soft Computing

Article, Open access, Peer-reviewed

Devid Haryalesmana Wahid, Azhari Sn,

... Indonesian celebrity, Agnes Monica, by combining SentiStrength, Hybrid TF-IDF, and Cosine Similarity. SentiStrength is used to obtain ... can be done by ranking tweets using Hybrid TF-IDF summarization and sentiment strength score as additional weight ... results showed that the combination of SentiStrength, Hybrid TF-IDF, and Cosine Similarity performs better than using Hybrid TF-IDF only, given an average 60% accuracy and 62% ...

Topic(s): Data Mining and Machine Learning Applications

2016 - Gadjah Mada University | IJCCS (Indonesian Journal of Computing and Cybernetics Systems)

Book chapter, Brazil, National production, Peer-reviewed

Rafael T. Anchiêta, Francisco Assis Ricarte Neto, Rogério Figueiredo de Sousa, Raimundo Santos Moura,

... classification systems. We use stylometry to compare with TF-IDF and Delta TF-IDF baseline methods in sentiment classification. Stylometry is a ... whether the stylometry has higher accuracy than the TF-IDF and Delta TF-IDF methods in sentiment classification. We found the better ... accuracy with stylometry and (72.62%) with Delta TF-IDF and (56.25%) with TF-IDF. The results show that stylometry is quite feasible ...

Topic(s): Sentiment Analysis and Opinion Mining

2015 - Springer Science+Business Media | Lecture notes in computer science

Article, Open access, Peer-reviewed

Ugo Erra, Sabrina Senatore, Fernando Minnella, Giuseppe Caggianese,

... relevance measure is Term Frequency–Inverse Document Frequency (TF–IDF), which can evaluate how important a word is ... problem, we propose an approximate version of the TF–IDF measure suitable to work on continuous data stream ( ... present also a parallel implementation of the approximate TF–IDF calculation using Graphical Processing Units (GPUs). This implementation ... results demonstrate that the approximate version of the TF–IDF measure performs at a level that is comparable to the solution of the precise TF–IDF measure.

Topic(s): Data Management and Algorithms

2014 - Elsevier BV | Information Sciences

Article, Peer-reviewed

Fuji Ren, Mohammad Golam Sohrab,

... and comparatively low-dimensional vector space over the TF.IDF and five other different term weighting approaches that ... approaches. First, we implement a class-indexing-based TF.IDF.ICF observational term weighting approach in which the ... In the experiment, we investigate the effects of TF.IDF.ICF over the Reuters-21578, 20 Newsgroups, and ... class space density frequency (ICSδF), and generated the TF.IDF.ICSδF method that provides a positive discrimination on ... results show that the proposed class-indexing-based TF.IDF.ICSδF term weighting approach is promising over the ...

Topic(s): Web Data Mining and Analysis

2013 - Elsevier BV | Information Sciences

Article

Ho Chung Wu, Robert W. P. Luk, Kam‐Fai Wong, K. L. Kwok,

... presented. It forms a basis to interpret the TF-IDF term weights as making relevance decisions. It simulates ... decision for the document. The significance of interpreting TF-IDF in this way is the potential to: (1) ... as relevance decision-making; and (2) develop advanced TF-IDF-related term weights for future elaborate retrieval models. ... basic ranking formula that directly corresponds to the TF-IDF term weights. In general, we show that the ...

Topic(s): Topic Modeling

2008 - Association for Computing Machinery | ACM transactions on office information systems

Article, Open access, Peer-reviewed

Jing Li Zhou, Zhanliang Ye, Sheng Zhang, Geng Zhao, Ning Han, Tao Yang,

... the efficacy of Term Frequency-Inverse Document Frequency (TF-IDF) and Word to Vector (Word2vec) in extracting response ... 2012 computer-based problem-solving dataset revealed that TF-IDF effectively extracted key response behaviors, whereas Word2vec captured ... both methods, the random forest model based on TF-IDF performed the best, followed by the SVM model based on Word2vec. Word2vec-based models outperformed TF-IDF-based ones in the F1-score, accuracy, and ...

Topic(s): Advanced Text Analysis Techniques

2024 - Elsevier BV | Heliyon

Article, Open access, Peer-reviewed

Lin Xiang,

... readers. In the realm of text classification, the TF-IDF algorithm is a widely used classification algorithm. However, ... possible to improve classification accuracy by using the TF-IDF algorithm in this paper’s application situation by ... are separated to increase the trust in the TF-IDF algorithm’s output. Using the improved TF-IDF method suggested in this research with the random ...

Topic(s): Advanced Text Analysis Techniques

2022 - Hindawi Publishing Corporation | Advances in Multimedia

Article, Open access, Peer-reviewed

Ganesha Ihya Ulumudin, Adiwijaya Adiwijaya, Mohamad Syahrul Mubarok,

... there is a comparison between feature extraction, Weighted TF-IDF and TF-IDF. The result of that comparison is that Weighted TF-IDF has better performance compared to normal TF-IDF. The highest result by finding the most optimal ...

Topic(s): Advanced Text Analysis Techniques

2019 - IOP Publishing | Journal of Physics Conference Series

Article, Peer-reviewed

Hankiz Yilahun, Askar Hamdulla,

... of keyword extraction based on methods such as TF-IDF. Thus, this study proposes a Chinese entity extraction ... based on the combination of information entropy and TF-IDF. First, the text is preprocessed, which involves operations ... analysed to determine feature word weight, and the TF-IDF algorithm is used to compare the importance of ... Finally, information entropy is used to improve the TF-IDF algorithm to provide entity knowledge for the construction ...

Topic(s): Advanced Text Analysis Techniques

2023 - Inderscience Publishers | International Journal of Reasoning-based Intelligent Systems

Article, Open access, Peer-reviewed

Jinye Li,

... used algorithms, the term frequency–inverse document frequency (TF–IDF) algorithm and the keyphrase extraction algorithm (KEA), were introduced. Then, an improved TF–IDF algorithm was designed, which improved the calculation of ... analysis experiment. The results showed that the improved TF–IDF algorithm had the shortest running time and took ... between the two algorithms demonstrated that the improved TF–IDF algorithm had the best performance, with a precision ... article. The experimental results show that the improved TF–IDF algorithm is effective in extracting English text keywords, ...

Topic(s): Information Retrieval and Search Behavior

2021 - IlmuKomputer.Com | Journal of Intelligent Systems

Article, Peer-reviewed

Quan Do, Muhammad Ali Moriyani, Chau Le, Tuyen Le,

... cost-weighted term frequency-inverse document frequency (CW-TF-IDF) method that incorporates the cost information of pay items into the traditional TF-IDF word embedding method to measure project similarity. The ... an SHA. The findings showed that the CW-TF-IDF method significantly improves project clustering performance compared to ... recent state-of-the-art method. The CW-TF-IDF method also showed its outperformance in project bundling ...

Topic(s): Infrastructure Maintenance and Monitoring

2023 - American Society of Civil Engineers | Journal of Construction Engineering and Management

Article

Junkai Yi, Guang Yang, Jing Wan,

... in the field of Chinese text classification. The tf-idf algorithm is a classic and widely used feature selection algorithm based on VSM. But the traditional tf-idf algorithm neglects the feature term’s distribution inside ... This paper makes an improvement to the traditional tf-idf algorithm through the introduction of the concept of ... algorithms. The experimental results show that the improved tf-idf algorithm consistently has a higher precision and recall compared with the traditional tf-idf algorithm, and is superior to other algorithms as ...

Topic(s): Advanced Text Analysis Techniques

2016 - Institute of Information Science | Journal of information science and engineering

Article, Peer-reviewed

Qifeng Wan, Xuanhua Xu, Jing Han,

... scale group decision-making scenarios. This method combines TF-IDF feature similarity and information loss entropy to address ... capture the semantics of the text as a TF-IDF feature matrix, which facilitates subsequent calculations. Secondly, a cluster process integrating the TF-IDF feature similarity is operated to divide the large- ...

Topic(s): Rough Sets and Fuzzy Logic

2023 - Elsevier BV | Applied Soft Computing