Article, open access, peer reviewed

Keyword Extraction from Scientific Research Projects Based on SRP‐TF‐IDF

2021; Institution of Engineering and Technology; Volume: 30; Issue: 4; Language: English

10.1049/cje.2021.05.007

ISSN

2075-5597

Authors

Zhuohao Wang, Dong Wang, Qing Li,

Topic(s)

Web Data Mining and Analysis

Abstract

Keyword extraction by term frequency-inverse document frequency (TF-IDF) is used for text information retrieval and mining in many domains, such as news text, social contact text, and medical text. However, keyword extraction in specialized domains still needs to be improved and optimized, particularly in the scientific research field. The traditional TF-IDF algorithm considers only word frequency in documents and ignores domain characteristics. Therefore, we propose the Scientific Research Project TF-IDF (SRP-TF-IDF) model, which combines TF-IDF with a weight balance algorithm designed to recalculate the weights of candidate keywords. We implemented the SRP-TF-IDF model and verified that it achieves better precision, recall, and F1 score than the traditional TF-IDF and TextRank methods. In addition, we investigated the parameter of our weight balance algorithm to find an optimal value for keyword extraction from scientific research projects.
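
To make the baseline and the evaluation metrics named in the abstract concrete, here is a minimal Python sketch of plain TF-IDF keyword ranking together with precision, recall, and F1 against a gold keyword set. It is not the SRP-TF-IDF model: the abstract does not specify the weight balance algorithm, so it is not reproduced here. The function names, the pre-tokenized input format, the natural-log IDF variant, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_keywords(docs, doc_index, top_k=3):
    """Rank the terms of docs[doc_index] by plain TF-IDF (hypothetical helper).

    docs is a list of token lists; tokenization is assumed to be done already.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    tokens = docs[doc_index]
    tf = Counter(tokens)
    # TF is the relative frequency in the document; IDF is ln(N / df).
    scores = {
        term: (count / len(tokens)) * math.log(n_docs / df[term])
        for term, count in tf.items()
    }
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_k]


def precision_recall_f1(predicted, gold):
    """Compare extracted keywords with a gold keyword set (hypothetical helper)."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy corpus, invented for illustration only.
docs = [
    ["keyword", "extraction", "project", "keyword", "model"],
    ["project", "funding", "report"],
    ["model", "training", "report", "project"],
]
top = tfidf_keywords(docs, 0, top_k=2)
print(top)
print(precision_recall_f1(top, ["keyword", "model"]))
```

In this toy corpus "project" occurs in every document, so its IDF is zero and it is never ranked as a keyword. That illustrates the limitation the abstract points to: the score depends only on term counts, with no notion of which terms matter in a given domain.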
