Keyword Extraction from Scientific Research Projects Based on SRP‐TF‐IDF
2021; Institution of Engineering and Technology; Volume: 30; Issue: 4; Language: English
DOI: 10.1049/cje.2021.05.007
ISSN: 2075-5597
Authors: Zhuohao Wang, Dong Wang, Qing Li
Topic(s): Web Data Mining and Analysis
Abstract: Keyword extraction by term frequency-inverse document frequency (TF-IDF) is used for text information retrieval and mining in many domains, such as news text, social contact text, and medical text. However, keyword extraction in special domains still needs to be improved and optimized, particularly in the scientific research field. The traditional TF-IDF algorithm considers only word frequency in documents and ignores domain characteristics. Therefore, we propose the Scientific Research Project TF-IDF (SRP-TF-IDF) model, which combines TF-IDF with a weight balance algorithm designed to recalculate the weights of candidate keywords. We have implemented the SRP-TF-IDF model and verified that our method achieves better precision, recall, and F1 score than the traditional TF-IDF and TextRank methods. In addition, we investigated the parameter of our weight balance algorithm to find an optimal value for keyword extraction from scientific research projects.
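For orientation, the traditional TF-IDF baseline and the three evaluation metrics named in the abstract can be sketched as follows. This is a minimal illustration only: the abstract does not describe the SRP-TF-IDF weight balance algorithm, so it is not reproduced here. The function names, the toy corpus, and the reference keywords are hypothetical, and the unsmoothed logarithmic IDF is one common variant, assumed for simplicity.

```python
import math
from collections import Counter


def tfidf_keywords(docs, doc_index, top_k=3):
    """Rank the terms of docs[doc_index] by plain TF-IDF; return the top_k."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    tf = Counter(docs[doc_index])
    total = len(docs[doc_index])
    # Relative term frequency times unsmoothed log IDF (assumed variant).
    scores = {
        term: (count / total) * math.log(n_docs / df[term])
        for term, count in tf.items()
    }
    # Descending score, ties broken alphabetically for a stable order.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_k]


def precision_recall_f1(predicted, reference):
    """Compare extracted keywords against a reference keyword set."""
    predicted, reference = set(predicted), set(reference)
    hits = len(predicted & reference)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical, pre-tokenized toy corpus (not data from the paper).
docs = [
    ["keyword", "extraction", "project", "keyword", "model"],
    ["project", "funding", "report", "model"],
    ["project", "medical", "text", "report"],
]
keywords = tfidf_keywords(docs, 0, top_k=2)
metrics = precision_recall_f1(keywords, ["keyword", "model"])
```

In this toy corpus the term "project" appears in every document, so its IDF is zero and it is never selected, which illustrates why frequency alone can discard terms that matter within a domain.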