Peer-reviewed article

A component recommender for bug reports using Discriminative Probability Latent Semantic Analysis

2016; Elsevier BV; Volume: 73; Language: English

10.1016/j.infsof.2016.01.005

ISSN

1873-6025

Authors

Meng Yan, Xiaohong Zhang, Dan Yang, Ling Xu, Jeffrey D. Kymer

Topic(s)

Web Application Security Vulnerabilities

Abstract

The component field in a bug report provides location information that developers need when fixing bugs. Research has shown that incorrect component assignment for a bug report often causes problems and delays in bug fixes. A topic modeling technique, Latent Dirichlet Allocation (LDA), has previously been used to create a component recommender for bug reports. We investigate a better way to use topic modeling for this purpose. This paper presents a component recommender, DPLSA-JS, that combines the proposed Discriminative Probability Latent Semantic Analysis (DPLSA) model with Jensen–Shannon divergence. The DPLSA model provides a novel method to initialize the word distributions for different topics: it uses previously assigned bug reports from the same component in the model training step, which results in a correlation between the learned topics and the components. We evaluate the proposed approach on five open source projects: Mylyn, Gcc, Platform, Bugzilla and Firefox. On average, the proposed approach outperforms the LDA-KL method by 30.08%, 19.60% and 14.13% for recall@1, recall@3 and recall@5, respectively, and outperforms the LDA-SVM method by 31.56%, 17.80% and 8.78% for the same measures. We find that using comments in the DPLSA-JS recommender does not always improve performance. Vocabulary size does matter in DPLSA-JS, and each project needs to set it adaptively through experimentation. In addition, the correspondence between the learned topics and components in DPLSA increases the discriminative power of the topics, which is useful for the recommendation task.
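
The abstract names two building blocks, Jensen–Shannon divergence and recall@k, without spelling them out. The following is a minimal sketch of both, not the authors' implementation. It assumes that each component and each new bug report is already represented as a topic distribution of the same length, that components are ranked by increasing divergence from the report, and that recall@k is the fraction of reports whose true component appears among the top k recommendations. The function names, component names and numbers are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def rank_components(report_topics, component_topics):
    """Return component names ordered from most to least similar.

    Assumption: component_topics maps a component name to its topic
    distribution; a smaller divergence means a better match.
    """
    return sorted(
        component_topics,
        key=lambda name: js_divergence(report_topics, component_topics[name]),
    )


def recall_at_k(ranked_lists, true_components, k):
    """Fraction of reports whose true component is in the top-k list."""
    hits = sum(
        1
        for ranked, truth in zip(ranked_lists, true_components)
        if truth in ranked[:k]
    )
    return hits / len(true_components)


# Hypothetical topic distributions for three components.
components = {
    "UI": [0.7, 0.2, 0.1],
    "Core": [0.1, 0.8, 0.1],
    "Tasks": [0.2, 0.2, 0.6],
}
ranking = rank_components([0.6, 0.3, 0.1], components)
```

Unlike the Kullback-Leibler divergence, the Jensen–Shannon divergence is symmetric and stays finite when one distribution assigns zero probability to a topic, because each distribution is compared against the midpoint rather than against the other directly.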

Reference(s)