Open access article, peer reviewed

Process metrics for software defect prediction in object‐oriented programs

2020; Institution of Engineering and Technology; Volume: 14; Issue: 3; Language: English

10.1049/iet-sen.2018.5439

ISSN

1751-8814

Authors

Qiao Yu, Shujuan Jiang, Junyan Qian, Lili Bo, Li Jiang, Gongjie Zhang

Topic(s)

Software System Performance and Reliability


IET Software, Volume 14, Issue 3, pp. 283-292. Research Article, Open Access. First published: 01 June 2020.

Qiao Yu (corresponding author, yuqiao@jsnu.edu.cn), School of Computer Science and Technology, Jiangsu Normal University, Xuzhou, People's Republic of China
Shujuan Jiang, School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, People's Republic of China; Engineering Research Center of Mine Digitalization of Ministry of Education, Xuzhou, People's Republic of China
Junyan Qian, Guangxi Key Laboratory of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin, People's Republic of China; Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, People's Republic of China
Lili Bo, School of Information Engineering, Yangzhou University, Yangzhou, People's Republic of China
Li Jiang, School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, People's Republic of China; Engineering Research Center of Mine Digitalization of Ministry of Education, Xuzhou, People's Republic of China
Gongjie Zhang, School of Computer Science and Technology, Jiangsu Normal University, Xuzhou, People's Republic of China
Abstract

Software evolution is an important activity in the life cycle of a modern software system. During evolution, the repair of historical defects and increasing demands may introduce new defects, so evolution-oriented defect prediction has attracted considerable attention in recent years. Some researchers have proposed process metrics to describe the characteristics of software evolution, but compared with traditional defect prediction methods, research on evolution-oriented defect prediction is still inadequate. Based on the evolution data of object-oriented programs, this study presents two new process metrics derived from the defect rates of historical packages and the change degree of classes. To show the effectiveness of the proposed process metrics, the authors compare them with code metrics and other process metrics. An empirical study conducted on 33 versions of nine open-source projects shows that adding the proposed process metrics effectively improves the performance of evolution-oriented defect prediction.

1 Introduction

Software evolution is an important activity in the life cycle of a modern software system, and it is a dynamic, continuous process [1]. Over the long lifetime of a system, maintainers must make changes to repair defects, satisfy increased demand, and adapt to a changing environment; all of these activities are known as software evolution. While repairing historical defects, maintainers may introduce new ones, and growing demands may introduce further defects. Software evolution is therefore a major source of new defects, which makes evolution-oriented defect prediction important.

Traditional software defect prediction methods are mainly based on code metrics (CM) such as McCabe [2], Halstead [3], and the object-oriented CK suite [4], which are also known as product metrics. These metrics describe the static characteristics of software. In practice, evolution data may also be essential for defect prediction, so researchers have proposed process metrics (PM), such as code changes [5, 6], to improve prediction performance.

The distribution of defects tends to follow the 80:20 rule: the majority of software defects (about 80%) are concentrated in a small proportion of software modules (about 20%) [7]. For programs written in an object-oriented language such as Java, this suggests that 80% of defects may be concentrated in 20% of packages or classes. During the evolution of a package, historical defects may be fixed in the previous version while new defects are introduced in the current one. For a package with many classes in particular, not all defects can be repaired in one update, and the repairs themselves may introduce new defects. The defect rate of a package in the previous version may therefore be related to its defect rate in the current version. In addition, many researchers have used code changes [5, 6] for defect prediction; similarly, class changes may indicate defect changes during the evolution process.
Based on the above analysis, this paper presents two new PM derived from the defect rates of historical packages and the change degree of classes in object-oriented programs. To show the validity of the proposed PM (PPM), we conducted experiments on 33 versions of nine open-source projects. The results show that adding the PPM effectively improves defect prediction performance, and that the PPM perform better than the CM described in [8] and the code churn metrics (CCM) defined in [9]. The main contributions of this paper are as follows:

(i) Two new process metrics are proposed for object-oriented programs, extracted from the defect rates of historical packages and the change degree of classes.
(ii) An empirical study is conducted on 33 versions of nine open-source projects. The results show that the PPM perform better than the CM and the CCM.

The remainder of this paper is organised as follows. Section 2 summarises related work on evolution-oriented defect prediction. Section 3 describes the details of our approach, including the PPM and the process of evolution-oriented defect prediction. Section 4 presents an empirical study of the validity of the PPM. Section 5 summarises the threats to validity. Section 6 draws conclusions and discusses future work.

2 Related work

In recent years, software defect prediction has attracted much attention in software engineering [10-13]. Traditional methods are mainly based on the CM, which describe only the static characteristics of software. However, defects often change as software evolves, so a prediction model built on the CM alone may perform poorly on evolved versions [14]. Extracting PM is therefore important for improving evolution-oriented defect prediction.

Many researchers have proposed evolution metrics. From the perspective of code changes, Nagappan and Ball [5] applied code churn to predict defect density and found that relative code churn outperformed absolute code churn. Moser et al. [15] compared change metrics with CM on the Eclipse project and found that the change metrics performed better. Hassan [6] proposed complexity metrics based on code changes and demonstrated their validity on six large open-source projects. Kpodjedo et al. [16] proposed design evolution metrics (DEM), including the numbers of added, deleted, and modified attributes, methods, and relations; combining DEM with traditional metrics improved the identification of defective classes, and DEM predicted more defects within a given amount of code. Bhattacharya et al. [17] proposed a graph-based approach to capture product metrics and PM, showing that graph metrics can be used to predict bug severity, maintenance effort, and defect-prone releases. Stanić and Afzal [18] investigated different combinations of CM and PM and found that combining PM with static CM tended to improve prediction performance. Graves et al. [19] extracted PM from software change history that outperformed product metrics.
Rahman and Devanbu [20] conducted extensive experiments comparing PM (such as code changes and developer/committer information) with CM and found that PM performed better. Madeyski and Jureczko [7] investigated PM such as the number of revisions, distinct committers, modified lines, and defects in the previous version. Wang and Wang [21] presented two types of evolution metrics, at the version level and the code level, and showed their validity experimentally; they also found that evolution data from the most recent version was more useful than data from all historical versions. Liu et al. [22] used historical version sequences of metrics to predict defects in continuous software versions.

Focusing on individual software changes, researchers have proposed change-level defect prediction, also known as just-in-time defect prediction [23, 24]. Unlike traditional defect prediction methods, change-level methods treat each change as a training instance and predict whether a new change introduces defects. For example, Yuan et al. [25] presented a prediction approach based on statement-level source code changes; experiments on six open-source projects indicated that it outperformed file-level and transaction-level change approaches. Yang et al. [26] applied deep learning to predict defect-prone changes, using a deep belief network to build a set of expressive features from an initial set of change features; their approach outperformed that of Kamei et al. [27].

For cross-version defect prediction, Kastro and Bener [28] proposed a model that predicts the number of defects in a new version with respect to the previous stable version. Xu et al. [29] proposed a two-phase framework combining hybrid active learning and kernel PCA to address data differences between two versions, which outperformed the baseline methods. Yang and Wen [30] investigated ridge regression and lasso regression for cross-version defect prediction and showed that both performed better than linear regression [31] and negative binomial regression [32] models.

In this paper, we aim to extract new PM from the evolution data of the most recent version rather than from all historical versions. For object-oriented programs, the classes of two neighbouring versions may differ because of changes made during evolution. According to these changes, we divide classes into two types: common classes, which have evolution data, and new classes, which do not. We therefore extract the PM from the common classes of two neighbouring versions.

3 Our approach

The framework of our approach is shown in Fig. 1. Given two neighbouring versions of a package, we first use a text matching approach to identify common classes, i.e. classes whose package name and class name are exactly the same in both versions (a minimal sketch is given below). Second, we extract PM from the changes of these common classes. Finally, we add the extracted PM to the data of each version to build new datasets for training and testing.
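The text matching step can be sketched as follows. This is a minimal illustration, assuming each version is represented as a mapping from (package name, class name) to that class's metric record; the field names are hypothetical and not taken from the paper.

```python
def common_classes(prev_version, curr_version):
    """Identify the common classes of two neighbouring versions.

    Each version is assumed to be a dict mapping the fully qualified
    name (package name, class name) to that class's metric record.
    A class counts as 'common' only when both names match exactly;
    renamed or moved classes are treated as new classes.
    """
    shared = prev_version.keys() & curr_version.keys()
    return {name: (prev_version[name], curr_version[name]) for name in shared}

# Usage: classes present in both versions under the same (package, class) name.
v13 = {("org.apache.tools.ant", "Main"): {"wmc": 12, "loc": 340, "bug": 1}}
v14 = {("org.apache.tools.ant", "Main"): {"wmc": 14, "loc": 362, "bug": 0}}
print(common_classes(v13, v14))
```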
Fig. 1: Framework of our approach.

3.1 Proposed process metrics

Based on the evolution data of object-oriented programs, we present two PM derived from the defect rates of historical packages and the change degree of classes. We first give the basic definitions.

Definition 1 (Evolution period): For two neighbouring versions Vt−1 and Vt of a project, the change process from the previous version Vt−1 to the current version Vt is called an evolution period. The evolution period refers to the change process from Vt−1 to Vt, not the time taken from Vt−1 to Vt. For example, for a project with three versions V1, V2, and V3, the change process from V1 to V2 is an evolution period, and so is the change process from V2 to V3.

Definition 2 (Defect rate of a package): The percentage ratio of the number of defective classes to the number of all classes in a package, calculated as

  defect rate of a package = (No. of defective classes in the package / No. of all classes in the package) × 100%   (1)

During an evolution period, the defects in Vt−1 may be fixed or may persist in Vt, so the number of defective classes in a given package may decrease or increase. For a package with many classes and a very high defect rate, the maintainers may not be able to repair all defects within one evolution period, and they may even introduce new defects. In this case, the defect rate of a package in the previous version may reveal the defect rate of the same package in the current version, and hence the probability that a class in that package is defective in the current version. Based on this, we present the first process metric.

3.1.1 pcdc – the probability that one class may be a defective class

The defect rate of a package in the previous version is used to estimate the defect rate of the same package in the current version, and thereby the probability that a class in that package is defective in the current version. Note that pcdc is defined at class granularity, not package granularity. Fig. 2 illustrates the calculation of pcdc.

Fig. 2: Example to calculate pcdc.

As displayed in Fig. 2, there are three versions (V1, V2, and V3) and two packages (P1 and P2). V1 and V2 are historical versions, and V3 is the current version. Package P1 contains five classes (C11-C15) and package P2 contains three classes (C21-C23). A label of '0' denotes a non-defective class, '1' a defective class, and 'u' unknown, since the class labels of the current version V3 are not known. Package P1 of V1 contains four defective classes and one non-defective class, so its defect rate is 4/5 = 0.8, and we set pcdc = 0.8 for the classes of P1 in V2. Package P2 contains no defective class in V1, V2, or V3, so we set pcdc = 0.0 for its classes in V2 and V3.
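A minimal sketch of the pcdc computation, assuming each class of the previous version is given as a (package, binary defect label) pair; the names are illustrative, not from the paper.

```python
from collections import defaultdict

def compute_pcdc(prev_version):
    """Defect rate of each package in the previous version.

    prev_version: iterable of (package, is_defective) pairs, one per class.
    Returns a dict mapping package -> defective classes / all classes.
    """
    totals = defaultdict(int)
    defective = defaultdict(int)
    for package, is_defective in prev_version:
        totals[package] += 1
        defective[package] += int(is_defective)
    return {p: defective[p] / totals[p] for p in totals}

# The P1 example from Fig. 2: four defective classes out of five.
prev = [("P1", 1), ("P1", 1), ("P1", 1), ("P1", 1), ("P1", 0)]
rates = compute_pcdc(prev)
print(rates["P1"])  # 0.8 -> pcdc of every common class of P1 in the next version
```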
CM describe the static characteristics of software, and each version can be represented by its CM. By comparing the same class in two neighbouring versions Vt−1 and Vt, we can obtain the changes of all CM. If a class changed in most of its CM during an evolution period, the change degree of that class is considered high; in other words, its evolution degree is high, and a higher evolution degree indicates that the class may contain more defects. Based on this, we present the second process metric.

3.1.2 pccm – the percentage of changed CM

Comparing the previous version Vt−1 with the current version Vt, pccm is the ratio of the number of changed CM to the number of all CM. This metric reflects the change degree of a class within an evolution period. Fig. 3 shows an example. Suppose one class has ten CM (a1, a2, …, a10) and there are three versions V1, V2, and V3. Between V1 and V2, six metrics change (a1, a2, a3, a6, a7, a9), so the value of pccm for V2 is 6/10 = 0.6. Similarly, between V2 and V3, eight metrics change (a1, a2, a3, a4, a5, a6, a7, a10), so the value of pccm for V3 is 8/10 = 0.8.

Fig. 3: Example to calculate pccm.

Based on the PPM, we can build new datasets for evolution-oriented defect prediction, as described below and sketched in the code example at the end of this subsection.

3.2 Process of evolution-oriented defect prediction

For two neighbouring versions Vt−1 and Vt, the sets of classes may differ because of changes made during evolution. We therefore process Vt−1 and Vt to obtain their common classes before extracting the PM. To illustrate the process of evolution-oriented defect prediction (Fig. 4), consider three continuous versions of a project, denoted Vt−2, Vt−1, and Vt, all of which contain the same CM.

Fig. 4: Process of evolution-oriented defect prediction.

The steps of evolution-oriented defect prediction are as follows:

Step 1: Process Vt−2 and Vt−1 to obtain their common classes using the text matching approach, marked as V′t−2 and V′t−1.
Step 2: Extract the PPM (pcdc and pccm) based on V′t−2 and V′t−1.
Step 3: Add the PPM (pcdc and pccm) to V′t−1 to obtain the new dataset NVt−1.
Step 4: Perform the same process for Vt−1 and Vt to obtain NVt.
Step 5: Use NVt−1 as the training set to build the prediction model, and NVt as the test set.
Step 6: Obtain the prediction results.

It should be noted that the classes in NVt−1 and NVt may not be exactly the same.
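As a rough illustration of Steps 1-3, the sketch below computes pccm for each common class and appends pcdc and pccm to the current version's records. The data layout and helper names are assumptions carried over from the earlier sketches, not the authors' implementation.

```python
from collections import defaultdict

def compute_pccm(prev_metrics, curr_metrics):
    """Fraction of code metrics whose values changed between two versions."""
    changed = sum(1 for m in prev_metrics if prev_metrics[m] != curr_metrics[m])
    return changed / len(prev_metrics)

def build_dataset(prev_version, curr_version):
    """Steps 1-3: keep the common classes of two neighbouring versions
    and append pcdc and pccm to each class's code metrics.

    prev_version / curr_version: dict (package, class) -> metric dict;
    records of the previous version carry a binary 'bug' label.
    """
    # pcdc source: the defect rate of each package in the previous version.
    totals, defective = defaultdict(int), defaultdict(int)
    for (pkg, _cls), record in prev_version.items():
        totals[pkg] += 1
        defective[pkg] += int(record["bug"])

    dataset = {}
    for name in prev_version.keys() & curr_version.keys():  # common classes
        pkg, _cls = name
        row = dict(curr_version[name])  # the 20 CM (plus the label, if known)
        row["pcdc"] = defective[pkg] / totals[pkg]
        row["pccm"] = compute_pccm(
            {m: v for m, v in prev_version[name].items() if m != "bug"},
            {m: v for m, v in curr_version[name].items() if m != "bug"},
        )
        dataset[name] = row
    return dataset
```

Under this layout, applying build_dataset to (Vt−2, Vt−1) yields the training set NVt−1, and applying it to (Vt−1, Vt) yields the test set NVt.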
4 Empirical study

To show the validity of the PPM, we conducted an empirical study on nine open-source projects from the Tera-PROMISE repository. All experiments were run on OpenJDK 1.8 [33] and Weka 3.8 [34].

4.1 Experimental datasets

We used 33 versions of nine open-source projects, which are commonly used in defect prediction [7, 21, 35, 36]. These projects are measured at the class level, and each project contains at least three versions. The category label of the original datasets is the number of defects, so we converted it into a binary label: a class without defects is considered non-defective, and a class with one or more defects is considered defective (see the pandas sketch later in this subsection). Table 1 lists the project names and versions (columns 1-2), the numbers of all, defective, and non-defective samples (columns 3-5), and the percentage defect rate (column 6).

Table 1. Experimental datasets

Project   Version   All samples   Defective   Non-defective   Defect rate, %
Ant       1.3       125           20          105             16.0
Ant       1.4       178           40          138             22.5
Ant       1.5       293           32          261             10.9
Ant       1.6       351           92          259             26.2
Ant       1.7       745           166         579             22.3
Camel     1.0       339           13          326             3.8
Camel     1.2       608           216         392             35.5
Camel     1.4       872           145         727             16.6
Camel     1.6       965           188         777             19.5
Jedit     3.2       272           90          182             33.1
Jedit     4.0       306           75          231             24.5
Jedit     4.1       312           79          233             25.3
Jedit     4.2       367           48          319             13.1
Jedit     4.3       492           11          481             2.2
Log4j     1.0       135           34          101             25.2
Log4j     1.1       109           37          72              33.9
Log4j     1.2       205           189         16              92.2
Lucene    2.0       195           91          104             46.7
Lucene    2.2       247           144         103             58.3
Lucene    2.4       340           203         137             59.7
Poi       1.5       237           141         96              59.5
Poi       2.0       314           37          277             11.8
Poi       2.5       385           248         137             64.4
Poi       3.0       442           281         161             63.6
Synapse   1.0       157           16          141             10.2
Synapse   1.1       222           60          162             27.0
Synapse   1.2       256           86          170             33.6
Velocity  1.4       196           147         49              75.0
Velocity  1.5       214           142         72              66.4
Velocity  1.6       229           78          151             34.1
Xerces    1.2       440           71          369             16.1
Xerces    1.3       453           69          384             15.2
Xerces    1.4       588           437         151             74.3

For two neighbouring versions, the common classes have evolution data, whereas the new classes do not. To extract the PM, we therefore processed the datasets in Table 1 to obtain the common classes of each pair of neighbouring versions, using the text matching approach: a class is a common class when its package name and class name are exactly the same in both versions. During software evolution, a class that is renamed or moved to another package during refactoring is regarded as a new class; the datasets in Table 1 contain no such case, so the text matching approach does not affect the following experimental results.

The processed datasets are shown in Table 2. The samples in each processed dataset are the classes it has in common with its previous version, so the third column of Table 2 gives the number of common classes.

Table 2. Processed datasets

Project   Version   All samples   Defective   Non-defective   Defect rate, %
Ant       1.4*      125           22          103             17.6
Ant       1.5*      168           23          145             13.7
Ant       1.6*      292           72          220             24.7
Ant       1.7*      350           116         234             33.1
Camel     1.2*      262           87          175             33.2
Camel     1.4*      569           117         452             20.6
Camel     1.6*      857           171         686             20.0
Jedit     4.0*      265           62          203             23.4
Jedit     4.1*      291           77          214             26.5
Jedit     4.2*      291           41          250             14.1
Jedit     4.3*      224           5           219             2.2
Log4j     1.1*      98            33          65              33.7
Log4j     1.2*      103           95          8               92.2
Lucene    2.2*      192           112         80              58.3
Lucene    2.4*      235           158         77              67.2
Poi       2.0*      224           26          198             11.6
Poi       2.5*      314           219         95              69.7
Poi       3.0*      382           261         121             68.3
Synapse   1.1*      152           45          107             29.6
Synapse   1.2*      219           73          146             33.3
Velocity  1.5*      155           84          71              54.2
Velocity  1.6*      209           65          144             31.1
Xerces    1.3*      433           62          371             14.3
Xerces    1.4*      328           209         119             63.7

Fig. 5: Example to get the processed datasets.

We take Ant as an example of how the processed datasets are obtained (Fig. 5). The number of samples in Ant-1.4* is the number of common classes of Ant-1.3 and Ant-1.4, the number of samples in Ant-1.5* is the number of common classes of Ant-1.4 and Ant-1.5, and so on. For example, there are 125 common classes between Ant-1.3 and Ant-1.4, and 168 common classes between Ant-1.4 and Ant-1.5. In this way we obtain the processed datasets in Table 2 from the datasets in Table 1.
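The original PROMISE datasets label each class with a defect count. The following is a small pandas sketch of the conversion to binary labels described above; the file name and the 'bug' column are assumptions based on the common PROMISE CSV layout, not taken from the paper.

```python
import pandas as pd

# Load one version of a project from the PROMISE repository; the file
# name and the 'bug' column are assumed, following the usual PROMISE layout.
df = pd.read_csv("ant-1.4.csv")

# A class with one or more defects is labelled defective (1),
# a class without defects non-defective (0).
df["defective"] = (df["bug"] > 0).astype(int)

print(df["defective"].mean() * 100)  # percentage defect rate of the version
```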
Each dataset in Table 2 contains 20 features, represented by the class-level CM listed in Table 3; each sample in these datasets represents one class of a project. In addition, we used the 20 CCM defined in [9] as PM, which represent the absolute change of the CM in Table 3 between two neighbouring versions. The PPM are listed in Table 4.

Table 3. CM and descriptions

wmc     weighted methods per class
dit     depth of inheritance tree
noc     number of children
cbo     coupling between object classes
rfc     response for a class
lcom    lack of cohesion in methods
ca      afferent couplings
ce      efferent couplings
npm     number of public methods
lcom3   lack of cohesion in methods (different from lcom)
loc     lines of code
dam     data access metric
moa     measure of aggregation
mfa     measure of functional abstraction
cam     cohesion among methods of class
ic      inheritance coupling
cbm     coupling between methods
amc     average method complexity
max_cc  maximum McCabe's cyclomatic complexity
avg_cc  average McCabe's cyclomatic complexity

Table 4. PPM and descriptions

pcdc    the probability that one class may be a defective class
pccm    the percentage of changed CM

PPM represents pcdc + pccm.

4.2 Experimental design

To comprehensively evaluate the validity of the PPM, we designed the following three research questions (RQs).

RQ1: Based on the common classes of two neighbouring versions, is there a correlation between the defect rates of the same packages?

The pcdc metric uses the defect rate of a package in the previous version to estimate the defect rate of the same package in the current version, and thereby the probability that a class is defective in the current version. Is the defect rate of a package actually correlated across two neighbouring versions in practice? To answer this question, we use Pearson's r [37] to measure this correlation.

RQ2: Can the PPM (pcdc and pccm) improve the performance of defect prediction?

We conducted experiments on CM, CM + CCM, and CM + PPM, where CM denotes the 20 CM in Table 3, CM + CCM denotes the 20 CM plus the 20 CCM, and CM + PPM denotes the 20 CM plus the two PPM (pcdc and pccm). For two neighbouring processed versions, we used the previous version as the training set and the current version as the test set, then compared the results of CM, CM + CCM, and CM + PPM. We also repeated the comparison after feature selection on the training set using the correlation-based feature selection (CFS) approach [38] with the best-first search algorithm.

RQ3: Which of the three combinations CM + PPM, CM + pcdc, and CM + pccm performs best?

To explore the independent predictive ability of pcdc and pccm, we further compare CM + PPM, CM + pcdc, and CM + pccm.

In our experiments, we used K-nearest neighbours (KNN) [39], logistic regression (LR) [40], and naive Bayes (NB) [41] as the prediction models, because they have no built-in feature selection [42] and their performance is relatively stable under class imbalance [43]. These models are implemented in Weka. For the KNN model, the parameter 'K' is set to 10 and the parameter 'distanceWeighting' is set to '1/distance'; we use the default parameters of the LR and NB models. Many other prediction models exist; we selected KNN, LR, and NB because they are commonly used in empirical studies [42, 44, 45].
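The paper runs these models in Weka. The sketch below approximates the same configuration in scikit-learn (an assumption for illustration, not the authors' code): K = 10 with 1/distance weighting for KNN, plain LR and NB, training on the previous processed version and testing on the current one.

```python
# Rough scikit-learn equivalent of the Weka setup described above.
# X_train/y_train come from the previous processed version (NVt-1),
# X_test/y_test from the current one (NVt).
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import roc_auc_score

models = {
    "KNN": KNeighborsClassifier(n_neighbors=10, weights="distance"),  # K=10, 1/distance
    "LR": LogisticRegression(max_iter=1000),  # iteration cap raised for convergence
    "NB": GaussianNB(),
}

def evaluate(X_train, y_train, X_test, y_test):
    """Train on the previous version, test on the current one, report AUC."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        proba = model.predict_proba(X_test)[:, 1]
        scores[name] = roc_auc_score(y_test, proba)
    return scores
```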
For each project with several versions, the earlier version serves as the training set and the next version as the test set. We use AUC [46], the area under the receiver operating characteristic curve, as the performance metric. The value of AUC lies between 0 and 1, and a larger AUC indicates better model performance. Jiang et al. [47] showed that AUC is more accurate and reliable than other metrics, and it has been widely used in empirical studies of software defect prediction [48-54].

4.3 Experimental results and analysis

4.3.1 Correlation of the defect rate between two neighbouring versions

The pcdc metric uses the defect rate of a package in the previous version to estimate the defect rate of the same package in the current version, i.e. the probability that one of its classes is defective in the current version. In other words, the metric assumes that the defect rates of the same packages are similar across two neighbouring versions. To show the rationality of this assumption, we used Pearson's r to measure the correlation of a package's defect rate between two neighbouring versions. The value of r lies between −1 and 1, and a larger |r| indicates a stronger correlation. Table 5 shows the number of common packages and Pearson's r for each pair of neighbouring versions.

Table 5. Correlation of the defect rate between two neighbouring versions

Project  Versions         Common packages  Pearson's r
Ant      1.3 versus 1.4   8                0.251
Ant      1.4 versus 1.5   10               −0.047
Ant      1.5 versus 1.6   21               0.012
Ant      1.6 versus 1.7   24               0.151
Camel    1.0 versus 1.2   36               0.432
Camel    1.2 versus 1.4   70               0.743
Camel    1.4 versus 1.6   108              0.493
Jedit    3.
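For reference, the Pearson correlation used in RQ1 can be computed with SciPy. The two arrays below are illustrative stand-ins for the per-package defect rates of a pair of neighbouring versions, not values from Table 5.

```python
from scipy.stats import pearsonr

# Per-package defect rates of the common packages of two neighbouring
# versions (illustrative values, aligned by package).
rates_prev = [0.80, 0.00, 0.25, 0.50]
rates_curr = [0.75, 0.00, 0.20, 0.60]

r, p_value = pearsonr(rates_prev, rates_curr)
print(f"Pearson's r = {r:.3f} (p = {p_value:.3f})")
```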
