Article (Open Access, Peer-Reviewed)

Categorisation‐based approach for predicting the fault‐proneness of object‐oriented classes in software post‐releases

2020; Institution of Engineering and Technology; Volume: 14; Issue: 5; pp. 525-534; First published: 01 October 2020; Language: English

10.1049/iet-sen.2019.0326

ISSN

1751-8814

Authors

Jehad Al Dallal (j.aldallal@ku.edu.kw), Department of Information Science, Kuwait University, P.O. Box 5969, Safat, 13060, Kuwait

Topic(s)

Software Engineering Techniques and Practices

Abstract

Subsequent releases of a system share common development environments and characteristics. However, prediction models based on within-project data potentially suffer from being based on fault data reported within relatively short maintenance time intervals, which potentially decreases their prediction abilities. In this study, the authors propose an approach that improves the classification performance of models based on within-project data that are applied to predict the fault-proneness of the classes in a software post-release (PR). The proposed approach involves selecting a set of immediate pre-releases and constructing a prediction model based on each pre-release. The PR classes are categorised based on whether they are newly developed or reused, with or without modification, from one or more of the selected pre-releases. The prediction models are applied to the PR classes reused from the selected pre-releases, and the results are used to construct a fault-proneness prediction model. After applying this prediction model to all PR classes, the fault-proneness results are adjusted by considering the relationship between the prediction results of the individual pre-release models and the actual fault data. The authors report an empirical study showing that the classification performance of the categorisation-based fault-proneness prediction models is considerably better than that of models constructed using existing approaches.

1 Introduction

Software engineers aim to develop high-quality applications. They consider several external and internal quality attributes [1] to evaluate the quality of the software system of interest. Software artefacts, such as the source code, are considered when measuring internal quality attributes, such as complexity, coupling, and cohesion. Software practitioners and developers use internal quality attributes to estimate external quality attributes such as fault-proneness, testability, and maintainability [2]. The external quality attributes depend significantly on software environment factors, such as the software domain and the expertise of the software developers. Overviews of internal and external quality attributes are presented in ISO/IEC 25010:2011 [3].
To estimate the fault-proneness of classes in an object-oriented system (i.e. the likelihood that a class contains faults), a prediction model is built by applying a statistical technique, such as logistic regression, to a set of data referred to as a training set. The training set consists of values of internal quality measures and corresponding fault data for a set of classes in one or more existing systems with records of maintenance history. When developing a new system (or a new PR of a system), the prediction model is applied to the classes of the new system to estimate their fault-proneness. The data set of classes to which the prediction model is applied is called the application set. Classes estimated to be faulty must be carefully verified and validated before releasing the system of interest.

Researchers have considered two main approaches for building prediction models: the within-project approach and the cross-projects approach. In the former, the training set consists of data of a pre-release of the release under consideration (i.e. the release for which the fault-proneness of its classes is to be predicted). In the latter approach, the training set consists of data of systems other than the system under consideration. The training and application sets in the within-project approach are more likely to share the same software environment than those in the cross-projects approach, which is expected to have a positive impact on the model's prediction performance. However, the time interval between the releases of the systems considered in the training and application sets in the within-project approach is likely to be smaller than that in the cross-projects approach. Prediction models based on training sets with a relatively short maintenance history potentially have degraded prediction abilities [4].

In this paper, we propose a categorisation-based approach that aims to improve the classification performance of prediction models based on within-project data. The approach provides a methodology to construct a statistical prediction model for a post-release (PR). The prediction model considers the prediction results (i.e. faultiness probabilities) of the classes in several pre-releases, and it is applied to all classes in the PR. Some of the resulting predictions are adjusted by applying a rule that considers the relationship, for the pre-release classes, between the faultiness probabilities and the actual fault data. The proposed approach has two key advantages over relevant existing approaches. First, it combines the results of prediction models of several pre-releases and shows how to consider them all at once. Second, it considers the faultiness history of the classes reused from previous releases and uses it to adjust the prediction results. We believe that these two advantages potentially enable the proposed approach to produce better prediction results than relevant existing approaches.

We empirically evaluate the proposed categorisation-based approach by applying it to three open-source systems. For each system, we considered five releases. We applied the prediction models based on the cross-project, conventional within-project, and categorisation-based approaches to each of the selected systems, obtained the classification performances, and compared them. The empirical study addresses the following key research questions:

RQ1: What is the ability of a prediction model constructed using the proposed categorisation-based approach to correctly classify the classes into faulty and non-faulty classes?
RQ2: What is the impact of the number of considered pre-releases on the classification performance of the prediction model constructed using the proposed categorisation-based approach?

RQ3: Are the classification performance results of applying the prediction model constructed using the proposed categorisation-based approach to predict faulty classes in a PR of a system better than those based on the two existing conventional approaches?

The main contributions of this paper are as follows:

- We propose a novel approach for predicting faulty classes in the PR of an object-oriented system.
- We perform an empirical exploration of the ability of the proposed approach to predict faulty classes in the PR of a software system.
- We perform an empirical investigation of the impact of the number of selected pre-releases on the ability of the prediction model constructed using the proposed approach to predict faulty classes in the PR of a software system.
- We empirically compare the ability of the proposed approach to predict the fault-proneness of PR classes to those of prediction models based on the existing conventional approaches.

This paper is structured as follows. In Section 2, we review and discuss related work. A background overview of the logistic regression analysis technique is provided in Section 3. In Section 4, we introduce and explain the proposed categorisation-based approach. The considered software systems and the data collection process for the empirical study are reviewed in Section 5. Sections 6 and 7 report and discuss the empirical study results, and the validity threats to the empirical study are discussed in Section 8. The conclusions and possible future work are presented and discussed in Section 9.

2 Related work

Researchers have identified several internal quality attributes of object-oriented classes, such as cohesion, coupling, and complexity, and have proposed many measures to assess these quality attributes [1, 5]. To empirically explore the relationship between the internal and external quality attributes, researchers have applied several statistical and machine learning techniques to construct prediction models [1]. For example, in [6], Al Dallal reported an empirical study of the ability of models constructed using several reusability measures to predict class fault-proneness. Typically, the classification performance of a prediction model is explored using several evaluation measures.

Several studies have empirically explored the prediction of the fault-proneness of object-oriented classes in a PR. To construct the PR fault-proneness prediction models, some studies followed the cross-projects approach (e.g. [7-12]) and others followed the within-project approach (e.g. [13-18]). Kitchenham et al. [19] performed a systematic literature review of studies that empirically compare prediction models based on within-company and cross-company data. They reported that the results are inconclusive, but the trend for studies considering small within-company data sets is that prediction models based on within-company data are better than those based on cross-company data. Table 1 summarises related work focused on applying the within-project approach. None of the existing studies suggested an approach to combine the prediction results of several within-project data models and make use of the relationship between the available prediction results and the actual fault data.
Table 1. Summary of the relevant studies

Olague et al. [14]. Measures: 3 code-based suites including 18 measures. Systems: 6 Mozilla Rhino releases. Remarks: For each release, the authors constructed prediction models based on the measures considered individually and in combination. Each prediction model based on combinations of measures for a release was validated using the data of the subsequent release. The accuracy of the prediction models based on release n in correctly classifying the classes in release n + 1 ranged widely from 62.6 to 90.6%. This result indicated that, in the context of agile software development, prediction models based on within-project data perform well when applied to PRs.

Shatnawi and Li [15]. Measures: 12 complexity measures. Systems: 3 Eclipse releases. Remarks: Based on their severity, faults were classified into three categories: high, medium, and low. The ability of prediction models based on individual measures to predict each severity level was investigated. In addition, the authors constructed a prediction model based on combinations of measures for each considered release and validated it using the subsequent releases. The results indicated that the ability of the models to predict error-prone classes in subsequent releases decreases as the system evolves.

Zhou et al. [16]. Measures: 10 complexity measures. Systems: 3 Eclipse releases. Remarks: The study explored the abilities of models based on the measures, considered individually, to predict pre-release and PR faults. The models based on data of a certain release were validated by applying them to the classes of the subsequent release.

Choudhary et al. [17]. Measures: code-based measures and two sets of software change measures. Systems: 3 Eclipse releases. Remarks: The study applied three different machine learning techniques to build fault-proneness prediction models. The study compared the models that consider code-based measures to those that consider software change measures and found that the latter have better prediction abilities.

Rhmann et al. [18]. Measures: software change measures. Systems: 4 Android releases. Remarks: The study compared the classification performance of prediction models constructed using three machine learning techniques and two fuzzy-based algorithms. The study considered the within-project data approach for building the models and evaluating their performance. It was found that the prediction model based on LogitBoost, which is a fuzzy-based algorithm, featured the best fault prediction.

Rahman et al. [10]. Measures: 8 process measures. Systems: 38 releases of 9 systems. Remarks: The measures are collected at the system level. The study compared the classification performance of the models based on cross-project data to those based on within-project data and found that the latter models have better classification performance.

Turhan et al. [20]. Measures: 17 code-based measures. Systems: 10 systems. Remarks: The study empirically investigated the fault-proneness prediction abilities of models based on a mixture of cross-project and within-project data. To predict the fault-proneness of classes in a PR of a system, a prediction model was constructed using the data of several preceding pre-releases and the data of other systems. The model was then applied to classes of the considered PR. The study found no guaranteed practical benefit of adding cross-project data to a training set based on within-project data. In contrast, it was found that adding only 10% of the available within-project data to a training set originally based solely on cross-project data was sufficient to improve the prediction ability of the model.
Al Dallal [13]. Measures: 6 code-based measures (the CK measures). Systems: 12 releases of 3 systems. Remarks: The study empirically compared the abilities of fault-proneness prediction models based on cross-project data to those based on within-project data when the models are applied to PR classes that are reused, with or without modification, from a previous release. The results provided evidence that when predicting the fault-proneness of PR reused classes, the models based on within-project data can provide significantly better classification performance than those based on cross-project data.

It is important to note that the approaches proposed in this paper and in [13] are completely different. The approach proposed in [13] is limited to predicting the fault-proneness of the PR classes that are reused from a previous pre-release; newly developed PR classes (i.e. classes not reused from previous pre-releases) are ignored. In this paper, the proposed approach considers all classes in the PR. In addition, the approach considered in [13] constructs a single prediction model based on a single selected pre-release, whereas the approach proposed in this paper considers prediction models based on several pre-releases and explains how to incorporate the actual fault data of existing pre-releases with the initial prediction results to obtain the final prediction results. Multiple pre-releases are considered in [13], but each of them is considered alone to construct a prediction model, and the performances of these models are compared to determine which model is the best. In this paper, the proposed approach combines the results of the pre-release prediction models and the actual data of previously detected faults to predict the fault-proneness of classes in a PR. The empirical study reported in [13] considered six quality measures when building the prediction models, whereas the prediction models constructed in the empirical studies reported in this paper considered 18 measures.

3 Logistic regression analysis

Logistic regression analysis [21] is a statistical technique that has been widely used to build prediction models for fault-proneness (e.g. [22-27]) and other external quality attributes (e.g. [28, 29]); therefore, we apply this technique in our approach whenever a prediction model must be built. This section provides an overview of this statistical technique. Comparing the results across different model construction techniques (e.g. [30-33]) is outside the scope of this paper; therefore, we only considered the logistic regression technique when building the required prediction models.

In logistic regression, one or more independent variables are used to predict a dependent variable with a discrete binary value. The analysis is considered univariate if it involves a single independent variable, and multivariate if it involves two or more independent variables. In the context of fault-proneness prediction, the typical independent variables are the internal quality measures, and the typical dependent variable represents the existence of a detected fault for the considered class. The dependent variable has a value of '1' if one or more faults were detected within a considered period of time; otherwise, it has a value of '0'.
In the context of fault-proneness prediction, applying logistic regression analysis yields the following formula, which provides the probability $\pi$ that a class is faulty:

$$\pi = \frac{e^{C_0 + C_1 X_1 + \cdots + C_n X_n}}{1 + e^{C_0 + C_1 X_1 + \cdots + C_n X_n}} \quad (1)$$

where the $X_i$ are the independent variables, and the coefficients $C_i$ are estimated using logistic regression analysis [21]. The set of classes whose quality and fault data are used to build the prediction model is called the training set, and the set of classes to which the prediction model is applied is called the application set. The formula resulting from the application of the logistic regression analysis to the training set is used to estimate the probability of faultiness for each class in the application set. Classes with high estimated faultiness probabilities must be carefully tested.

A threshold t must be set to determine whether a class in the application set is predicted to be faulty or non-faulty. A class whose probability of faultiness is higher than t is estimated to be faulty; otherwise, it is estimated to be non-faulty. The selection of the threshold depends highly on the time and resources available for testing. Selecting a relatively high value of t potentially decreases the number of classes estimated to be faulty and therefore increases the chance of incorrectly classifying faulty classes as non-faulty. In contrast, choosing a relatively low value of t potentially increases the number of classes estimated to be faulty and therefore increases the chance of incorrectly classifying non-faulty classes as faulty.

When classifying the classes as estimated faulty and non-faulty, the values of the confusion matrix (i.e. TN: true negatives, FN: false negatives, FP: false positives, and TP: true positives) are obtained. To evaluate the classification performance of the constructed prediction model when it is applied to the application set, we considered the Recall, Precision, F-measure, and MCC [34] evaluation criteria.
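To make this workflow concrete, the following is a minimal sketch (not the authors' code) of fitting formula (1) on a training set, thresholding the resulting probabilities, and computing the four evaluation criteria. It assumes Python with scikit-learn, and the randomly generated arrays are hypothetical stand-ins for the measure values and fault data:

```python
# Minimal sketch: logistic-regression fault-proneness model plus evaluation.
# X_* hold internal quality measure values (one column per measure, e.g. 18
# measures as in this paper); y_* hold binary fault data (1 = faulty).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (precision_score, recall_score, f1_score,
                             matthews_corrcoef)

rng = np.random.default_rng(0)                    # hypothetical stand-in data
X_train, y_train = rng.random((200, 18)), rng.integers(0, 2, 200)
X_apply, y_apply = rng.random((100, 18)), rng.integers(0, 2, 100)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pi = model.predict_proba(X_apply)[:, 1]           # probability of faultiness, as in (1)

# Threshold t: Section 4 sets it to the average of the training fault data
# rather than the arbitrary default of 0.5.
t = y_train.mean()
predicted_faulty = (pi > t).astype(int)

print("Recall   :", recall_score(y_apply, predicted_faulty))
print("Precision:", precision_score(y_apply, predicted_faulty))
print("F-measure:", f1_score(y_apply, predicted_faulty))
print("MCC      :", matthews_corrcoef(y_apply, predicted_faulty))
```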
4 Fault prediction categorisation-based approach

Existing relevant within-project data approaches are based on selecting a single pre-release, building a prediction model based on the pre-release data, and applying the prediction model to a PR to predict the fault-proneness of its classes. Some researchers compared the prediction results of several prediction models, where each of these models is constructed using data of a single pre-release, and commented on which of these models is the best and why. None of these approaches attempted to use the data of multiple pre-releases and incorporate these data altogether in a single prediction model for a PR. In addition, when constructing a prediction model using pre-release data, none of the existing approaches studied the relationship between the predicted faultiness results and the actual faultiness data and attempted to use this relationship to adjust the results of applying the prediction model to the PR classes, thereby improving the prediction results of the constructed model. The proposed fault prediction categorisation-based approach overcomes the limitations of the existing approaches by providing a way to incorporate the prediction results of several pre-releases into a single prediction model for the PR classes and adjusting the prediction results based on the relationship between the pre-releases' actual faultiness data and the prediction results.

The proposed fault prediction approach is shown in Fig. 1. The approach starts by selecting n pre-releases (i.e. r1, r2, …, rn) of the PR, which is the release for which the fault-proneness of its classes is to be predicted. At the time at which the PR is to be developed, the values of a set of quality measures (independent variables) and the existing fault data of the classes in each pre-release (dependent variable) are used as training data to construct a prediction model using logistic regression analysis. This process, marked as Step 1 in Fig. 1, builds n prediction models. Each prediction model pm is associated with a threshold value h_pm that is set to the average of the fault data for the classes used to build the prediction model. When the prediction model pm is applied to a class, the class is estimated to be faulty (i.e. the estimation value is set to 1) if the resulting probability of faultiness is greater than h_pm; otherwise, it is estimated to be non-faulty (i.e. the estimation value is set to 0). It is important to note that the selection of the threshold value h_pm is based on information related to the training set that is available from the field and is therefore potentially better than the arbitrary default value of 0.5 [35].

Fig. 1: Categorisation-based PR fault-proneness prediction approach

Once the PR has been developed, in Step 2 shown in Fig. 1, we apply each of the n prediction models to each of the classes in the PR to predict their fault-proneness. This results in n probability of faultiness values (i.e. values in the range [0, 1]) and n corresponding faultiness estimations (i.e. values of either 0 or 1) for each class in the PR. Based on this application, each class that exists in both a pre-release p and the PR will have a pair of values (e_p, a_p), where e_p is the fault estimation value resulting from applying the prediction model based on pre-release p, and a_p is the actual fault data considered in building the prediction model based on pre-release p. We call this pair the 'fault-mapping pair'. For example, if the result of applying the prediction model based on pre-release p to a PR class is 0 (i.e. the class is estimated to be non-faulty), and a fault has been detected for this class since pre-release p was issued and before the PR is developed, the fault-mapping pair will have the values (0, 1). If the class exists in n pre-releases, it will have n fault-mapping pairs. Accordingly, each class in the PR has between zero and n fault-mapping pairs. The classes with no fault-mapping pairs are those newly developed in the PR.

In Step 3 of the approach shown in Fig. 1, we classify the PR classes into two groups, g1 and g2, where g1 includes the classes that are reused, with or without modification, from one or more of the considered pre-releases, and g2 includes the rest of the PR classes. For each class c in group g1, we set the value of the fault data to '1' if the fault data associated with the class equals '1' for any of the considered pre-releases; otherwise, the value is set to '0'. That is, the value of the fault data represents whether the class had a detected fault between the issuing date of the oldest considered pre-release in which the class exists and the PR development date. As a result, for each class in group g1, we have n probability of faultiness values and a single binary fault data value.
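To illustrate the bookkeeping in Steps 2 and 3, the following is a minimal sketch in Python; the class names and data structures are hypothetical illustrations, not taken from the paper:

```python
# Minimal sketch of fault-mapping pairs and the g1/g2 categorisation.
from dataclasses import dataclass, field

@dataclass
class PRClass:
    name: str
    # One (estimate, actual) fault-mapping pair per considered pre-release the
    # class appears in: estimate = 0/1 from that pre-release's model, actual =
    # 0/1 fault data recorded for the class in that pre-release.
    fault_mapping_pairs: list[tuple[int, int]] = field(default_factory=list)

def split_groups(pr_classes: list[PRClass]):
    g1 = [c for c in pr_classes if c.fault_mapping_pairs]      # reused classes
    g2 = [c for c in pr_classes if not c.fault_mapping_pairs]  # newly developed
    return g1, g2

def combined_fault_data(c: PRClass) -> int:
    # Step 3: 1 if a fault was recorded in any considered pre-release, else 0.
    return int(any(actual == 1 for _, actual in c.fault_mapping_pairs))

# Example: a class reused from two pre-releases; the first pre-release's model
# missed a fault that was actually reported, giving the pair (0, 1).
cls = PRClass("Parser", fault_mapping_pairs=[(0, 1), (1, 1)])
g1, g2 = split_groups([cls, PRClass("NewWidget")])
print(len(g1), len(g2), combined_fault_data(cls))  # -> 1 1 1
```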
In Step 4, we consider the probabilities of faultiness, which are generated in Step 2 and filtered in Step 3, as the independent variables and the fault data for the PR reused classes as the dependent variable, and we build a prediction model using multivariate logistic regression analysis. The threshold t_c for the faultiness estimation of this model is set to the average of the dependent variable values (i.e. the fault data considered in the model construction). In Step 5, we apply this prediction model to all classes in the PR and obtain the 'preliminary' probability of faultiness values. We use the threshold t_c associated with the model to classify the classes as 'preliminarily' estimated to be faulty or non-faulty. The classes in group g2 do not have any maintenance history; therefore, we consider their preliminary estimation to be final.

Finally, in Step 6, for each class c in g1, we adjust the preliminary fault-proneness estimation by considering the values in the corresponding fault-mapping pairs according to the following rules:

- If the value of the preliminary estimation is '0': if (0, 1) is one of the class's fault-mapping pairs, then adjust the estimation value to '1'; otherwise, keep the estimation value as is.
- If the value of the preliminary estimation is '1': if, for all pairs of the form (1, x), x is always equal to 0, then adjust the estimation value to '0'; otherwise, keep the estimation value as is.

This rule changes the preliminary fault-proneness estimation value in two cases. The first case occurs when the value of the preliminary estimation is '0' and the application of at least one of the prediction models of the pre-releases was found to have incorrectly estimated the class to be non-faulty. In this case, to decrease the chance of not carefully testing a class that is potentially faulty, we set the faultiness estimation value to '1'. The second case occurs when the value of the preliminary estimation is '1' and every application of a pre-release prediction model that estimated the class to be faulty was found to be incorrect. In this case, the estimation value that resulted from applying the prediction model of the PR is also potentially incorrect; therefore, we adjust the estimation value to '0'.
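Steps 4 and 5 are an application of the logistic regression sketch shown after Section 3, with the n faultiness probabilities as inputs. The Step 6 adjustment rule can be sketched as follows (the function name is hypothetical); note that we read the second rule as applying only when at least one fault-mapping pair has an estimate of '1':

```python
# Minimal sketch of the Step 6 adjustment rule.
# preliminary: 0/1 estimate from the PR model of Steps 4-5; pairs: the class's
# fault-mapping pairs (estimate, actual) from the considered pre-releases.
def adjust_estimation(preliminary: int, pairs: list[tuple[int, int]]) -> int:
    if preliminary == 0:
        # A pre-release model once under-estimated this class (predicted
        # non-faulty, but a fault was actually reported): flag it as faulty.
        if (0, 1) in pairs:
            return 1
    else:
        # Every pre-release model that flagged the class as faulty turned out
        # to be wrong: downgrade the estimate. (Interpretation: require at
        # least one (1, x) pair before downgrading.)
        actuals = [actual for est, actual in pairs if est == 1]
        if actuals and all(actual == 0 for actual in actuals):
            return 0
    return preliminary

print(adjust_estimation(0, [(0, 1), (1, 1)]))  # -> 1 (a missed fault in history)
print(adjust_estimation(1, [(1, 0), (1, 0)]))  # -> 0 (models always over-flagged)
```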
In practice, the internal quality measure values for each pre-release can be obtained once the pre-release is developed. The fault data for the classes in the pre-releases can be accumulated over time, from the point at which each pre-release is developed until the point at which the PR is developed. The filtration of the faultiness results for the PR reused classes can be performed once the PR classes are determined during the design phase. At this point in time, the PR prediction model can be constructed. The only steps to be performed once the PR is developed are (1) obtaining the internal quality measure values of the PR classes, (2) applying the PR's constructed prediction model, and (3) adjusting the resulting preliminary estimated fault data.

5 Data sets used in the empirical study

We performed two empirical studies to evaluate the performance of the proposed approach and compare it with those of the conventional approaches. The empirical studies consider five consecutive releases of three systems. For each system, we constructed prediction models to predict the fault-proneness of PR classes using the conventional within-project and cross-project approaches and the proposed categorisation-based approach. The selected systems and releases and the fault data collection process are described in this section.

5.1 Software systems and releases

We selected three open-source Java systems from http://sourceforge.net. The descriptions of the selected PRs of these systems are provided in Table 2. Each of the selected systems is implemented in Java, has available fault data and source code, and has multiple pre-releases. The systems have different sizes and are from different domains. In addition, they are old enough to ensure that the fault data reported in the fault repositories are representative.

Table 2. Descriptions of the selected PRs

System             Release  Domain                                                   Date of release   No. of classes
Eclipse Link [36]  1.1.1    Multi-language software development environment          May 15, 2009      1839
JHotDraw [37]      7.6      Java graphics framework for structured drawing editors   January 9, 2011   904
JGroup [38]        2.7.0    Reliable group communication system                      January 5, 2009   671

In addition to the selected PR, for each system, we selected the immediately preceding four pre-releases. Table 3 provides descriptions of the considered pre-releases for each system. For each pre-release, the table includes the pre-release identifier, its release date, the number of classes it includes, and the number of months that elapsed between the release dates of the pre-release and the PR. The fifth column reports the percentage of pre-release classes detected as faulty by the date the PR was issued. These data are related to the fault data collection process described in Section 5.3. The last column reports the percentage of PR classes reused, with or without modification, from the pre-release. For example, 89.1% of the classes in Eclipse 1.1.1 are reused, with or without modification, from Eclipse 1.0.0. The selected pre-releases of the different systems vary in terms of the duration between the releases of the pre-release and the PR. For Eclipse, this duration ranges from 2 to 10 months, which is similar to that of the JGroup releases (i.e. 3–10 months). The durations between the
