Approach to mine influential functions based on software execution sequence

Artigo Revisado por pares

Approach to mine influential functions based on software execution sequence

2017; Institution of Engineering and Technology; Volume: 11; Issue: 2 Linguagem: Inglês

10.1049/iet-sen.2016.0081

ISSN

1751-8814

Autores

Bing Zhang, Guoyan Huang, Haitao He, Jiadong Ren,

Tópico(s)

Advanced Malware Detection Techniques

Resumo

IET SoftwareVolume 11, Issue 2 p. 48-54 Research ArticleOpen Access Approach to mine influential functions based on software execution sequence Bing Zhang, Bing Zhang College of Information Science and Engineering, Yanshan University, Qinhuangdao, 066004 Hebei, People's Republic of China The Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Qinhuangdao, 066004 Hebei, People's Republic of ChinaSearch for more papers by this authorGuoyan Huang, Corresponding Author Guoyan Huang corresponding.hgy@ysu.edu.cn College of Information Science and Engineering, Yanshan University, Qinhuangdao, 066004 Hebei, People's Republic of China The Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Qinhuangdao, 066004 Hebei, People's Republic of ChinaSearch for more papers by this authorHaitao He, Haitao He College of Information Science and Engineering, Yanshan University, Qinhuangdao, 066004 Hebei, People's Republic of China The Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Qinhuangdao, 066004 Hebei, People's Republic of ChinaSearch for more papers by this authorJiadong Ren, Jiadong Ren College of Information Science and Engineering, Yanshan University, Qinhuangdao, 066004 Hebei, People's Republic of China The Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Qinhuangdao, 066004 Hebei, People's Republic of ChinaSearch for more papers by this author Bing Zhang, Bing Zhang College of Information Science and Engineering, Yanshan University, Qinhuangdao, 066004 Hebei, People's Republic of China The Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Qinhuangdao, 066004 Hebei, People's Republic of ChinaSearch for more papers by this authorGuoyan Huang, Corresponding Author Guoyan Huang corresponding.hgy@ysu.edu.cn College of Information Science and Engineering, Yanshan University, Qinhuangdao, 066004 Hebei, People's Republic of China The Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Qinhuangdao, 066004 Hebei, People's Republic of ChinaSearch for more papers by this authorHaitao He, Haitao He College of Information Science and Engineering, Yanshan University, Qinhuangdao, 066004 Hebei, People's Republic of China The Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Qinhuangdao, 066004 Hebei, People's Republic of ChinaSearch for more papers by this authorJiadong Ren, Jiadong Ren College of Information Science and Engineering, Yanshan University, Qinhuangdao, 066004 Hebei, People's Republic of China The Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Qinhuangdao, 066004 Hebei, People's Republic of ChinaSearch for more papers by this author First published: 01 April 2017 https://doi.org/10.1049/iet-sen.2016.0081Citations: 3AboutSectionsPDF ToolsRequest permissionExport citationAdd to favoritesTrack citation ShareShare Give accessShare full text accessShare full-text accessPlease review our Terms and Conditions of Use and check box below to share full-text version of article.I have read and accept the Wiley Online Library Terms and Conditions of UseShareable LinkUse the link below to share a full-text version of this article with your friends and colleagues. Learn more.Copy URL Share a linkShare onFacebookTwitterLinkedInRedditWechat Abstract In software system, there are some functions of great importance in controlling the whole process of software execution. When they are damaged, the software will suffer from catastrophic consequences caused by cascading failures. To accurately identify and protect these influential functions has become a necessary method in software security. Thus, in this study a new approach to efficiently mine influential functions based on software execution sequence is proposed. First, the authors design a novel modelling strategy by which software execution traces are modelled as sequential patterns. Owing to loops, patterns can occur multiple times in a trace, which leads to high cost of time and extreme complexity of the research. Then, an algorithm is designed to remove repetitive patterns in software and software influential nodes mining algorithm is put forward to mine influential functions in software and to rank them by the rank-index. Finally, by comparatively analysing the top-ten functions got from PageRank and those from Degree-Based algorithm, the approach is proved to be an effective and accurate one which combines advantages of the two classic algorithms. 1 Introduction Which countries or regions are vital to the healthy development of the global economic system? What immunisation and vaccination strategies should we take to the large-scale outbreak of infectious diseases? Does the way of promoting and marketing new products on micro-blog by hiring influential bloggers at high prices? If so, where to find those bloggers? Moreover, why the few blown high-voltage lines in Cleveland, Ohio, caused the widespread power outage in North America and the loss of billion dollars? In short, the widespread power outage is a matter of ‘the vulnerability of successive network attacks’. If we could know something of the power network structure in advance, find out its key nodes and take necessary precautions, such huge economic loss might be avoided. Therefore, the core problem is how to identify those key nodes. Generally speaking, though limited number, key nodes can exert quick influence on a large quantity of other nodes in the network [1]. For instance, if attacked under a deliberate attack on a scale-free network, the small amount of key nodes will cause the collapse of the whole network [2, 3]; the moments of a small number of the most influential bloggers will soon spread to the whole network [4]; moreover, only 1% of companies control 40% of the world economy [5]. It can be seen that key nodes are of great importance to the structure and function of network. Given that the function of software system is closely related to its quality, research on mining influential functions in software system is thus of high application significance in fields such as software testing, software maintenance and software security. In other words, there is favourable application foreground for studies on the mining methods and key technologies of influential functions in software: In theory, the identifying of influential nodes (IN) in software system plays a great role in the guidance of software design and development as well as the improvement of its quality. For instance, software reliability can be promoted by giving special protections on influential functions in software. In practical application, studies on influential functions can not only help cut down the workload of software testing, but also help enhance the accuracy of software maintenance, which can greatly reduce the maintenance costs. From the aspect of direct financial impact, to improve software maintenance and development is an involved and costly task. According to Gartner, global software expenditures for 2010 amounted to $229 billion, with large vendors such as Microsoft and IBM reporting multi-billion dollar costs for software development each year. Most of the development cost – an estimated 50–90% of the total costs – is due to software maintenance. Despite the high costs, software is notoriously unreliable, and software bugs can wreak havoc on software producers and consumers alike – a National Institute of Standards and Technology (NIST) survey estimated the annual cost of software bugs to be about $59.5 billion [6]. Thus, knowledge on improving the maintenance by effectively identifying which components to debug, test or refactor motivated us to design a method to mine IN in software system. Different from the studies on IN in software complex network, the mining methods and evaluation indexes of IN based on sequence data are very few at present, this paper thus analyses nodes in software execution sequences from the new perspective of sequential patterns, and designs an algorithm for the mining of influential functions in software of stability and high efficiency in the model. In accordance with the characteristics of software itself, software execution sequence is first modelled as a sequence. Then, the algorithm of removing repetitive patterns (RRPs) in software execution sequence is put forward, which not only cuts down the time, but also makes software dynamic structure clearer. Accordingly, by use of software IN mining (SINM) algorithm, software functions are ranked by rank-index. By analysing the top-ten functions obtained from PageRank algorithm and those from Degree-Based algorithm, the approach we propose is finally proven to be a better one. The rest of this paper is organised as follows. Section 2 gives the detail related work. Section 3 describes the definition of the model and presents its algorithm. Section 4 proposes pattern sequence generating (PSG) algorithm and RRP algorithm. The SINM approach is given in Section 5. Experiments on three open-source software are conducted in Section 6. Moreover, the results are discussed in Section 7. Thus, the conclusion of this paper is shown in Section 8. 2 Related work Current studies on influential functions in software are mostly aimed at identifying evaluation indexes and mining methods of IN based on complex software network. Indexes such as node degree [7], betweenness centrality [8] and eigenvector [9] are typical examples which are identified on the basis of node properties and network topologies, while methods such as PageRank [10], LearderRank [11] and HITS [12] are representatives of node mining methods based on random nodes. Such indexes and mining methods, however, are much more focused on unweighted and undirected complex network, which have limitations and drawbacks to certain extent. In most cases, they tend to take the consequences of a single factor into account and to ignore the connections among different factors, which may cause information loss of IN and decrease of identification accuracy. In other words, both of them cannot be applied universally due to the inconsistence in practise between them and software network. Evaluation on IN is not only determined by their own characteristics, but also by the impact of their locations and the attributes of their adjacent nodes. Though the network of dynamic execution process generally describes the possibilities of interactions among the different inner components of software, it is unable to describe the complete characteristics of software execution. Unlikely, software execution sequence can define the actual interactions among different components under certain operation conditions. Therefore, analysis of the mining based on software execution sequence can help modify the deficiencies of studies on the network of software execution process. With regard to researches about node mining based on sequential patterns, mining methods and relevant evaluation indexes are rarely explored and discussed. For the purpose of studying software behaviours and structural analysis by software execution sequence, it requires the establishment of appropriate software sequential patterns. At present, models based on software execution sequence are mostly Shooman model and component-based reliability estimation model [13], but there exists the risk of re-evaluating software reliability. Considering this, Mao and Deng [14] proposed the generic model based on the reliability of software components (shown in component transition probability graph), which traces the reliability in dynamic development process based on function abstraction. Yacoub et al. further studied the attributes of interactive components and established a scenario model of sequence to evaluate software reliability, which not only reduces the occurrence of false paths, but also has greater coverage and stronger compatibility [15]. Although models based on transition probability graph can help analyse the reliabilities of various components in various contexts, their calculations on transition probabilities and on the occurrences of paths are in fact far from accurate, so they cannot effectively reflect the actual behaviours of software. Liu et al. [16] put forward the classification of the structures of programme execution paths by software behaviour diagram, which describes the structures of function calls frequently got from the dynamic execution process of programmes. In the diagram, each node refers to a function name. Eichinger et al. [17] improved function dependence diagram and defined the boundary values between two nodes as the number of calls between two functions. Moreover, Chang et al. [18] brought up the method to test software by locating ignored conditions in software by the dependence diagram of functional relations. These models first extract functional relations in software execution process, simplify them into function dependence diagrams and transform them into software execution sequences. The recursive features of functions increase the difficulties of function dependence analysis, and some sequence information will be lost in transforming function dependence diagram into sequences, which causes high error rates of sequence mining results. Studies on the direct analysis of the mining of software execution-sequence (ES) centre on the analysis of vulnerable software sequences such as vulnerability detection and location [19], and also on the classification of software behaviours. In the literature [20], it proposed the sequential patterns of mining process and designed the iterative pattern mining algorithm extracted from the frequent representative patterns in execution traces. It also took into consideration the sequencing of functions in execution traces, and the RPs occurring in one or between more sequences due to loops. Cheng et al. [21] put forward a classification framework based on frequent patterns, and extracted typical patterns to classify software behaviours by analysing the relationships between pattern frequencies and differentiations. Lo et al. [19] suggested to apply iterative pattern mining algorithm to the classification of software behaviours. Li et al. [22] conducted software error detection by further using the distribution of locations of different patterns, which is also an important index of the classification of software behaviours. The prerequisite of mining IN from software ES data is to mine sequential patterns from complex sequence data which contain redundant information. Huang et al. [23] proposed the eliminating repetitive patterns (ERP) algorithm which aims to eliminate repetitive parts in software execution sequences, and accordingly put forward the software sequential pattern mining (SSPM) algorithm of sequential pattern mining. It not only effectively divided the dynamic execution sequences of software into different patterns, but also measured the similarities of different pattern sets by similarity indexes. On the basis of this, the work presented in this paper is conducted. 3 Modelling for software execution sequence Let I be a set of distinct events, and these events in software can be functions, method, class, interface etc. We denote I as , where each is an event in software program traces. Each trace or execution path can be considered as a sequence. Let an execution path or a sequence S be an ordered list of events and S is denoted as . is a node in S represented by a triple(Flg, Addr, FName) where Flg is the entrance–exit flag of event: namely, a caller or callee in software. As it is a caller, Flg can be denoted as E; otherwise, be X. Addr is the hexadecimal address in memory for the event and FName is its function name from set I. That is to say, each element in a sequence S is composed of three parts: Flg, Addr and FName. The specific process is shown in Fig. 1. Fig. 1Open in figure viewerPowerPoint Sequence data modelling process Definition 1.ES pattern (ESP): ESP for software is defined as where and. . are the elements derived from sequence S, satisfies: , , , . Obviously, the length of P is even.In fact, the relationship between different P is rather complex. Usually they can be described as the following three: Containment relationship. A pattern is considered as a subsequence of another pattern if there exist integers where . Then, the containment relationship of them is denoted as . Here, is the contain-pattern. Coordinative relationship. A pattern is considered parallel with another pattern if there do not exist integers where . Then, the coordinative relationship between them is denoted as . Here, and are reciprocal parallel-pattern. Repetitive relationship: A pattern is considered repetitive if another pattern and is next pattern of . Then, the repetitive relationship of them is denoted as . Here, and are reciprocal RPs. Definition 2.subtraction operator: Given an ESP and a sequence , the subtraction of S to P, denoted by , is defined as a new sequence formed from S where all events occurring in P are removed from S. Formally, is defined as such that (i) and (ii) there exists a set of integers with and and , . Definition 3.length of ESP: Length of ESP is the number of nodes in an ESP such as pattern , its length denoted by is m. If P is contain-pattern or repetitive pattern, its length is represented as follows: (1)where is sub-pattern of P, K is the number of nodes of . Otherwise (2)Owing to loops, an execution path S can have repeated occurrences of interesting patterns. A pattern can appear a repeated number of times within a sequence. Therefore, the sequence S can be abstracted as a sequence composed of many discrete events and patterns or RPs. S is denoted by the sequential pattern in Fig. 2. Fig. 2Open in figure viewerPowerPoint Sequential pattern within patterns In are patterns. There are contain-patterns , . As parallel-pattern, if , while and and are neighbours, repetitive pattern exists. In this paper, we mainly study the repetitive pattern in , pattern or may contain repetitive pattern as well. If and are RPs, nodes or patterns and orders in them are the same. 4 Pattern generation method 4.1 Sequential pattern generating The initial data sequence in the process of software running is software execution sequence (seseq). It only contains entrance–exit flag (flg) and function address (Addr) in memory which are neither complete nor intuitive. For this reason, the process of software PSG is set into two phases: sequence preprocessing and sequence patternising. In the first phase, we mine function call in software program through seseq, and in the meantime we add function name got by function address in seseq to a triple (Flg, Addr, Fname) and put them in a sequence denoted by S. In the second phase, we patternise S to be iteratively until becomes stable. Algorithm 1 (see Fig. 3) shows the specific procedures as follows. Fig. 3Open in figure viewerPowerPoint Algorithm 1: PSG Algorithm 1 (Fig. 3) has two procedures, in which lines 1–7 reveal how the sequence S is generated, while sequential pattern is formed iteratively from lines 8 to 19. Obviously, the length of pattern in are different, and these patterns maybe contain-pattern or parallel-pattern. 4.2 RPs removing Owing to loops, a trace can have repeated occurrences of interesting patterns. Looping 10 times and looping 100 times for a pattern are not of much difference in a software sequential pattern. We only know there is a loop here when we analyse the structure of whole sequence. However, frequent RPs generally appear in the software sequential pattern. So, if we remove RPs, the experimental time can be reduced, the efficiency of the experiment can be improved greatly and structure of the software sequential pattern can be much clear. Algorithm 2 (see Fig. 4) presents the pseudocode for eliminating RPs in a software sequential pattern. It searches the RPs software sequential pattern and deletes them, but retain the first pattern. Its exact procedure is described as follows. Fig. 4Open in figure viewerPowerPoint Algorithm 2: RPs removing Lines 1–2 denote the scanning of SP and the searching of RPs. While there exist m consecutive patterns such as p, these m patterns are removed in line 3. Each type of RPs is retained only once. In line 4, we store the pattern p in simplified pattern sequence (SPS) for the preparation of mining IN based on SPS in Section 4.2. After removing the repetitive parts in software execution sequence, we got the single software sequence SPS. On the basis of different testing cases, we got numerous SPS by SP generating algorithm and repetitive-pattern eliminating algorithm and got sets of these SPS sequences. 5 IN mining approach based on sequential patterns 5.1 Node ranking (NR) rules Having obtained the SPS, we could find out the constraint relationships among different function calls. We defined the outermost function as first rank and its function call as second rank, and so on, then the innermost function is rank, ; on the basis of the ranks, we also counted the occurrence of functions in pattern sequences. Where the same function occurs times in ranks, , so the of is defined as follows: (3)In formula (3), is the sum of the occurrences of , is the total of the ranks of and their quotient the determines the node influence in software system. The biggest comprehensive rank R of is then defined as: (4)In formula (4), the smaller R is, the higher rank is; or vice versa. Definition 4.Node repetition rate: Let the number of elements in collection be all N, the node repetition rate is defined as (5) Definition 5.Node coverage rate: Let the number of elements in collection all equate N, the node coverage rate is defined as (6) 5.2 IN mining As mentioned above, we got SPS in software execution process, and ranked functions and output the results by setting up NR index and by considering the SP features by Algorithm 3 (see Fig. 5). The algorithm is operated in the following three major steps: first, output the ranks of functions in SPS; second, count the occurrence of the functions in different ranks; finally, by combining function ranks and occurrences, calculate their RANK values, rank those values from the big to the small and output the results, which are detailed in Fig. 5. Fig. 5Open in figure viewerPowerPoint Algorithm 3: IN mining Lines 1–7 output the ranks of functions in SPS, and line 8 calculates the biggest rank in hierarchical sequence (HS). Line 21 shows the biggest rank of the functions, and lines 12–20 calculates the RANK value by combining the comprehensive ranks of functions and the total of their occurrences. Line 22 shows that RANK values are put into NR list (NRL), during which the values rank themselves from the big to the small. 5.3 Instance analysis In this section, some events of simplified software sequential pattern are shown in Fig. 6a. By analysing the ranks of different segments of patterns and by counting the occurrences of various functions, HS with function times are shown in Fig. 6b. Accordingly, NRL is calculated by defining RANK(, which is shown as follows: RANK(parse_opt). RANK(name_add_name). RANK(check_name_alloc). Fig. 6Open in figure viewerPowerPoint IN mining process (a) Part of SPs, (b) HS with function times, (c) NR list The final ranking results are shown in Fig. 6c. Fig. 6 shows that though the function has a higher comprehensive rank (R), few times it appears or lower rank () it owns in the SP would lead to a small RANK value for the function. For instance, the function check_name_alloc appears three times in Fig. 6a, which makes its RANK increase; however, the function parse_opt has a lower rank (), though its comprehensive rank(R = 1) is high, the RANK value becomes lower. Therefore, the functions’ influence in the software system is determined by not only their own characteristics, but also by their locations (such as rank). 6 Experiment analysis This experiment is conducted on 64 bit Windows 7 ultimate, Xeon central processing unit E5-2603 at 1.80 GHz, 16 G of random access memory and Ubuntu 14.04. 6.1 Experimental process Dynamic software execution process can be modelled as a sequence of components connected by certain dependence relationships. In this section, we test algorithms on three open-source software coded in C or C++ including C program analysis tool Cflow for the relationship among function calls an hypertext transfer protocol and reverse proxy server Ngnix and the ultimate music player Deadbeef (Tables 1–3). On Linux operating system, GNU compiler collection (GCC) is used to recompile them by adding debug information, to gather the information of programme traces automatically and to obtain the software execution sequence. Here, we select three different test cases to perform Cflow, Ngnix and Deadbeef. In each trial, we model software execution process as a pattern sequence and remove the RPs. The detail processes are listed as follows: On the Linux system, it planned to recompile the GNU open-source software by use of GCC and add some debugging information to the secondary development of the PvTrace tool so as to collect the execution traces of software processing and to establish the SP model by combining the complex software network model built by Graphviz. On the basis of software sequence model and the complex software network model, it implemented SINM, PageRank and Degree-Based algorithms by use of Visual Studio 2012 and Visual C++ 6.0 and saved the results of each algorithm. It collected the results of one-time or multi-time software execution and then conducted relevant statistical analysis on the MATLAB platform, from which it can be seen that there is a group of IN in software system. Table 1. Top-ten functions in Ngnix No. SINM (S) PageRank (P) Degree-Based (D) 1 ngx-palloc ngx-palloc 2 ngx-pnalloc ngx-pcalloc ngx-pcalloc 3 ngx-pcalloc 4 ngx-array-push ngx-array-push 5 ngx-conf-read-token ngx-array-push 6 ngx-array-init ngx-http-merge-locations 7 ngx-strlow ngx-http-merge-servers 8 ngx-http-merge-locations ngx-array-init ngx-palloc 9 ngx-cpystrn ngx-array-init 10 ngx-http-merge-servers ngx-pnalloc ; ; ; Table 2. Top-ten functions in Cflow No. SINM (S) PageRank (P) Degree-Based (D) 1 lookup nexttoken 2 nexttoken nexttoken 3 hash-symbol-hasher 4 hash-symbol-compare include-symbol 5 tokpush 6 get-token lookup 7 yylex 8 cleanup-stack lookup putback 9 include-symbol hash-symbol-hasher yylex 10 putback hash-symbol-compare ; ; ; Table 3. Top-ten functions in Deadbeef No. SINM (S) PageRank (P) Degree-Based (D) 1 mutex-unlock mutex-unlock 2 mutex-lock mutex-lock 3 pl-unlock pl-unlock pl-unlock 4 conf-lock pl-unlock pl-lock 5 conf-unlock conf-unlock 6 pl-lock conf-lock 7 conf-get-str-fast conf-get-int 8 conf-get-int conf-get-str-fast conf-lock 9 plt-unref 10 plt-ref ; ; ; 6.2 Experiment results With regard to the importance of mining analysis of software nodes, PageRank and Degree-Based analysis methods are frequently applied, but they are all based on the complexity of software network. Besides, the modelling of software execution process in complex network has also been studied substantially. For example, Huang et al. [25] applied the method of mapping software execution process as complex software network. Hence, this paper models the software execution process by sequence and by the complex network, and proposes the node mining method SINM, which is different from the classic PageRank and Degree-Based methods according to the mining algorithm of IN in complex network. It aims to compare and analyse the superiority of the SINM algorithm over the mining results of software nodes so as to confirm its high efficiency by referring to the advantages of the two classic algorithms. The following tables show the top-ten nodes obtained by different node mining methods in the three softwares Ngnix, cflow and deadbeef. Regardless of the other functions got from PageRank algorithm and Degree-Based algorithm, the same functions that only exist in results got from SINM algorithm are listed in tables. By comparing their node coverage, we put forward a new approach from the perspective of sequential patterns. The approach proposed in this paper not only lives up to people's expectation, but also combines the advantages of PageRank and Degree-Based methods. The experiment results show: compared with classic PageRank and Degree-Based algorithms, the node coverage of software Ngnix and software Cflow obtained by SINM method amounted to 70%. Particularly, the node coverage of software Deadbeef even accounted for 80%. Seen from these, the method proposed in this paper is of high efficiency and accuracy. Compared with functions obtained by PageRank and those from Degree-Based algorithm, functions got by SINM method contain key functions which can be got by Node-Degree algorithm (such as functions such as ngx-http-merge-locations and ngx-http-merge-servers in Ngnix, putback and yylex in Cflow and conf-get-int in Deadbeef), but cannot be got by PageRank algorithm or vice versa. Therefore, the method proposed in this paper, combining advantages of such classic methods as PageRank algorithm and Degree-Based algorithm, is of higher efficiency and accuracy as well as of more representative results. At the same time, the method adopted in this paper is to mine software nodes from the perspective of SPs, however, PageRank and Degree-Based comparison methods are based on complex networks. They have different algorithm inputs, with the former focusing on sequence data, but the latter on network data. In other words, it is impossible to compare accurately and precisely their computational complexities. 7 Discussion The results show that the method to mine IN on software execution sequences from the perspective of sequential patterns is of higher accuracy and efficiency than the single node mining method based on complex network software, because it considers the order of functions on execution path and the influence of the repeated patterns circulating one or more sequence on functions. Relative to the analysing method for influential functions in software network, the formation of software network destroyed the real operation process of the software first, as described in the literature [24]; and then determined the importance of nodes only by referring to the edge of nodes and nearby linking nodes as well as complex network properties rather than the relations of function call and called, include and included caused by recursion and circulation etc. Unlike, the model of sequence can reflect these relations much better. Most of the current network-based discovery methods for functions are based on software source codes or dynamic software execution network. Therefore, this sequence-based method adopted in this paper is closely related to them (, ∼0.8). For network-based node mining methods, they can complement and verify each other mutually, especially in providing huge support for the practise processes of software testing, software maintenance, software security and in the meanwhile unveiling the characteristics of software structure and software entities, which brings great guiding significance to the prediction of software optimising and upgrading. However, the current experiments mainly focus on the mining of IN based on C/C++ open-source software, which in future we need to further apply to software written in different languages. Moreover, the method proposed by this paper is based on classic loophole-free open-source software; therefore, when software testing or maintenance staff know IN before they examine problematic software, they can strongly monitor or protect in testing, increasing the efficiency of software testing, strengthening software reliability and reducing software costs. 8 Conclusion In this paper, software dynamic execution process is modelled as sequence. It treats software dynamic execution process as a complex sequential data and based on software sequence model RRP algorithm is put forward, which deals with the impact on software sequence length caused by loop structure, optimises the structure of sequential pattern and also reduces unnecessary consumption greatly. Then, the algorithm output ranks of different functions in SPS, counted the occurrences of functions in different ranks and calculated the RANK values by combining function ranks and occurrences and orderly output the values from the big to the small. In contrast to influential functions got by such classic algorithms as PageRank algorithm and Degree-Based algorithm in different models, the method we propose in this paper increases the node coverage to by combining the advantages of the two classic methods. It is very contributive to provide a theoretical and practical support to the analysis of the structural characteristics and software security. 9 Acknowledgments This work was supported by the National Natural Science Foundation of China under grant nos. 61472341 and 61572420, the Natural Science Foundation of Hebei Province P.R. China under grant nos. F2013203324, F2014203152 and F2016203330 and the Graduate innovative funding project of Hebei Province under grant no. 2016SJBS010. The authors are grateful to the valuable comments and suggestions of the reviewers. 10 References 1Albert, R., Jeong, H., Barabśi, A.: ‘Error and attack tolerance of complex networks’, Nature, 2000, 406, pp. 378– 382 2Callaway, D., Newman, M., Strogatz, S., et al.: ‘Network robustness and fragility: percolation on random graphs’, Phys. Rev. Lett., 2000, 85, (25), pp. 5468– 5468 3Cohen, R., Erez, K., Ben-Avraham, D., et al.: ‘Breakdown of the Internet under intentional attack’, Phys. Rev. Lett., 2001, 86, (16), pp. 3682– 3685 4Weng, J., Lim, E., Jiang, J., et al.: ‘Twitterrank: finding topic-sensitive influential twitterers’. Proc. of the Third ACM Int. Conf. on Web Search and Data Mining, New York, 2010, pp. 261– 270 5Vitali, S., Glattfelder, J., Battiston, S.: ‘The network of global corporate control’, Bus. Politics, 2016, 6, (3), p. e25995 6Bhattacharya, P., Iliofotou, M., Neamtiu, I., et al.: ‘Graph-based analysis and prediction for software evolution’. Int. Conf. on Software Engineering, 2012, pp. 419– 429 7Wei, D., Li, Y., et al.: ‘Degree centrality based on the weighted network’. 2012 24th Chinese Control and Decision Conf. (CCDC), 2012, pp. 3976– 3979 8Kitsak, M., Havlin, S., Paul, G., et al.: ‘Betweenness centrality of fractal and nonfractal scale-free model networks and tests on real networks’, Phys. Rev. E, 2007, 75, (5), p. 056115 9Wu, X., Zhang, M., Han, Y.: ‘Research on centrality of node importance in scale-free complex networks’. 2012 31st Control Conf. (CCC), 2012, pp. 1073– 1077 10Brin, S., Page, L.: ‘Reprint of: the anatomy of a large-scale hypertextual web search engine’, Comput. Netw., 2012, 56, (18), pp. 3825– 3833 11Lü, L., Zhang, Y., et al.: ‘Leaders in social networks, the delicious case’, PloS One, 2011, 6, (6), p. e21202 12Kleinberg, J.M.: ‘Authoritative sources in a hyperlinked environment’, J, ACM (JACM), 1999, 46, (5), pp. 604– 632 13Krishnamurthy, S., Mathur, A.P.: ‘On the estimation of reliability of a software system using reliabilities of its components’. Software Reliability Engineering, 1997, pp. 146– 155 14Mao, X., Deng, Y.: ‘A general model for component-based software reliability’, J. Softw., 2004, 15, (1), pp. 27– 32 15Huo, C., Chen, C., et al.: ‘A scenario-based reliability analysis approach for component-based software’, IEICE Trans. Inf. Syst., 2015, 98, (3), pp. 617– 626 16Liu, C., Yan, X., Yu, H., et al.: ‘Mining behavior graphs for backtrace of noncrashing bugs’. Proc. of the Fifth Siam Int. Conf. on Data Mining, 2005, pp. 286– 297 17Eichinger, F., Böhm, K., Huber, M.: ‘Mining edge-weighted call graphs to localise software bugs’, Physica, 2008, 5211, pp. 333– 348 18Chang, R.Y., Podgurski, A., Yang, J.: ‘Discovering neglected conditions in software by mining dependence graphs’, IEEE Trans. Softw. Eng., 2008, 34, (5), pp. 579– 596 19Lo, D., Chen, H., Han, J., et al.: ‘Classification of software behaviors for failure detection: a discriminative pattern mining approach’. Proc. of the 15th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, 2009, pp. 557– 566 20Lo, D., Khoo, S.C., Liu, C.: ‘Efficient mining of iterative patterns for software specification discovery’. Proc. of the 13th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, 2007, pp. 460– 469 21Cheng, H., Yan, X., Han, J., et al.: ‘Discriminative frequent pattern analysis for effective classification’. IEEE Int. Conf. on Data Engineering, 2007, pp. 716– 725 22Li, C., Chen, Z., Du, H., et al.: ‘Using pattern position distribution for software failure detection’, Int. J. Comput. Intell. Syst., 2013, 6, (2), pp. 234– 243 23Huang, G., Zhang, B., Ren, R., et al.: ‘A novel approach to efficiently mine structural patterns from software execution sequence’, J. Comput. Inf. Syst., 2015, 11, (3), pp. 1109– 1119 24Cai, K., Yin, B.: ‘Software execution processes as an evolving complex network’, Inf. Sci., 2009, 179, (12), pp. 1903– 1928 25Huang, G., Zhang, B., Ren, R., et al.: ‘An algorithm to find critical execution paths of software based on complex network’, Int. J. Mod. Phys. C, 2015, 26, (9), p. 1550101 Citing Literature Volume11, Issue2April 2017Pages 48-54 FiguresReferencesRelatedInformation

Ver no editor

Altmetric

PlumX

Entrar

Lembrar minha senha

Receber meu e-mail de confirmação

Approach to mine influential functions based on software execution sequence