Rough set‐based rule generation and Apriori‐based rule generation from table data sets: a survey and a combination
2019; Institution of Engineering and Technology; Volume 4; Issue 4; Language: English
DOI: 10.1049/trit.2019.0001
ISSN: 2468-6557
Authors: Hiroshi Sakai, Michinori Nakata
Topic(s): Data Management and Algorithms
CAAI Transactions on Intelligence Technology, Volume 4, Issue 4, pp. 203–213. Open Access. First published: 10 July 2019.
Hiroshi Sakai (corresponding author, sakai@mns.kyutech.ac.jp), Graduate School of Engineering, Kyushu Institute of Technology, Sensui 1-1, Tobata, Kitakyushu 804-8550, Japan; Michinori Nakata, Faculty of Management and Information Science, Josai International University, Gumyo, Togane, Chiba 283-0002, Japan.

Abstract

The authors have been working on new computational methodologies such as rough sets, information incompleteness, data mining, and granular computing, and have developed software tools for association rules as well as new mathematical frameworks. They term this research Rough sets Non-deterministic Information Analysis (RNIA). It builds on several outstanding lines of research related to information incompleteness, data mining, and rule generation, especially Pawlak's rough sets, Lipski's incomplete information databases, Orłowska's non-deterministic information systems, and Agrawal's Apriori algorithm. By combining these lines of research, the authors aim to realise a more intelligent rule generator that handles data sets with information incompleteness. This study surveys the authors' research highlights on rule generators, and considers a combination of them.

1 Introduction

This paper focuses on algorithms for rule generation from table data sets with uncertainty, and surveys our research highlights on rule generators. We also consider the perspective toward more intelligent rule generation. The key concepts in this paper are rough sets [[1]–[3]], incomplete information databases [[4], [5]], non-deterministic information systems (NISs) [[2], [6]], and the Apriori algorithm [[7]–[9]]. Each of these lines of research is outstanding, and we merge them toward a more intelligent rule generator. The concept of rough sets proposed by Pawlak offers a mathematical framework for analysing table data sets. A pair of an attribute A and an attribute value v is termed a descriptor. Each descriptor defines an equivalence class over the total set of objects (or instances). We can make use of equivalence classes for examining consistency and data dependency [[3], [10], [11]].
Incomplete information databases [[4], [5]] and NISs [[2], [6]] are similar frameworks for handling information incompleteness in table data sets. They employ a set P of possible values for dealing with information incompleteness, and interpret this set as 'a value in P is the actual value, but we cannot decide which one' [[2], [6]]. For example, suppose Jack is an undergraduate student in Japan. Even if we do not know Jack's precise age, we will estimate that his age is between 18 and at most 25 years old, namely the non-deterministic information {18, 19, ..., 25}. Lipski coped with such non-deterministic information, and developed modal question-answering based on possible world semantics [[4], [5], [12]]. He clarified the sound and complete query translation axioms, known as the system S4, for query processing. Both rough sets and incomplete information databases are related to table data sets, and it seems natural to consider rough sets for tables with information incompleteness. Several lines of research cope with this issue; however, most of them investigate the properties of rough sets and are not directly connected with the implementation of an actual system. There is a gap between the model of rough set-based rule generation and the implementation of a rule generator. Furthermore, even in a standard table, the problem of obtaining all minimal reducts is proved to be NP-hard [[11]]. Therefore, the logical property of completeness may often be ignored (this means there may be rules other than the obtained rules). If we handle NISs according to possible world semantics, the number of possible worlds generally increases exponentially. In the Congressional Voting data set [[13]], the number of possible worlds becomes astronomically large. It seems hard to examine all possible worlds sequentially. (Actually, in the subsequent sections we give a solution to this problem, and propose rule generation according to possible world semantics.) We have to pay attention to the computational complexity of rule generation as well as to the rough set-based rule generation model. The Apriori algorithm proposed by Agrawal et al. [[7]–[9]] is known as a seminal approach to data mining and knowledge discovery [[7], [8]], and it has led to various data mining studies. This algorithm originally handled data in transaction form, but we can apply the Apriori algorithm to table data sets by regarding a descriptor as an item [[14]]. We term the resulting algorithm the Deterministic Information System (DIS)-Apriori algorithm (Apriori algorithm for DISs), and extend it to the NIS-Apriori algorithm (Apriori algorithm for NISs). These algorithms preserve the logical properties of soundness and completeness for a set of rules [[15]]. To examine the validity of proposed algorithms, cross-validation is often employed [[16]]. However, we do not employ cross-validation, because proving the logical properties of soundness and completeness is another way to validate the proposed algorithms. The proposed algorithm detects each implication if and only if it satisfies the condition of a rule. These properties theoretically assure the validity of the algorithms. This is an advantage of the logic-based framework. In this paper, we clarify the above researches by using an example data set, and investigate their roles and properties.
More concretely, we sequentially focus on the following:

(1) Rough set-based consistent rule generation:
(1A) Rule generation by the definability of a set in DISs,
(1B) Rule generation by the definability of a set in NISs,
(1C) Rule generation by the discernibility function in NISs.
(2) Apriori-based rule generation:
(2A) DIS-Apriori-based rule generation from DISs,
(2B) NIS-Apriori-based rule generation from NISs,
(2C) Rule generation by the NIS-Apriori with a target descriptor.

This paper is organised as follows. Section 2 considers (1A), and Section 3 copes with (1B). Section 4 reviews the discernibility function and considers (1C). Section 5 copes with (2A), and Section 6 does (2B). In each section, we present actual execution logs. Section 7 compares the two types of rule generation, and clarifies the advantages and disadvantages of each. Then, we combine the two types of rule generation, and propose (2C). Section 8 concludes this paper.

2 Rough set-based consistent rule generation in DISs

A DIS is a quadruplet $\psi = (OB, AT, \{VAL_A \mid A \in AT\}, f)$, where OB is a finite set whose elements are called objects or instances, AT is a finite set whose elements are called attributes, $VAL_A$ is a finite set whose elements are called attribute values of $A \in AT$, and f is a mapping $f: OB \times AT \to \bigcup_{A \in AT} VAL_A$ [[1], [3], [11]]. We say objects x and y are indiscernible for $ATR \subseteq AT$, if $f(x, A) = f(y, A)$ holds for every $A \in ATR$. In rough sets, the following indiscernibility relation for ATR is employed:

$IND(ATR) = \{(x, y) \in OB \times OB \mid x \text{ and } y \text{ are indiscernible for } ATR\}$.

This relation is an equivalence relation over OB. We write $[x]_{ATR}$ and $eq(ATR)$ for the equivalence class with an object x and the set of all equivalence classes for ATR, respectively. If a set $X \subseteq OB$ satisfies the equation $X = \bigcup_{x \in X} [x]_{ATR}$, we say X is definable in the DIS. (Actually, there is a subset of $eq(ATR)$ whose union equals X, and it is enough to consider such a subset.) If X is definable in the DIS, we can characterise the set X by using the condition $\vee_{x \in X} \wedge_{A \in ATR} [A, f(x, A)]$. If X is not definable, X is approximated by the next two sets:

$\underline{ATR}(X) = \{x \in OB \mid [x]_{ATR} \subseteq X\}$, $\overline{ATR}(X) = \{x \in OB \mid [x]_{ATR} \cap X \neq \emptyset\}$.

Generally, a pair $(\underline{ATR}(X), \overline{ATR}(X))$ is termed the rough sets of X. We approximate the set X by $\underline{ATR}(X)$ and $\overline{ATR}(X)$, which are both definable. The details are in [[1], [3], [11]]. Let $CON \subseteq AT$ and $Dec \in AT$ denote the condition attributes and a decision attribute, respectively. An object x is consistent with any other object y for CON and Dec, if $f(x, A) = f(y, A)$ for every $A \in CON$ implies $f(x, Dec) = f(y, Dec)$. If x is consistent, we say the implication $\wedge_{A \in CON} [A, f(x, A)] \Rightarrow [Dec, f(x, Dec)]$ from x is a consistent implication. Here, we focus on an important proposition, which clarifies the relation between consistency and equivalence classes.

Proposition 1. In DISs, the following two conditions are equivalent [[3]]:
(i) An object x is consistent with any other object for CON and Dec.
(ii) $[x]_{CON} \subseteq [x]_{Dec}$.

Proposition 1 connects the concept of consistent implications with the inclusion relation between equivalence classes. At the beginning of Pawlak's work, a rule was dealt with as a consistent implication, and the inclusion relation was examined for consistent rule generation.
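Proposition 1 turns the consistency check into a set-inclusion test, which is easy to implement. Below is a minimal sketch in SWI-Prolog (the authors' tools are also written in Prolog, but this code, including the fact name dis/2 and the predicate names, is our own illustration under that assumption, not their implementation). It stores the DIS of Table 1 below and tests Proposition 1 directly; attributes are addressed by position (1 = A, 2 = B, 3 = Dec).

% Hypothetical dis(Object, [A, B, Dec]) facts mirroring Table 1.
dis(1, [2, 2, 1]).
dis(2, [1, 2, 2]).
dis(3, [2, 2, 2]).
dis(4, [1, 3, 3]).

% tuple(+Attrs, +Obj, -Vals): restriction of Obj to attribute positions Attrs.
tuple(Attrs, Obj, Vals) :-
    dis(Obj, All),
    findall(V, (member(P, Attrs), nth1(P, All, V)), Vals).

% eq_class(+Attrs, +Obj, -Class): the equivalence class [Obj]_Attrs.
eq_class(Attrs, Obj, Class) :-
    tuple(Attrs, Obj, Vals),
    findall(Y, tuple(Attrs, Y, Vals), Class).

% consistent(+Con, +Dec, +Obj): Proposition 1, i.e. [Obj]_CON is
% included in [Obj]_Dec.
consistent(Con, Dec, Obj) :-
    eq_class(Con, Obj, C),
    eq_class([Dec], Obj, D),
    subtract(C, D, []).

For instance, the query ?- consistent([1,2], 3, 2). succeeds, while ?- consistent([1,2], 3, 1). fails; Example 1 below walks through the same computations by hand.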
Let us show each concept by using an example.

Example 1. We employ the DIS $\psi$ in Table 1, and clarify each definition.
(i) In $\psi$, $eq(\{A\}) = \{\{1, 3\}, \{2, 4\}\}$ holds. The set {1, 3} is definable, because $\{1, 3\} \in eq(\{A\})$ for the attribute set {A}, but the set X = {1, 2, 3} is not definable. In this case, the rough sets ({1, 3}, {1, 2, 3, 4}) approximate the set X. For the attribute set {A, B}, $eq(\{A, B\}) = \{\{1, 3\}, \{2\}, \{4\}\}$ holds. In this case, X is definable ($X = \{1, 3\} \cup \{2\}$), and the rough sets of X are ({1, 2, 3}, {1, 2, 3}).
(ii) To specify conditions in tables, we employ a descriptor like [Dec, 2] (it means $f(x, Dec) = 2$). The descriptor [Dec, 2] specifies the set {2, 3}, and we consider the definability of this set. For {A}, it is not definable, and the rough sets are $(\emptyset, \{1, 2, 3, 4\})$. For {A, B}, it is not definable, and the rough sets are ({2}, {1, 2, 3}).
(iii) In $eq(\{A\})$, the equivalence classes {1, 3} and {2, 4} are specified by [A, 2] and [A, 1], respectively. Therefore, the set OB is characterised by the condition $[A, 2] \vee [A, 1]$, and each equivalence class yields the condition part of an implication. In $eq(\{A, B\})$, $[2]_{\{A, B\}} = \{2\} \subseteq [2]_{Dec} = \{2, 3\}$ holds. This means {2} belongs to the lower approximation of the set specified by [Dec, 2]. So, a consistent implication is obtained from object 2 by using Proposition 1, and we have $[A, 1] \wedge [B, 2] \Rightarrow [Dec, 2]$.

Table 1. Exemplary DIS
OB  A  B  Dec
1   2  2  1
2   1  2  2
3   2  2  2
4   1  3  3

Example 1 shows the relation among rules, the definability of a set, and lower approximations. Because a conjunction of descriptors defines an equivalence class, we have a consistent implication $\wedge_{A \in CON} [A, f(x, A)] \Rightarrow [Dec, w]$ if we find an inclusion relation $[x]_{CON} \subseteq X$ (X is specified by [Dec, w]). Pawlak's framework has been extended to variable precision rough sets (VPRSs) [[17]] and several other research areas [[10], [11], [16], [18]–[30]].

3 Consistent rule generation in NISs

3.1 Rules in NISs and possible equivalence relations

A NIS is also a quadruplet $N = (OB, AT, \{VAL_A \mid A \in AT\}, g)$ [[2], [6]], where g is a mapping from $OB \times AT$ to the power set of $\bigcup_{A \in AT} VAL_A$. The interpretation of each set g(x, A) characterises the meaning of the NIS. We interpret it as meaning that an actual value is in g(x, A), but we cannot specify the actual value without additional information. The set g(x, A) is equal to $VAL_A$, if we do not have any information on x. NISs were investigated by Orłowska and Pawlak, and information incompleteness in table data sets was discussed. The framework of incomplete information databases [[4], [5]] was also proposed by Lipski for handling question-answering under uncertain information. These researches followed relational algebra and null values by Codd [[31]]. In about 1985, new research areas like data mining and knowledge discovery emerged in addition to information retrieval and question-answering in databases, and the problem on NISs also seemed to be transferred to the problem on missing values. The framework of learning from examples with unknown attribute values [[32]] was proposed. Rules from tables with missing values were investigated [[21]]. However, such research on missing values requires some assumptions, and the interpretation of these assumptions seems complicated and vague. For this reason, we follow the original definition of NISs instead of considering missing values with some assumptions. Our framework and the frameworks handling missing values are different, and the obtainable rules in NISs are slightly different from the rules under an assumption on missing values [[14]].

Definition 1. For a NIS $N = (OB, AT, \{VAL_A\}, g)$, we fix such a mapping $h: OB \times AT \to \bigcup_{A \in AT} VAL_A$ that $h(x, A) \in g(x, A)$. We term the DIS $(OB, AT, \{VAL_A\}, h)$ a derived DIS from the NIS [[33]].

Definition 2. Let $DD(N)$ denote the set $\{\psi \mid \psi$ is a derived DIS from $N\}$. We also term an equivalence class in a derived DIS a possible equivalence class (pe-class) in N, and we consider the set of all pe-classes in N [[33]].

Example 2. Table 2 shows an exemplary NIS N. For N, there are 48 (= 2 × 2 × 2 × 3 × 2) derived DISs. Table 3 shows one derived DIS $\psi$ from N. In this derived DIS, $eq_\psi(\{A\}) = \{\{1, 5\}, \{2, 3, 4, 8\}, \{6, 7\}\}$ holds, and each element in it is a pe-class in N. There are 48 derived DISs for N, so there may exist up to 48 different sets of equivalence classes.

Table 2. Exemplary NIS
OB  A          B          Dec
1   3          1          1
2   {1, 2}     2          2
3   1          3          3
4   1          {4, 5}     2
5   3          3          2
6   2          3          {1, 2}
7   2          3          3
8   1          {1, 2, 3}  {4, 5}

Table 3. Derived DIS from N
OB  A  B  Dec
1   3  1  1
2   1  2  2
3   1  3  3
4   1  4  2
5   3  3  2
6   2  3  1
7   2  3  3
8   1  1  4
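The count 48 above is simply the product of the cardinalities of all non-deterministic entries. The following sketch computes it in SWI-Prolog (nis/2 is a hypothetical fact name of ours mirroring Table 2; lists encode non-deterministic entries, as in the authors' caai.pl format shown in Section 3.2).

nis(1, [3, 1, 1]).
nis(2, [[1, 2], 2, 2]).
nis(3, [1, 3, 3]).
nis(4, [1, [4, 5], 2]).
nis(5, [3, 3, 2]).
nis(6, [2, 3, [1, 2]]).
nis(7, [2, 3, 3]).
nis(8, [1, [1, 2, 3], [4, 5]]).

% choices(+Entry, -C): a deterministic entry admits one choice, a set of
% possible values admits as many choices as it has elements.
choices(E, C) :- is_list(E), !, length(E, C).
choices(_, 1).

% derived_count(-N): N = |DD(N)|, the product over all entries.
derived_count(N) :-
    findall(C, (nis(_, Vals), member(E, Vals), choices(E, C)), Cs),
    foldl([X, A0, A]>>(A is A0 * X), Cs, 1, N).

The query ?- derived_count(N). answers N = 48, in agreement with Example 2.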
Remark 1.
(i) In Section 2, we examined the relation between rules in DISs and equivalence relations. Thus, in NISs, it is necessary to consider the relation between rules in NISs and pe-classes.
(ii) In a NIS N, we see that there is an actual derived DIS $\psi_{actual}$ based on the definition of non-deterministic information. In N, we characterise rules which certainly hold in $\psi_{actual}$ and rules which possibly hold in $\psi_{actual}$. Definitions 1, 2, certainty, and possibility follow the idea of Lipski's incomplete information databases [[4], [5]].
(iii) In rough sets with respect to DISs, the purpose is to characterise the set X specified by a descriptor [Dec, w]; however, in NISs, we characterise implications based on $DD(N)$. Both frameworks are related to rule generation, but the character of the obtained rules is different.

Definition 3 (a simplified definition) [[33]]:
(i) An implication $\tau$ is a certain and consistent rule (c-c rule), if $\tau$ is a consistent implication in each $\psi \in DD(N)$.
(ii) An implication $\tau$ is a possible and consistent rule (p-c rule), if $\tau$ is a consistent implication in at least one $\psi \in DD(N)$.

Clearly, every c-c rule is also a p-c rule. If each g(x, A) in N is a singleton set, we identify N with one DIS $\psi$, and $DD(N) = \{\psi\}$ holds. Therefore, (i) and (ii) in Definition 3 define the same implications, and Definition 3 covers the rules in not only NISs but also DISs. We see that Definition 3 is a natural extension of consistent rules in DISs. We have the following remark.

Remark 2.
(i) If an implication is a c-c rule, it is a consistent rule in the unknown actual $\psi_{actual}$.
(ii) If an implication is a p-c rule, it may be a consistent rule in the unknown actual $\psi_{actual}$.
(iii) If an implication is not a p-c rule, it is not a rule in the unknown actual $\psi_{actual}$.
(iv) Rule generation in NISs considers the certainty and the possibility of rules for the unknown actual DIS $\psi_{actual}$.

3.2 Consistent rule generation by the definability of a set in NISs and manipulation of pe-classes in Prolog

Let us consider the definability of a set X specified by a descriptor [Dec, w]. In a DIS, the equivalence relation for ATR is unique, but in a NIS there are plural possible equivalence relations (one for each derived DIS). Algorithm 1 (see Fig. 1) employs a backtracking procedure, and examines the definability of a set X.

Fig. 1. Algorithm 1: definability of a set in NISs [[33]]

Let us consider Example 1 again. If we select the attribute set {A}, we have $eq(\{A\}) = \{\{1, 3\}, \{2, 4\}\}$. For X = OB, Algorithm 1 (Fig. 1) fixes an object in X, picks up the class CL = {1, 3} containing it, and assigns $X := X \setminus CL = \{2, 4\}$. Then, CL = {2, 4} is picked up in the same way, and $X = \emptyset$ is derived. Thus, we know OB is definable. (In reality, OB is always definable.) Algorithm 1 (Fig. 1) is related to the algorithm LEM [[32]]. The set CL is unique in every DIS, because CL is an equivalence class with an object x. However, the set CL in a NIS is a pe-class, so it may not be unique. Thus, a search with backtracking is necessary for finding all CL. Algorithm 1 (Fig. 1) is different from the LEM algorithm at this point. As for the definability of a set, we can define certainly definable and possibly definable like Definition 3, and thus a new concept of the definability of a set in NISs is obtained. By extending Algorithm 1 (Fig. 1), we implemented a software tool handling the two concepts, certainly definable and possibly definable. In the following, we explain them through actual executions on the NIS N. It is generally difficult to simulate backtracking in a procedural programming language, so we employed Prolog, because Prolog is originally equipped with backtracking. Every program handling pe-classes and data dependency is implemented in Prolog on a Windows PC (Core i7 CPU, 3.6 GHz). We at first prepare the following data set caai.pl in Prolog, whose contents are those of Table 2:

object(8,3).
data(1,[3,1,1]).
data(2,[[1,2],2,2]).
data(3,[1,3,3]).
data(4,[1,[4,5],2]).
data(5,[3,3,2]).
data(6,[2,3,[1,2]]).
data(7,[2,3,3]).
data(8,[1,[1,2,3],[4,5]]).

We employ a file attrib.pl for defining the attributes, whose contents are decision([3]) and condition([1,2]). The content decision([3]) specifies the decision attribute (the third attribute), i.e. Dec, and condition([1,2]) specifies the first (A) and the second (B) as condition attributes.
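For intuition, a brute-force variant of the certainly-definable test can be written directly in Prolog. The sketch below is our own illustration (not the authors' tool) and reuses the hypothetical nis/2 facts from the earlier sketch: derived/1 enumerates all derived DISs on backtracking, and definable_in_all/2 checks that X is a union of equivalence classes in every one of them. Note that Algorithm 1 is smarter, backtracking over pe-classes rather than over whole derived DISs.

% choose(+Entry, -V): pick one possible value (backtrackable).
choose(E, V) :- is_list(E), !, member(V, E).
choose(V, V).

% derived(-Pairs): one derived DIS as Obj-Tuple pairs; backtracking
% enumerates all 48 derived DISs of Table 2.
derived(Pairs) :-
    findall(Obj-Vals, nis(Obj, Vals), Rows),
    maplist([O-Vs, O-Cs]>>maplist(choose, Vs, Cs), Rows, Pairs).

% class_of(+Attrs, +Obj, +Pairs, -Class): equivalence class of Obj
% for attribute positions Attrs in the derived DIS Pairs.
class_of(Attrs, Obj, Pairs, Class) :-
    member(Obj-Vals, Pairs),
    findall(V, (member(P, Attrs), nth1(P, Vals, V)), T),
    findall(Y, ( member(Y-W, Pairs),
                 findall(V, (member(P, Attrs), nth1(P, W, V)), T) ),
            Class).

% definable(+Attrs, +X, +Pairs): X is a union of equivalence classes.
definable(_, [], _) :- !.
definable(Attrs, [O|Rest], Pairs) :-
    class_of(Attrs, O, Pairs, Class),
    subtract(Class, [O|Rest], []),       % the class must stay inside X
    subtract([O|Rest], Class, X1),
    definable(Attrs, X1, Pairs).

% definable_in_all(+Attrs, +X): X is certainly definable.
definable_in_all(Attrs, X) :-
    forall(derived(Pairs), definable(Attrs, X, Pairs)).

The query ?- definable_in_all([1,2], [1,4,5]). succeeds, matching the execution in Fig. 3 below; a possibly-definable test is obtained by replacing forall/2 with a single successful choice of derived/1.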
Since OB is definable for any attribute set, we can obtain every set of pe-classes by examining the definability of OB. Fig. 2 shows this.

Fig. 2. Sets of all pe-classes in each attribute are displayed. For example, the set {{1, 5}, {2, 3, 4, 8}, {6, 7}} occurs one time for the attribute A, and it consists of three pe-classes. Similarly, each set of pe-classes occurs two times for the attribute B, and two times for the attribute Dec

In Fig. 3, the definability of the sets {1, 4, 5} and {3, 4} is examined. As a side effect of Algorithm 1 (Fig. 1), some pe-classes are obtained. The set {1, 4, 5} is definable in each of the 48 derived DISs. In the execution, Remark 2 is internally applied to the definability of a set.

Fig. 3. Definability of the two sets {1, 4, 5} and {3, 4}. The set {1, 4, 5} is examined as certainly definable in N. The term 'certain' over the execution time implies that the set is certainly definable. The set {3, 4} is possibly definable. This execution is based on Remark 2

In Fig. 4, information on data dependency is displayed. There are 48 derived DISs from N, and there exists no consistent DIS. The maximum degree of dependency from {A, B} to Dec is 0.75 in the 48 derived DISs, and the minimum degree is 0.5. Thus, the degree of dependency in N is between 0.5 and 0.75. Objects 1, 4, and 5 are consistent in each of the 48 derived DISs. This was also shown in Fig. 3, where the set {1, 4, 5} is definable in each of the 48 derived DISs. Based on Figs. 3 and 4, we know that it is possible to generate c-c rules, e.g.

$[A, 3] \wedge [B, 3] \Rightarrow [Dec, 2]$ (from object 5). (1)

If we employ one disjunction $[B, 4] \vee [B, 5]$, it is possible to generate a c-c rule from object 4.

Fig. 4. Information on data dependency from {A, B} to Dec

We have briefly described sets of pe-classes defined by NISs and their manipulation in Prolog powered by Algorithm 1 (Fig. 1). A more detailed explanation is in [[33]–[35]]. Actually, Algorithm 1 (Fig. 1) employs an exhaustive search over pe-classes, so the complexity of the calculation depends upon the amount of $DD(N)$. Generally, $|DD(N)|$ increases exponentially. Even for N in Table 2, the amount is 48. Thus, Algorithm 1 (Fig. 1) will not be suitable for generating rules from large data sets. The application of the pe-class manipulation system in Prolog will be restricted to small data sets.
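The minimum and maximum degrees of dependency reported in Fig. 4 can be reproduced by brute force over the 48 derived DISs. A sketch under the same assumptions as above (our illustration, reusing derived/1 and class_of/4 from the previous sketch; positions [1, 2] stand for {A, B} and [3] for Dec):

% consistent_in(+Pairs, +Obj-_): Obj is consistent in the derived DIS
% Pairs, by the inclusion test of Proposition 1.
consistent_in(Pairs, Obj-_) :-
    class_of([1, 2], Obj, Pairs, C),
    class_of([3], Obj, Pairs, D),
    subtract(C, D, []).

% dependency(+Pairs, -Deg): fraction of consistent objects.
dependency(Pairs, Deg) :-
    length(Pairs, N),
    include(consistent_in(Pairs), Pairs, Cons),
    length(Cons, K),
    Deg is K / N.

% dependency_range(-Min, -Max): over all derived DISs.
dependency_range(Min, Max) :-
    findall(D, (derived(P), dependency(P, D)), Ds),
    min_list(Ds, Min),
    max_list(Ds, Max).

The query ?- dependency_range(Min, Max). answers Min = 0.5 and Max = 0.75, matching Fig. 4.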
4 Consistent rule generation by the discernibility function method

In rough sets, we can easily obtain a consistent rule by finding one lower approximation of the set X specified by a decision descriptor [Dec, w]. However, a lower approximation is generally not unique, so it is necessary to find all lower approximations to obtain all consistent implications. Skowron proposed the discernibility function, and the problem of determining all minimal reducts was translated to a SAT problem (to obtain the solutions of a conjunction of disjunctions of descriptors). Skowron proved that 'to determine all minimal reducts is NP-hard' [[11]], so it is not easy to enumerate all lower approximations for large data sets. However, the discernibility function method (df-method) affords a way to enumerate the lower approximations. If each solution of the discernibility function is examined, the df-method preserves the following logical properties on rule generation:

(Soundness) The df-method generates only consistent implications.
(Completeness) Any consistent implication is obtainable by the df-method.

Generally, the property of soundness will be equipped in every system, and the property of completeness assures that no rule is missed in rule generation. So, completeness is a preferable property. However, it sometimes seems to be ignored in rough set-based systems due to Skowron's NP-hardness result.

Example 3. Let us consider Example 1 and the decision descriptor [Dec, 3]. Since [Dec, 3] specifies the set {4}, we discriminate object 4 from objects 1, 2, and 3 by using A and B. As for object 1, both [A, 1] and [B, 3] discriminate object 1 from object 4. As for object 2, [B, 3] discriminates object 2 from object 4, and both [A, 1] and [B, 3] discriminate object 3 from object 4. Thus, we have the discernibility function $DF(4) = ([A, 1] \vee [B, 3]) \wedge [B, 3] \wedge ([A, 1] \vee [B, 3])$. A solution of DF(4) becomes the condition part of a consistent rule. The minimal solution is [B, 3], and we have a minimal consistent rule $[B, 3] \Rightarrow [Dec, 3]$.

We have extended the df-method in DISs to the df-method in NISs [[36]]. In the df-method in NISs, a set X of objects must be discriminated from any object in $OB \setminus X$ in each of the derived DISs. For handling this, a new discernibility function was defined [[36]]. Fig. 5 is log data for generating c-c rules from N in Table 2. This software tool is also implemented in Prolog. By using Fig. 5, we explain the overview of the df-method in NISs sequentially.

(i) At first, we define a descriptor [Dec, 2] as the decision part, and it is stored in the attrib.pl file. The data set caai.pl is translated to a caai1.rs file by using the attrib.pl file. In this caai1.rs, information on pe-classes is stored.
(ii) The init command examines each object whose decision part is certainly [Dec, 2], and inf = [2, 4, 5] is displayed. For each of objects 2, 4, and 5, the definability of the object is examined, and only object 5 is detected as certainly definable. Thus, c-c rules can be obtained from object 5.
(iii) Since a c-c rule in Definition 3 is a consistent implication in each of the derived DISs, the discernibility function needs to discriminate objects 1, 3, 6, 7, and 8 from object 5 in each of the 48 derived DISs. Here, the descriptor [B, 3] (it means B = 3) discriminates object 1 from object 5, and [A, 3] (it means A = 3) discriminates objects 3, 6, 7, and 8. In this case, [B, 3] can discriminate object 8 in some derived DISs but not in all 48 derived DISs. Thus, [B, 3] cannot be employed for discriminating object 8. This is the extended part of the discernibility function in NISs.
(iv) The discernibility function DF(5) is displayed in Fig. 5. In this case, the solution of DF(5) is uniquely $[A, 3] \wedge [B, 3]$, and we have a minimal c-c rule from object 5, which is shown in formula (1).
(v) To obtain all c-c rules, we need to change the decision part to [Dec, 1], [Dec, 3], [Dec, 4], and [Dec, 5], and to repeat the same steps for each decision part.

Fig. 5. Log data for c-c rule generation from N

We have briefly described the df-method in NISs by using an actual execution log. The discernibility function affords an intuitive way to enumerate each lower approximation. However, for large data sets, obtaining all solutions of the discernibility function becomes a satisfiability problem, SAT. Therefore, the df-method will not be suitable for generating rules from large data sets. The application of the df-method in Prolog will also be restricted to small data sets. It is necessary to employ an exhaustive search for finding all lower approximations. We have implemented some variations of a software tool to obtain lower approximations. In one of them, we can sequentially specify some descriptors for reducing the size of the discernibility function [[35], [36]].
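In a DIS, the discernibility function of an object is straightforward to compute. A sketch over the hypothetical dis/2 facts for Table 1 from Section 2 (our illustration; a descriptor is written as a [AttrPosition, Value] pair, so [2, 3] stands for [B, 3]):

% df(+Obj, -DF): DF is a conjunction (outer list) of disjunctions
% (inner lists) of descriptors of Obj that discriminate Obj from each
% object with a different decision (position 3).
df(Obj, DF) :-
    dis(Obj, Vals),
    nth1(3, Vals, Dec),
    findall(Disj,
            ( dis(Y, W), Y \== Obj,
              nth1(3, W, DecY), DecY \== Dec,
              findall([P, V],
                      ( member(P, [1, 2]),
                        nth1(P, Vals, V),
                        nth1(P, W, VY), V \== VY ),
                      Disj) ),
            DF).

The query ?- df(4, DF). yields DF = [[[1,1],[2,3]], [[2,3]], [[1,1],[2,3]]], i.e. $([A, 1] \vee [B, 3]) \wedge [B, 3] \wedge ([A, 1] \vee [B, 3])$, exactly the discernibility function of Example 3.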
5 Apriori-based rule generation

We focus on the Apriori algorithm, and extend it to the NIS-Apriori algorithm and NIS-Apriori in SQL.

5.1 Apriori algorithm and rules in DISs

The Apriori algorithm was originally defined for handling data in transaction format, and the operation of item sets was investigated [[7]]. The Apriori algorithm is known as one of the most famous algorithms for knowledge discovery [[9]]. Generally, the decision part is not fixed in the Apriori algorithm, and the purpose is to obtain association rules defined by each frequent item set. However, if we identify an item with a descriptor in table data sets, it is possible to apply the Apriori algorithm to rule generation from table data sets. For the DIS in Table 1, we see that each tuple shows an item set, and the table is the set of item sets below:

{[A, 2], [B, 2], [Dec, 1]} (object 1), {[A, 1], [B, 2], [Dec, 2]} (object 2),
{[A, 2], [B, 2], [Dec, 2]} (object 3), {[A, 1], [B, 3], [Dec, 3]} (object 4).

We termed the algorithm handling the above data structure the DIS-Apriori algorithm. The relation between the Apriori algorithm and the DIS-Apriori algorithm is as follows: (1) The number of elements in each item set is the same as the number of attributes. (2) The decision attribute Dec is usually predefined, and the decision part is an element in the set $\{[Dec, w] \mid w \in VAL_{Dec}\}$. (3) Except for (1) and (2), the DIS-Apriori algorithm is almost equal to the Apriori algorithm.

In rough sets, rules are originally identified with consistent implications, and we described methods to obtain consistent implications in the previous sections. However, the consistency of implications is often too strict, so the definition of rules was extended to VPRS [[17]]. Similarly, in the Apriori algorithm, the criterion values support and accuracy (or confidence) are employed. We follow this research on rules, and we consider the next definition.

Definition 4. For a DIS $\psi$ and two given threshold values $0 < \alpha, \beta \leq 1$, an implication $\tau: \wedge_{A \in CON} [A, v_A] \Rightarrow [Dec, w]$ satisfying (1) $support(\tau) \geq \alpha$ and (2) $accuracy(\tau) \geq \beta$ is called (a candidate of) a rule in $\psi$ [[3], [7], [11], [37]]. Here, $support(\tau) = |SUP(\wedge_{A} [A, v_A]) \cap SUP([Dec, w])| / |OB|$ and $accuracy(\tau) = |SUP(\wedge_{A} [A, v_A]) \cap SUP([Dec, w])| / |SUP(\wedge_{A} [A, v_A])|$, where $SUP(\cdot)$ is the set of objects satisfying the formula (an equivalence class defined by the formula). If $SUP(\wedge_{A} [A, v_A]) = \emptyset$, we define both $support(\tau) = 0$ and $accuracy(\tau) = 0$.

Remark 3.
(i) The consistent implications, which were dealt with as rules in the previous sections, correspond to the case of $accuracy(\tau) = 1.0$ in Definition 4. Therefore, Definition 4 extends the concept of consistent rules.
(ii) Rough set-based rule generation employs Proposition 1 and selects lower approximations as a top priority for rule generation. The selection of lower approximations means generating rules satisfying $accuracy(\tau) = 1.0$.
(iii) Since both the pe-class manipulation and the df-method handle only consistent implications, we cannot apply them to generating rules satisfying $accuracy(\tau) < 1.0$. We need other methods for handling the rules in Definition 4.

5.2 DIS-Apriori algorithm and its simulation

This subsection clarifies the properties of the criterion values support and accuracy in rule generation. The sets $CAN_1$, $CAN_2$, … of candidate implications are also introduced. Here, the subscript i in $CAN_i$ is the number of descriptors in the condition part, and $|CAN_i|$ is the number of such implications. The purpose of Apriori-based rule generation is to detect the implications of Definition 4 from these sets. Of course, we can examine each implication sequentially. However, this trivial method would take much execution time. The Apriori algorithm employs a property of redundancy, and proposes a more effective method [[7]].

Definition 5. For an implication $\tau: \wedge_{A \in CON} [A, v_A] \Rightarrow [Dec, w]$ and any descriptor $[B, v_B]$ ($B \notin CON$), we say $\tau': \wedge_{A \in CON} [A, v_A] \wedge [B, v_B] \Rightarrow [Dec, w]$ is a redundant implication of $\tau$ [[14], [15]].

Definition 5 reduces the number of rules, and we handle only minimal implications as rules. Since $SUP(\wedge_{A} [A, v_A] \wedge [B, v_B]) \subseteq SUP(\wedge_{A} [A, v_A])$ holds, if we detect that an implication $\tau$ with $accuracy(\tau) = 1.0$ is a rule, we automatically see that the redundant implication $\tau'$ also satisfies $accuracy(\tau') = 1.0$. (In this case, we ignore the support condition, because $support(\tau')$ may be less than $\alpha$ for $\tau'$.)
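Definition 4 can be checked mechanically. A sketch over the hypothetical dis/2 facts for Table 1 (SWI-Prolog, our illustration; descriptors are again [AttrPosition, Value] pairs):

% sup(+Descs, -Objs): objects satisfying every descriptor in Descs.
sup(Descs, Objs) :-
    findall(O, ( dis(O, Vals),
                 forall(member([P, V], Descs), nth1(P, Vals, V)) ),
            Objs).

% support_accuracy(+Con, +Dec, -Supp, -Acc): criterion values of the
% implication Con => Dec, following Definition 4.
support_accuracy(Con, Dec, Supp, Acc) :-
    sup(Con, C),
    append(Con, [Dec], All),
    sup(All, CD),
    length(C, NC), length(CD, NCD),
    findall(O, dis(O, _), Os), length(Os, N),
    (   NC =:= 0 -> Supp = 0, Acc = 0
    ;   Supp is NCD / N, Acc is NCD / NC
    ).

For example, for $\tau: [B, 2] \Rightarrow [Dec, 2]$ the query ?- support_accuracy([[2,2]], [3,2], S, A). gives S = 0.5 and A = 2/3, since SUP([B, 2]) = {1, 2, 3} and SUP([B, 2]) ∩ SUP([Dec, 2]) = {2, 3}.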
Example 4. Now, we consider Table 4. There, the accuracy value of the redundant implication is bigger. Then, we consider Table 5. In this case, the accuracy value of the redundant implication is lower. Thus, we have the next properties on an implication $\tau$ and its redundant implication $\tau'$ [[14], [15]].

Table 4. Exemplary DIS

Property 1: If $support(\tau) < \alpha$, any redundant implication $\tau'$ is not a rule, because $support(\tau') \leq support(\tau) < \alpha$ clearly holds.
Property 2: If $support(\tau) \geq \alpha$ and $accuracy(\tau) \geq \beta$, $\tau$ is a rule, and any redundant $\tau'$ is not dealt with as a rule. (We only consider minimal implications as rules.)
Property 3: If $support(\tau) \geq \alpha$ and $accuracy(\tau) < \beta$, a redundant $\tau'$ may be a rule.

Based on these three properties, we do not have to consider any redundant implication of a $\tau$ satisfying Property 1 nor Property 2. It is enough to consider the redundant implications of each $\tau$ satisfying Property 3 for rule generation. Namely, we can consider the set $CAN_{i+1} = \{\tau' \mid \tau'$ is a redundant implication, with (i + 1) descriptors in the condition part, of some $\tau \in CAN_i$ satisfying Property 3$\}$ instead of the set of all implications with (i + 1) condition descriptors. Generally, the amount of $CAN_{i+1}$ is much smaller than the amount of all such implications. The DIS-Apriori algorithm in Algorithm 2 (see Fig. 6) makes use of the sets $CAN_i$ and Properties 1–3. Since the DIS-Apriori algorithm handles every candidate by using $CAN_1$, $CAN_2$, …, it is sound and complete for the rules in Definition 4. So, any rule specified in Definition 4 can be obtained by Algorithm 2 (Fig. 6).
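The pruning decision that drives Algorithm 2 can be sketched on top of support_accuracy/4 from the previous sketch (again our illustration of Properties 1–3, not the authors' code):

% classify(+Con, +Dec, +Alpha, +Beta, -Status): decide the fate of a
% candidate implication according to Properties 1-3.
classify(Con, Dec, Alpha, Beta, Status) :-
    support_accuracy(Con, Dec, S, A),
    (   S < Alpha -> Status = pruned   % Property 1: drop with all extensions
    ;   A >= Beta -> Status = rule     % Property 2: minimal rule, do not extend
    ;   Status = extend                % Property 3: keep in CAN_{i+1}
    ).

For example, with α = 0.3 and β = 0.9 over Table 1, the query ?- classify([[2,2]], [3,2], 0.3, 0.9, St). gives St = extend, and the extended candidate [[1,1],[2,2]] (i.e. $[A, 1] \wedge [B, 2]$) is then pruned because its support 0.25 falls below α.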