Database Issues in Knowledge Discovery and Data Mining
1999; Australasian Association for Information Systems; Volume: 6; Issue: 2; Language: English
10.3127/ajis.v6i2.310
ISSN 1449-8618
Authors: Chris P. Rainsford, John F. Roddick
Topic(s): Rough Sets and Fuzzy Logic
Abstract: In recent years, both the number and the size of organisational databases have increased rapidly. However, although available processing power has also grown, the increase in stored data has not necessarily led to a corresponding increase in useful information and knowledge. This has led to growing interest in the development of tools capable of harnessing the additional processing power to better utilise the potential of stored data. The terms "Knowledge Discovery in Databases" and "Data Mining" have been adopted for a field of research dealing with the automatic discovery of knowledge implicit within databases. Data mining is useful in situations where the volume of data is either too large or too complicated for manual processing or, to a lesser extent, where human experts are unavailable to provide knowledge. The success already attained by a wide range of data mining applications continues to prompt further investigation into alternative data mining techniques and the extension of data mining to new domains. This paper surveys, from the standpoint of the database systems community, current issues in data mining research by examining the architectural and process models adopted by knowledge discovery systems, the different types of discovered knowledge, the way knowledge discovery systems operate on different data types, various techniques for knowledge discovery and the ways in which discovered knowledge is used.
Vol. 6 No. 2, May 1999
[…] describing interestingness is provided by Asa and Mangano: Performance, Simplicity, Novelty and Significance (Asa and Mangano 1995).
TYPES OF DISCOVERED KNOWLEDGE
The type of knowledge discovered from databases, and its corresponding representational form, varies widely depending on both the application area and the database type. Specifying the type of knowledge to be discovered directs the pattern-filtering process. Knowledge learned from large data sets can take many forms, including classification knowledge, characteristic rules, association rules, functional relationships, functional dependencies and causal rules. This section describes each of these categories of knowledge and discusses example systems that learn each type. Table 1 indicates the types of knowledge explicitly supported by a selection of current data mining tools. Many of these tools are under ongoing development, so the table summarises their capabilities at the time of writing. Moreover, the purpose of this survey is to demonstrate the broad diversity of a cross-section of data mining tools, not to form the basis of any tool comparison or evaluation.
Classification Knowledge
Classification knowledge can be used to categorise new examples into classes on the basis of their known properties. For example, a lending institution could use such knowledge, constructed from records of past loans, to classify the credit risk of prospective borrowers. Following the formalism of Agrawal et al. (1992), inferring classification functions from examples can be described as follows. Let $G$ be a set of $m$ group labels $\{G_1, G_2, \ldots, G_m\}$. Let $A$ be a set of $n$ attributes (features) $\{A_1, A_2, \ldots, A_n\}$. Let $\mathrm{dom}(A_i)$ refer to the set of possible values for attribute $A_i$. We are given a large database of objects $D$ in which each object is an $n$-tuple of the form $\langle v_1, v_2, \ldots, v_n \rangle$, where $v_i \in \mathrm{dom}(A_i)$ and $G$ is not one of the $A_i$. In other words, the group labels of objects in $D$ are not known. We are also given a set of example objects $E$ in which each object is an $(n+1)$-tuple of the form $\langle v_1, v_2, \ldots, v_n, g_k \rangle$, where $v_i \in \mathrm{dom}(A_i)$ and $g_k \in G$. In other words, the objects in $E$ have the same attributes as the objects in $D$ and additionally have group labels associated with them. The problem is to obtain $m$ classification functions, one for each group $G_j$, using the information in $E$, with the classification function $f_j$ for group $G_j$ being
$$f_j : A_1 \times A_2 \times \cdots \times A_n \rightarrow G_j, \qquad j = 1, \ldots, m.$$
We also refer to the example set $E$ as the training set and the database $D$ as the test data set.
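To make the formalism concrete, the following Python sketch learns one classification function $f_j$ per group from a small training set $E$ and applies them to unlabelled objects from $D$. It is a minimal illustration under stated assumptions, not the method of Agrawal et al. (1992): it swaps in a plain 1-nearest-neighbour rule that counts mismatched categorical attribute values, and it models each $f_j$ as a membership test returning True when an object is assigned to $G_j$. The function names (`mismatches`, `nearest_label`, `learn_classifiers`) and the loan attributes and risk labels are hypothetical, chosen only to mirror the credit-risk example above.

```python
from collections.abc import Callable, Hashable, Sequence

# An object in D is an n-tuple (v_1, ..., v_n); an example in E is an
# (n+1)-tuple (v_1, ..., v_n, g_k) whose last component is its group label.
Obj = tuple[Hashable, ...]


def mismatches(a: Obj, b: Obj) -> int:
    """Number of attribute positions on which two n-tuples disagree."""
    return sum(x != y for x, y in zip(a, b))


def nearest_label(E: Sequence[tuple], obj: Obj) -> Hashable:
    """Group label of the training example closest to obj (1-nearest-neighbour).

    min() returns the first minimum, so ties go to the earlier example in E.
    """
    best = min(E, key=lambda ex: mismatches(ex[:-1], obj))
    return best[-1]


def learn_classifiers(
    E: Sequence[tuple], groups: Sequence[Hashable]
) -> dict[Hashable, Callable[[Obj], bool]]:
    """Build one hypothetical f_j per group G_j; f_j(obj) is True iff obj is assigned to G_j."""

    def make_f(g: Hashable) -> Callable[[Obj], bool]:
        # The factory binds g per group, avoiding Python's late-binding closure pitfall.
        return lambda obj: nearest_label(E, obj) == g

    return {g: make_f(g) for g in groups}


# Hypothetical past loans: attributes (income, employment, existing_debt) plus a risk label.
E = [
    ("high", "employed", "none", "low_risk"),
    ("high", "employed", "some", "low_risk"),
    ("low", "unemployed", "some", "high_risk"),
    ("low", "employed", "high", "high_risk"),
]
# Unlabelled objects from D (prospective borrowers).
D = [("high", "employed", "high"), ("low", "unemployed", "none")]

fs = learn_classifiers(E, ["low_risk", "high_risk"])
for obj in D:
    print(obj, [g for g, f in fs.items() if f(obj)])
```

Because every object is given exactly one nearest label, exactly one $f_j$ returns True for each object in $D$. The first borrower is equally close to three training examples, and the tie simply goes to the first of them; a practical classifier would need a better-founded way to generalise from $E$.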
References