A critical review of rule surprisingness measures

2003; WIT Press; Volume: 29; Language: English

10.2495/data030531

ISSN

1746-4463

Authors

Déborah Ribeiro Carvalho, Alex A. Freitas, Nelson F. F. Ebecken

Topic(s)

Fuzzy Logic and Control Systems

Abstract

In data mining it is usually desirable that discovered knowledge have some characteristics such as being as accurate as possible, comprehensible and surprising to the user. The vast majority of data mining algorithms produce, as part of their results, information of a statistical nature that allows the user to assess how accurate and reliable the discovered knowledge is. However, in many cases this is not enough for the user. Even if discovered knowledge is highly accurate from a statistical point of view, it might not be interesting for the user. Few data mining algorithms produce, as part of their results, a measure of the degree of surprisingness of discovered knowledge. However, these measures can be computed in a post-processing phase, as a form of additional evaluation of the quality of discovered knowledge, complementing (rather than replacing) statistical measures of discovered knowledge accuracy. This paper presents a review of four measures of classification-rule surprisingness, discussing their main characteristics, advantages and disadvantages. Hence, the main contribution of this paper is to improve our understanding of these rule surprisingness measures, which is a step towards solving the very difficult problem of selecting the best rule surprisingness measure for a given application domain.
