Article Open access Peer-reviewed

RuleCOSI+: Rule extraction for interpreting classification tree ensembles

2022; Elsevier BV; Volume: 89; Language: English

10.1016/j.inffus.2022.08.021

ISSN

1872-6305

Authors

Josue Obregon, Jae‐Yoon Jung

Topic(s)

Explainable Artificial Intelligence (XAI)

Abstract

Despite the advent of novel neural network architectures, tree-based ensemble algorithms such as random forests and gradient boosting machines still prevail in many practical machine learning problems in the manufacturing, financial, and medical domains. However, tree ensembles have the limitation that the internal decision mechanisms of complex models are difficult to understand. Therefore, we present a post-hoc interpretation approach for classification tree ensembles. The proposed method, RuleCOSI+, extracts simple rules from tree ensembles by greedily combining and simplifying their base trees. Compared with its previous version, RuleCOSI, the new version can be applied to both bagging ensembles (e.g., random forest, RF) and boosting ensembles (e.g., gradient boosting machines, GBM), and it runs much faster on ensembles with hundreds of trees. To assess the performance and applicability of the method, empirical experiments were conducted with two bagging algorithms and four gradient boosting algorithms over 33 datasets. Among the five ensemble simplification algorithms compared, RuleCOSI+, together with RuleFit, generated the best classification rulesets in terms of F-measure for the RF and GBM models, yet the rulesets of RuleCOSI+ were, on average, less than half the size of those of RuleFit. Moreover, RuleCOSI+ achieved the best antecedent uniqueness rate ("uniq") among the five algorithms and also ranked highly in the number of rules ("Nrules") and the rule reduction rate ("redu"). In addition, the proposed method reduced generalization errors in the simplified rulesets, obtaining, on average, slightly better classification errors than the original models of the two bagging and three of the gradient boosting algorithms (all except CatBoost).
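To make the abstract's idea concrete, the sketch below is a minimal, illustrative example of the kind of rule extraction the paper performs: enumerating root-to-leaf paths of each base tree in a random forest as IF-THEN rules, then applying a crude deduplication of antecedents. This is not the RuleCOSI+ algorithm itself (which combines and simplifies rules greedily with coverage and accuracy criteria); the helper `extract_rules` and the toy setup are assumptions made for illustration only.

```python
# Illustrative sketch only -- NOT the RuleCOSI+ algorithm, just the kind of
# tree-to-rule conversion it builds on. `extract_rules` is a hypothetical helper.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier


def extract_rules(tree, feature_names):
    """Enumerate root-to-leaf paths of a fitted decision tree as
    (antecedent, predicted_class) rules."""
    t = tree.tree_
    rules = []

    def recurse(node, conds):
        if t.children_left[node] == -1:  # -1 marks a leaf in sklearn trees
            pred = int(np.argmax(t.value[node][0]))  # majority class at leaf
            rules.append((tuple(conds), pred))
            return
        name, thr = feature_names[t.feature[node]], t.threshold[node]
        recurse(t.children_left[node], conds + [f"{name} <= {thr:.2f}"])
        recurse(t.children_right[node], conds + [f"{name} > {thr:.2f}"])

    recurse(0, [])
    return rules


X, y = load_iris(return_X_y=True)
names = [f"x{i}" for i in range(X.shape[1])]
rf = RandomForestClassifier(n_estimators=5, max_depth=2, random_state=0).fit(X, y)

# Pool the rules of all base trees, as a combination step would.
all_rules = []
for est in rf.estimators_:
    all_rules.extend(extract_rules(est, names))

# A naive "simplification": keep only unique antecedents.
unique = {ant: cls for ant, cls in all_rules}
print(f"{len(all_rules)} raw rules -> {len(unique)} unique antecedents")
```

With 5 trees of depth 2, the pooled ruleset has at most 20 rules before deduplication; the actual RuleCOSI+ procedure goes much further, pruning and merging conditions while checking the effect on classification performance.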
