Article Open access Peer-reviewed

EUSC: A clustering-based surrogate model to accelerate evolutionary undersampling in imbalanced classification

2020; Elsevier BV; Volume: 101; Language: English

10.1016/j.asoc.2020.107033

ISSN

1872-9681

Authors

Hoang Lam Le, Dario Landa-Silva, Mikel Galar, Salvador García, Isaac Triguero

Topic(s)

Evolutionary Algorithms and Applications

Abstract

Learning from imbalanced datasets is in high demand in real-world applications and is a challenge for standard classifiers, which tend to be biased towards the classes that hold the majority of the examples. Undersampling approaches reduce the size of the majority class to balance the class distributions. Evolutionary approaches are prominent among them, treating undersampling as a binary optimisation problem that determines which examples are removed. However, their use is limited to small datasets because of the cost of fitness evaluation. This work proposes a two-stage clustering-based surrogate model that enables evolutionary undersampling to compute fitness values faster. The main novelty lies in the development of a surrogate model for binary optimisation that is based on the meaning (phenotype) of candidate solutions rather than on their binary representation (genotype). We conduct an evaluation on 44 imbalanced datasets, showing that, in comparison with the original evolutionary undersampling, we can save up to 83% of the runtime without significantly deteriorating the classification performance.
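To make the "binary optimisation" framing concrete, the following is a minimal sketch of how one candidate solution might be encoded and scored. It is not the authors' implementation: the 1-nearest-neighbour classifier, the geometric-mean fitness, the synthetic data, and the names `nearest_label` and `fitness` are all assumptions chosen for illustration. Each bit of the chromosome (genotype) marks whether one majority example is kept, and the resulting reduced training set is the phenotype.

```python
import math
import random


def nearest_label(example, training):
    """Return the label of the nearest training example (1-NN, an assumed classifier).

    The example itself is skipped when it appears in the training set,
    which gives a leave-one-out estimate.
    """
    best_dist, best_label = math.inf, None
    for other in training:
        if other is example:
            continue
        dist = math.dist(example[0], other[0])
        if dist < best_dist:
            best_dist, best_label = dist, other[1]
    return best_label


def fitness(chromosome, majority, minority):
    """Score a binary chromosome; 1 keeps the matching majority example, 0 removes it.

    The score is the geometric mean of the per-class accuracies
    (an assumed fitness, not taken from the abstract).
    """
    kept = [ex for bit, ex in zip(chromosome, majority) if bit == 1]
    training = kept + minority  # the phenotype: the reduced training set
    correct = {0: 0, 1: 0}
    for example in majority + minority:
        if nearest_label(example, training) == example[1]:
            correct[example[1]] += 1
    true_negative_rate = correct[0] / len(majority)
    true_positive_rate = correct[1] / len(minority)
    return math.sqrt(true_positive_rate * true_negative_rate)


# Hypothetical synthetic data: label 0 is the majority class, label 1 the minority.
random.seed(0)
majority = [((random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)), 0) for _ in range(40)]
minority = [((random.gauss(1.5, 1.0), random.gauss(1.5, 1.0)), 1) for _ in range(8)]

keep_all = [1] * len(majority)
random_subset = [random.randint(0, 1) for _ in majority]

print("fitness, all majority kept:", fitness(keep_all, majority, minority))
print("fitness, random subset:    ", fitness(random_subset, majority, minority))
```

Every call to `fitness` classifies the whole dataset, so the cost grows quickly with the number of examples and with the number of chromosomes an evolutionary search evaluates. That repeated evaluation is the expense a surrogate model is meant to avoid; the surrogate itself is not sketched here.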
