Article, National Production, Peer-reviewed

Multi-objective genetic algorithm for missing data imputation

2015; Elsevier BV; Volume: 68; Language: English

10.1016/j.patrec.2015.08.023

ISSN

1872-7344

Authors

Fábio M. F. Lobato, Claudomiro Sales, Igor Araújo, Vincent Tadaiesky, L. F. S. Dias, Leonardo Ramos, Ádamo Lima de Santana

Topic(s)

Optimal Experimental Design Methods

Abstract

A large number of data analysis techniques have been developed in recent years; however, most of them do not deal satisfactorily with a ubiquitous problem in the area: missing data. To mitigate the bias imposed by this problem, several treatment methods have been proposed. Prominent among them are data imputation methods, which can be viewed as an optimization problem where the goal is to reduce the bias caused by the absence of information. However, most imputation methods are restricted to one type of variable, either categorical or continuous. To fill these gaps, this paper presents a multi-objective genetic algorithm for data imputation called MOGAImp, based on NSGA-II, which is suitable for mixed-attribute datasets and takes into account information from incomplete instances and the modeling task. The algorithm was evaluated on 30 datasets with induced missing values, using five classifiers divided into three classes (rule induction learning, lazy learning, and approximate models), and was compared with three techniques from the literature. The results confirm that MOGAImp outperforms some well-established missing data treatment methods. Furthermore, the proposed method proved to be flexible, since it can be adapted to different application domains.
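
The abstract frames imputation as a multi-objective optimization problem solved with an NSGA-II-based algorithm. The core idea NSGA-II relies on, Pareto dominance between candidate solutions, can be illustrated with a minimal sketch. This is not the authors' implementation: the two objectives and the score values below are hypothetical placeholders (the abstract does not state which objectives MOGAImp minimizes), and only the dominance test and the extraction of the non-dominated set are shown.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if score vector a Pareto-dominates b (all objectives minimized).

    a dominates b when it is no worse on every objective and strictly
    better on at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Return the indices of the candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical scores for four candidate imputations, each evaluated on two
# assumed objectives to minimize (for example, a classifier error rate and a
# reconstruction error). The numbers are illustrative only.
candidate_scores = [(0.20, 0.50), (0.25, 0.40), (0.30, 0.60), (0.20, 0.45)]

front = pareto_front(candidate_scores)
print(front)
```

In a full NSGA-II loop, this non-dominated set would form the first front, and the remaining candidates would be ranked into further fronts before crowding-distance selection; those steps are omitted here.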
