Article Open access Peer-reviewed

EmSM: Ensemble Mixed Sampling Method for Classifying Imbalanced Intrusion Detection Data

2022; Multidisciplinary Digital Publishing Institute; Volume: 11; Issue: 9; Language: English

10.3390/electronics11091346

ISSN

2079-9292

Authors

Ilok Jung, Jaewon Ji, Changseob Cho,

Topic(s)

Imbalanced Data Classification Techniques

Abstract

Research on applying machine learning to intrusion detection is attracting great interest. However, depending on the application, it is difficult to collect the data needed for training and testing: the least frequent data types often reflect the most serious threats, so the data are imbalanced, which leads to overfitting and hinders precise classification. To address this problem, we propose a mixed resampling method that uses a hybrid of the synthetic minority oversampling technique (SMOTE) and edited nearest neighbors (ENN), which enlarges the minority class and removes noisy samples to generate a balanced dataset. A bagging ensemble algorithm is then used to optimize the model on the resampled data. We verified the method using two public intrusion detection datasets: PKDD2007 (balanced) and CSIC2012 (imbalanced). The proposed technique yields improved performance over state-of-the-art techniques. Furthermore, it improves true positive identification and classification of serious threats that rarely occur, representing a major functional innovation.
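The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes: oversample the minority class by interpolation (SMOTE-style), drop samples whose neighbors disagree with their label (ENN-style), then train a bagging ensemble on the resampled data. Everything specific here is an assumption made for illustration and not the authors' method: the function names (`smote`, `enn`, `bagging_predict`), the k-nearest-neighbor base learner, the parameter values, and the synthetic 2-D toy data standing in for real intrusion detection features.

```python
import random
from collections import Counter

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, data, k, skip=None):
    """Indices of the k points in data closest to point, optionally skipping one index."""
    candidates = [i for i in range(len(data)) if i != skip]
    return sorted(candidates, key=lambda i: dist2(point, data[i]))[:k]

def majority_label(labels):
    """Most frequent label in an iterable of labels."""
    return Counter(labels).most_common(1)[0][0]

def smote(X, y, minority, k=3, rng=None):
    """SMOTE-style oversampling: interpolate between a minority sample and one
    of its k nearest minority neighbors until the minority class matches the
    largest class. Assumes at least two minority samples exist."""
    rng = random.Random(0) if rng is None else rng
    pts = [x for x, label in zip(X, y) if label == minority]
    target = max(Counter(y).values())
    synthetic = []
    while len(pts) + len(synthetic) < target:
        i = rng.randrange(len(pts))
        j = rng.choice(nearest(pts[i], pts, k, skip=i))
        gap = rng.random()  # position along the segment between the two samples
        synthetic.append([b + gap * (n - b) for b, n in zip(pts[i], pts[j])])
    return X + synthetic, y + [minority] * len(synthetic)

def enn(X, y, k=3):
    """ENN-style cleaning: keep a sample only if the majority label of its
    k nearest neighbors (excluding itself) matches its own label."""
    keep = [i for i in range(len(X))
            if majority_label(y[j] for j in nearest(X[i], X, k, skip=i)) == y[i]]
    return [X[i] for i in keep], [y[i] for i in keep]

def bagging_predict(X, y, query, n_estimators=11, k=3, rng=None):
    """Bagging: each estimator is a k-NN vote on a bootstrap resample of the
    training data; the ensemble returns the majority vote of the estimators."""
    rng = random.Random(0) if rng is None else rng
    votes = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        bx, by = [X[i] for i in idx], [y[i] for i in idx]
        votes.append(majority_label(by[j] for j in nearest(query, bx, k)))
    return majority_label(votes)

# Toy data (hypothetical): 40 frequent "normal" samples, 5 rare "attack" samples.
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
X += [[rng.gauss(10, 0.5), rng.gauss(10, 0.5)] for _ in range(5)]
y = [0] * 40 + [1] * 5

X_bal, y_bal = smote(X, y, minority=1, rng=rng)   # balance the classes
X_clean, y_clean = enn(X_bal, y_bal)              # remove noisy samples
print("before:", Counter(y), "after SMOTE:", Counter(y_bal), "after ENN:", Counter(y_clean))
print("prediction near attack cluster:", bagging_predict(X_clean, y_clean, [10, 10], rng=rng))
```

In practice one would likely reach for an existing library implementation of combined SMOTE and ENN resampling and of bagging instead of hand-written loops; the pure-Python version above is only meant to make the three stages visible.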

Reference(s)