Multi‐objective‐based feature selection for DDoS attack detection in IoT networks
2020; Volume: 9; Issue: 3 Language: English
10.1049/iet-net.2018.5206
ISSN 2047-4962
Authors: Monika Roopak, Gui Yun Tian, Jonathon A. Chambers
Topic(s): Software-Defined Networks and 5G
Abstract
IET Networks, Volume 9, Issue 3, p. 120-127. Research Article. Free Access.
Monika Roopak (Corresponding Author, m.roopak2@newcastle.ac.uk), Gui Yun Tian, Jonathon Chambers. School of Engineering, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK.
First published: 01 May 2020. https://doi.org/10.1049/iet-net.2018.5206. Citations: 36.

In this study, the authors propose a multi-objective optimisation-based feature selection (FS) method for the detection of distributed denial of service (DDoS) attacks in an internet of things (IoT) network.
An intrusion detection system (IDS) is one approach for the detection of cyber-attacks. FS is required to reduce the dimensionality of data and improve the performance of the IDS. One of the reasons for the failure of an IDS is incorrect selection of features, because most FS methods are based on a limited number of objectives such as accuracy or relevance of data; these are not enough, as they can be misleading for attack detection. The contribution of this work is to develop an appropriate FS method. The authors have implemented the non-dominated sorting algorithm with its adapted jumping gene operator to solve the optimisation problem and exploited an extreme learning machine as the classifier for FS based on six important objectives for an IoT network. Experimental results verify that the proposed method performs well for FS: it achieves 99.9% accuracy and reduces the total number of features by nearly 90%. The proposed method outperforms other proposed FS methods for the detection of DDoS attacks by an IDS.

1 Introduction

The internet of things (IoT) has emerged as a promising technology. In the IoT, every object has a unique identity and is accessible by the network. The status and positions of the objects can be established, and services and intelligence can be added to this expanded internet, thereby helping us to improve our personal and professional life and our social environment. It promises to create a world where all the objects around us, also called smart objects, are connected to the internet and communicate with each other with no human interference. This technology promises to be beneficial for people with disabilities and the elderly, enabling improved levels of independence and quality of life at a reasonable cost. The IoT network system implements proper security mechanisms such as encryption, data backup, authentication of users and applications, and integrity assurance of processed and stored data in the system.
In theory, an IoT system is fully secure with all the necessary security mechanisms in place; however, the situation is not as simple as that. Like any other computer network system, the IoT is susceptible to different cyber-attacks. Recent attacks on IoT networks [1] have revealed that cyber security for IoT networks is still a major issue. With the development of IoT networks, cyber-attacks against such systems have increased significantly, especially distributed denial of service (DDoS) [2] attacks, which have affected many IoT networks and resulted in devastating losses. An intrusion detection system (IDS) is one of the technologies for the detection of cyber-attacks. As real-world data are generally huge, the performance of the IDS can be affected, which leads to a requirement for feature selection (FS) to reduce the dimensionality of data and improve the performance of the IDS.

The DDoS attack [3] is one such attack that has resulted in devastating losses in IoT networks. Fig. 1 presents how a DDoS attack is implemented. Initially, the hacker selects a DDoS master (bot), an IoT device such as a computer or laptop, and compromises it by exploiting a vulnerability of that device [4]. The attacker then uses the DDoS master to compromise further systems on the network (sometimes thousands), such as laptops, computers, and CCTVs, which are known as zombie bots. The attacker instructs these zombie bots via the DDoS master to send flooding attacks to the target system, which results in denial of service to the legitimate users of the system. These kinds of cyber-attacks are attractive for hackers because they are easy to implement and can disable large-scale, popular websites. A DDoS attack therefore causes tremendous damage to servers and devices on the internet and creates conditions in which legitimate users of a system can no longer access resources or services.

Fig. 1. DDoS attack implementation

Recently, DDoS attacks have targeted various IoT networks. For example, on 21 October 2016 Dyn, a company that controls much of the internet's domain name system infrastructure in America, was hit by a DDoS attack launched with a new weapon called the Mirai botnet. Major sites affected by this attack were Amazon, Netflix, PayPal, Spotify, and Twitter in Europe and the US. Another DDoS attack on an IoT network was recorded in April 2017, when a new IoT botnet named Persirai was discovered; it shares Mirai's codebase and targeted over 1000 different models of internet protocol (IP) camera. The attack was discovered by cyber security researchers at Trend Micro and was affecting 122,069 IP cameras across the globe.

The motivation for this work is that IoT systems have recently suffered devastating attacks, as illustrated by the examples above such as the Mirai DDoS attack on the Dyn server. The performance and efficiency of an IDS in detecting such cyber-attacks depend on the performance of the classifiers used to differentiate normal data from attack data. Real-time network measurements are generally huge, which is a big challenge for the classifier to handle; appropriate features must therefore be selected from the raw data so that the performance of the classifier for attack detection improves and its complexity is reduced. It is critical for an IDS to exploit an appropriate FS method for DDoS attacks [5]. The selection of the features that are most relevant for the detection of an attack with a learning algorithm is a key problem. There are basically three kinds of FS methods [6]: (i) Filtering methods depend on the statistical properties of the features. Features are selected based on their relevance in providing information about different classes [7, 8]. The advantage of filtering methods is that they do not demand much computation, so they are less expensive [9].
The drawback of filtering algorithms is that they work well only for independent features; for the rest, they may select redundant features. (ii) Wrapper methods select features using the outcome of a learning algorithm. Comparatively, wrapper methods are more complex and demand more computational resources, but they perform better than filtering methods because their results are more accurate. (iii) Hybrid methods combine the advantages of filtering methods and wrapper methods [9].

Various algorithms have been proposed for FS in various wireless networks, most of which are based on performance metrics such as accuracy, relevance, and redundancy. To solve real-time problems, considering two or three objectives is not enough. Accuracy is one of the most common objectives for FS and attack detection, but it is questionable to treat the model with the highest accuracy as the best model. Accuracy may be as high as 99.9% while the precision and recall values are low, meaning that the numbers of false positives and false negatives are high. So, judging performance on the basis of one to three objectives could be misleading. There is, therefore, a demand for a multi-objective optimisation-based method of FS for IoT networks.

The non-dominated sorting genetic algorithm (NSGA-II) is an evolutionary algorithm for optimising problems with two or more objectives. NSGA-II was proposed by Deb and his colleagues in [9]. As the problem of FS for DDoS attack detection is multi-objective and combinatorial in nature, it calls for an evolutionary algorithm that provides a best-optimised solution. An improved variant of the NSGA-II combined with a jumping gene, named NSGA-II-JG [10], along with its different variants, has been used to solve different multi-objective problems, which resulted in better convergence with a reduction of central processing unit (CPU) time [11].
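The earlier observation that accuracy near 99.9% can coexist with low precision and recall can be checked with a small calculation. The following is a minimal sketch in Python on hypothetical, highly imbalanced traffic; the label vectors and the helper names (`confusion_counts`, `accuracy`, `recall`, `precision`) are illustrative assumptions and are not taken from the paper or its dataset.

```python
def confusion_counts(y_true, y_pred):
    """Count tp, tn, fp, fn for binary labels (1 = attack, 0 = normal)."""
    tp = tn = fp = fn = 0
    for truth, guess in zip(y_true, y_pred):
        if truth == 1 and guess == 1:
            tp += 1
        elif truth == 0 and guess == 0:
            tn += 1
        elif truth == 0 and guess == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def recall(tp, fn):
    # Fraction of real attacks that were detected.
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(tp, fp):
    # Fraction of raised alarms that were real attacks.
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical traffic: 9990 normal flows followed by 10 attack flows.
y_true = [0] * 9990 + [1] * 10
# Hypothetical detector: one false alarm, one detected attack, nine misses.
y_pred = [0] * 9989 + [1] + [1] + [0] * 9

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, tn, fp, fn)
rec = recall(tp, fn)
prec = precision(tp, fp)
print(f"accuracy={acc:.3f} recall={rec:.3f} precision={prec:.3f}")
```

In this toy case the detector misses nine of ten attacks, yet its accuracy is 0.999, with recall 0.1 and precision 0.5. This is the kind of situation that motivates optimising recall and precision as separate objectives.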
In [12, 13], it was found that the NSGA-II with its adapted jumping gene operator (NSGA-II-aJG) outperforms other variants of NSGA-II on different evaluation metrics.

The main contributions of this study are, therefore:
(i) An FS method based on six objectives, namely maximise relevance, minimise redundancy, minimise the number of features, maximise classifier accuracy, maximise recall, and maximise precision.
(ii) A method incorporating the jumping gene adapted NSGA-II based on six important objectives.
(iii) Application of the extreme learning machine (ELM) classifier in the context of a multi-objective feature extraction-based IDS for DDoS attack detection.
(iv) An extensive evaluation on the latest CICIDS2017 dataset and a performance comparison using accuracy, recall, and precision.
(v) A performance comparison of state-of-the-art methods with the proposed work.

The remainder of the paper is organised as follows: Section 2 contains a literature review, Section 3 describes the methodology used, Section 4 defines the objective functions considered, Section 5 presents results and discussion, and Section 6 outlines the conclusions.

2 Literature review

IoT networks consist of different varieties of connected devices such as smartphones, computers, light bulbs, and CCTVs, which may have limited resources such as storage, computation, and network capacity, making the IoT network highly susceptible to cyber-attacks. As the IoT evolves, more and more devices are connected to this network, and susceptibility to DDoS attacks has therefore grown in recent times. In this section, we discuss some of the latest DDoS attacks that have occurred. In October 2015, attackers were able to compromise more than 900 CCTV cameras spread around the globe and used them to launch a DDoS attack on the websites of a client (name disclosed) of Imperva Incapsula.
The target of the attacks was a rarely used asset of a large cloud service, catering to millions of users worldwide. It was found that the botnet was distributed globally, including Taiwan (24%), the US (16%), Indonesia (9%), Mexico (8%), Malaysia (6%), Israel (5%), Italy (5%), and other parts of the world as well. On 21 September 2016, the French hosting company OVH became the victim of a 1.5 Tbps DDoS attack, the largest DDoS attack ever recorded, which was implemented using hundreds of thousands of compromised IP cameras, routers, and DVRs. This attack was initiated on 20 September 2016 by flooding the network with a massive torrent of traffic towards OVH's website via 152,463 hacked low-powered cameras and smart devices, and the traffic increased substantially over the next 48 h. In November 2016, 900,000 Deutsche Telekom customers were knocked offline by a DDoS attack launched via infected routers using a modified version of the Mirai botnet, which disrupted telephony and television services and internet connections, causing million-pound damage to the company.

Cloud services are the backbone of the IoT system, as all the data of IoT devices are collected, processed, and analysed in the cloud. Recently, DDoS attacks have been able to target cloud computing by using important features of the service provided by the cloud, such as auto-scaling, pay-as-you-go accounts, and multi-tenancy. In February 2018, attackers succeeded in attacking the popular cloud-based online code management website GitHub [14]. This is the biggest DDoS attack recorded to date, with incoming traffic of 1.3 Tbps. This attack was implemented by sending 126.9 million packets per second. As GitHub was using DDoS protection services, the attack was detected within 10 min of its start.
According to a survey, among cloud-based services Microsoft Azure is the platform most abused by hackers, with 38.70% of attacks originating from it; another cloud service provider, Amazon Web Services, is reported to be used in 32.70% of cases, while Google is used in 10.78% of cases for flooding DDoS attacks [15]. One of the ways the cloud mitigates a DDoS attack is by making use of the auto-scaling cloud service. As the number of requests increases, the cloud management software deploys more resources, in terms of more virtual network functions, than required. This is a kind of self-coping technique against DDoS attacks, but it results in another threat, economic denial of sustainability (EDoS) [16]. As a result, the victim does not suffer performance degradation from full denial of service but suffers economic damage instead [17]. A novel approach is proposed in [18] to overcome a DDoS attack in a cloud-based system. The author has proposed AsIDPS, an auto scaling-based method built on software-defined networking and docker container technology. In [19], the author proposed EDoS Eye, a novel game theory-based method to combat the issue of EDoS in cloud computing. A static game scenario is implemented to model the interaction between the attacker and the defender based on Nash equilibrium.

Various FS methods have been proposed in recent publications to improve the performance of the classifiers employed. In [20], the authors have discussed major security issues existing for IoT networks and state-of-the-art solutions. In [21], it was found that filtering methods could lead to a misleading selection of features, as filtering methods compute average scores on dataset classes and predict class labels accordingly. That may result in the non-selection of a feature that might be especially relevant for a class label. So, the authors proposed a multi-objective approach for FS.
They have considered two objectives, namely relevance and redundancy of class labels, for FS. In this work, growing hierarchical self-organising maps, an unsupervised clustering machine learning method, are used in combination with a new unit labelling method. DARPA/NSL-KDD datasets are used to evaluate this method. They have concluded that their method produces an efficient determination of the winning unit as output and provides a maximum detection rate of 99.8 and 99.6% with normal and anomalous traffic, respectively. In [22-28], the authors have proposed an FS method based on limited criteria using the NSGA-II for network anomaly detection and pattern classification. They have evaluated their work in terms of classification accuracy and execution time for different benchmark datasets. An FS wrapper method is proposed in [26] based on the single objective of maximising information gain for the detection of DDoS attacks using Bayesian networks (BN) and decision tree (C4.5) classifiers. Their method is evaluated on the KDD'99 dataset and a DDoS dataset collected by Telecom Bretagne France on real-time computer networks. In this work, the authors found that processing massive network traffic data efficiently in a high-speed IDS is challenging. Based on this work, they found that only important features should be used for the detection of the attack. A similar procedure is proposed in [27], where two wrapper methods of FS named RF-FSE and RF-BER are used for an IDS with a decision tree machine learning classifier. In their work, four objectives were used. They evaluated the proposed methods on three benchmark datasets. In this work, they have used an RF classifier with the CV parameter selection method to validate the performance of the proposed algorithm.
In [28], an NSGA-III, which improves NSGA-II with reference points, is proposed for FS exclusively for IDS to reduce computational complexity and improve the accuracy of the classifier, focusing on the imbalanced class problem of learning classifiers. The Jaccard index has been used for measuring the performance of their method on three benchmark datasets: NSL-KDD, KDD'99, and Cure-KDD. A jumping gene adapted NSGA-II is proposed in [10, 12]. It is inspired by real transposons present in DNA, which can jump in and out of chromosomes, and it is applied to the optimisation of an industrial low-density polyethylene tubular reactor as a multi-objective optimisation problem with two conflicting objectives. Different variants of jumping gene-based NSGA-II, such as NSGA-II-mJG, NSGA-II-saJG, NSGA-II-aJG, and NSGA-II-sJG [23], have been investigated. It is concluded in [12] that the NSGA-II-aJG performs better than the other two algorithms in terms of computation and convergence. A DDoS attack detection method based on semi-supervised learning for an IoT network is proposed in [29] using an ELM classifier. They have used the NSL-KDD and KDDCUP'99 datasets for evaluating their algorithm, which provides better accuracy in comparison with the centralised attack detection framework. They have achieved a maximum accuracy of 86.53% with a reduction in runtime of 11 ms.

In this study, we have implemented the jumping gene adapted NSGA-II based on six important objectives, namely maximise relevance, minimise redundancy, minimise the number of features, maximise classifier accuracy, maximise recall, and maximise precision. Evaluated on the CICIDS2017 dataset, our proposed method has achieved 99.90% accuracy.

3 Methodology

The methodology followed in this study is shown in Fig. 2. Network data with and without DDoS attacks are collected and normalised.
These data are fed to the NSGA-II with six important objectives that must be satisfied by an FS method; the objectives are explained in Section 4. The ELM classifier is trained with attack and normal data, and after this, the evaluation is validated by the classifier.

Fig. 2. Methodology of the proposed work

3.1 Dataset

The CICIDS2017 dataset has been used to conduct this work. IDS and intrusion prevention systems are considered the most important tools against the ever-growing network attacks, but they mostly fail to provide consistent and accurate performance due to the lack of reliable test and validation datasets. Most of the datasets that include DDoS attacks are out of date and unreliable. Existing datasets suffer for many reasons: they lack traffic diversity, do not contain all known attacks, include anonymised packet payload data, and do not reflect current trends. The most common datasets used in other proposed work, such as NSL-KDD and KDD-99, have shown limitations such as low detection rate, low true alarm rate, and high false positives. The CICIDS2017 labelled dataset is available and contains the most up-to-date network attack data, resembling real-world network data. This dataset was generated with realistic background traffic as a top priority; the developers of this dataset used a B-Profile system to profile the abstract behaviour of human interactions and generate naturalistic background traffic. The abstract behaviour of 25 users based on the hyper text transfer protocol, hyper text transfer protocol secure, file transfer protocol (FTP), secure shell, and email protocols was built. This dataset was collected over five days in 2017 [30, 31] and covers different cyber-attacks along with attack-free traffic. To evaluate our work, we have used data captured on 7 July 2017, which contains both normal and DDoS attack data.
This dataset contains 85 network flow features along with label attributes and a total of 225,742 instances with both attack and normal data. The dataset is highly unbalanced, so for this work we have modified the training dataset to balance attack and normal data, reduced the number of features to 81, and divided the data into training and test data. The transformed data are then normalised before being fed to the ELM classifier algorithm. The features are normalised to the range [−1, 1] and the target attribute is normalised to the range [0, 1].

Fig. 3. NSGA-II procedure [30]

3.2 Multi-objective optimisation-based FS

The multi-objective optimisation problem is defined as finding solutions for two or more conflicting objectives subject to some constraints. Optimisation with M objectives can be formulated as

$$\text{Minimise/Maximise } \{f_1(x), f_2(x), \ldots, f_M(x)\} \quad \text{subject to } x \in X$$

where X is the set of solutions and x is a candidate solution. A solution x1 is said to dominate x2 if x1 is no worse than x2 in all M objectives and x1 is strictly better than x2 in at least one objective m_i (m_i ∈ M for i = 1, 2, …, M). A solution is Pareto-efficient (non-dominated) if no other solution in X dominates it.

The NSGA-II is presented in Fig. 3 and has the following procedure steps:
(i) The population is initialised, and crossover and mutation are performed on the population to produce offspring. Parents and offspring are combined, after which non-dominated sorting is applied and the solutions are classified by fronts.
(ii) The new population is created according to the ranking of the fronts.
(iii) Crowding distance, which is based on the density of solutions around each solution, is calculated and assigned to each solution within a front.
(iv) Tournament selection is performed to select next-generation offspring. Finally, a new generation is created by crossover and mutation operations.

In this work, we have employed the NSGA-II-aJG, illustrated in Fig. 4, for executing FS based on six different objectives.
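The dominance relation and the extraction of a non-dominated front described above can be sketched as follows. This is a minimal illustration in Python, not the authors' MATLAB implementation: it covers only the dominance check and the first front, and omits crowding distance, selection, crossover, mutation, and the jumping gene operator. The function names and the candidate values are hypothetical. Objectives to be maximised are negated so that every objective is minimised.

```python
from typing import List, Sequence, Tuple


def to_minimisation(n_features: int, redundancy: float, accuracy: float,
                    relevance: float, recall: float,
                    precision: float) -> Tuple[float, ...]:
    """Map the six objectives to a vector in which smaller is always better."""
    return (n_features, redundancy, -accuracy, -relevance, -recall, -precision)


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated_front(points: List[Sequence[float]]) -> List[int]:
    """Return the indices of the points that no other point dominates."""
    front = []
    for i, p in enumerate(points):
        dominated = any(dominates(q, p)
                        for j, q in enumerate(points) if j != i)
        if not dominated:
            front.append(i)
    return front


# Hypothetical candidate subsets:
# (n_features, redundancy, accuracy, relevance, recall, precision)
candidates = [
    (6, 0.002, 0.999, 0.79, 1.00, 0.998),
    (9, 0.004, 0.999, 0.79, 1.00, 0.998),   # worse than the first in two objectives
    (12, 0.017, 0.999, 0.74, 1.00, 0.999),  # best precision
    (2, 0.001, 0.610, 0.50, 0.60, 0.620),   # fewest features
]
points = [to_minimisation(*c) for c in candidates]
front = non_dominated_front(points)
print(front)
```

Here the second candidate is dominated by the first (more features and more redundancy, with nothing better in return), while the other three each win on at least one objective and therefore stay on the front. The pairwise check is quadratic in the population size, which is acceptable for a sketch; NSGA-II additionally ranks the remaining solutions into successive fronts.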
The jumping gene is a concept in which a randomly generated binary string, equal in size to the decision variables of the problem to be solved, is used to replace a few chromosomes. The location at which the jumping gene replacement starts is chosen randomly, with the condition that the chosen location is lower than the difference between the total numbers of variables and chromosomes.

Fig. 4. Flow chart of the jumping gene adapted NSGA-II-aJG algorithm

3.3 ELM classifier machine learning algorithm

The ELM classifier is a learning algorithm for single-hidden layer feed forward neural networks built on the idea that the input weights and hidden layer biases can be randomly assigned. The single-hidden layer ELM has better generalisation performance than gradient-based methods, the traditional support-vector machine (SVM), and the least squares SVM, and has a much faster learning speed [32], which is desired by wrapper FS methods. For the hidden layer, we have used a sigmoidal function as the activation function. K-fold cross-validation is repeated ten times and used for validation.

4 Objective functions

The objective functions defined for evaluation are critical for the FS of an IDS. One of the reasons for the failure of an IDS is a wrong selection of the features on which the classifier bases its detection of attacks. We have defined six important objectives that should be satisfied by the selection of features.

(i) Maximise relevance: relevance is considered a very important criterion for selecting features; in [33-37], the authors have used relevance as the main parameter for reducing data dimensionality. For our work, we have used the relevance measure to be maximised as one of the objectives. Mutual information I(X; Y) is the reduction in uncertainty about X given the target Y. Let H(X) and H(Y) be the entropies of X and Y, respectively. Relevance, formulated as symmetric uncertainty, is defined by (1) and (2), where S is a subset of X.

Table 1. Parameter values for experiment
Parameter                          Value
population size                    200
number of generations              100
number of variables                81
K-fold cross-validation number     10
ELM ensemble                       10
number of units in ELM             50

Table 2. Subset of selected features with the highest accuracy
No. of features   Accuracy   Relevance   Recall   Precision   Redundancy
20                0.999      0.78        1.00     0.998       0.0653
19                0.999      0.74        0.99     0.998       0.0526
17                0.999      0.81        1.00     0.998       0.0329
12                0.999      0.74        1.00     0.999       0.0169
9                 0.999      0.79        1.00     0.998       0.0040
6                 0.999      0.79        1.00     0.998       0.0019

(ii) Minimise redundancy: redundancy has proved to be a very important parameter for selecting features [33, 35, 36, 38]. Minimising redundancy in the data is defined by (3).

(iii) Minimise number of features: the number of features within S is the cardinality of the set. To reduce the data, we expect the number of features to be as small as possible while the other objectives are optimised:

$$\min |S| \tag{4}$$

where | | denotes the cardinality of S.

(iv) Maximise classifier accuracy: classifier accuracy can be formulated as

$$\text{Accuracy} = \frac{tp + tn}{tp + tn + fp + fn} \tag{5}$$

where tp, tn, fp, and fn stand for true positives, true negatives, false positives, and false negatives, respectively.

(v) Maximise recall: recall [39, 40] is one of the most important measures for attack detection in computer networks. Accuracy alone gives the percentage of correct detections, but on its own it cannot guarantee the correct detection of attacks, as the number of false positives and false negatives could be high. Recall is the fraction of relevant instances that are retrieved from the data. We expect the recall value to be maximised:

$$\text{Recall} = \frac{tp}{tp + fn} \tag{6}$$

(vi) Maximise precision: similar to recall, precision [41] is also an important measure for attack detection. A high precision value indicates the correctness of the detection of attacks, and it can be defined as the fraction of retrieved instances that are actually relevant.
We expect precision to be maximised:

$$\text{Precision} = \frac{tp}{tp + fp} \tag{7}$$

5 Results and discussion

Experimentation in our work follows the methodology explained in Section 3. The proposed method is evaluated in MATLAB R2017a on a 64-bit Intel® Core™ i5-4690 CPU @ 3.50 GHz with 16 GB RAM in a Windows 7 environment. Multi-objective optimisation produces its results as a set of Pareto fronts according to whether each objective function is defined to be maximised or minimised. In our work, we have set accuracy, relevance, recall, and precision to be maximised and the number of features and redundancy to be minimised, using an ELM classifier as the binary classifier algorithm. The parameter values used in our work are shown in Table 1. To optimise the performance of the method, the population size should be large when the data contain a large number of features, so in this work the population size is set to 200. The number of iterations is 100, and the total number of features present in our dataset is 81, including the label attribute. For the ELM classifier, the cross-validation number is set to 10, and the number of units in the ELM is 50.

Fig. 5. Accuracy versus number of features
Fig. 6. Precision versus number of features
Fig. 7. Recall versus feature size
Fig. 8. Redundancy versus feature size
Fig. 9. True negative versus feature size
Fig. 10. False-positive rate versus accuracy
Fig. 11. Occurrence of features

The number of solutions obtained as Pareto fronts in our work is more than 700, all satisfying the six objective functions defined. Table 2 shows the best subsets obtained using the proposed method; they satisfy all six defined objectives and share the same highest accuracy with different numbers of selected features. Fig. 5 illustrates the accuracy achieved against the number of selected features in a subset. The best accuracy we have achieved is 99.9%, and the lowest is 36.0%, with different subset sizes. The smallest subset size we obtained is 2, with 61.0% accuracy. The subsets with minimum cardinality, highest accuracy, and the values of the other objectives defined are s1 = {1, 7, 40, 47, 53, 62}, s2 = {1, 7, 17, 33, 46, 47, 53, 55, 62}, and s3 = {1, 7, 17, 46, 47, 53, 62}. The smallest subset with the highest accuracy has six selected features with 99.9% accuracy; its values of relevance, recall, precision, and redundancy are 79.00, 100, 99.80, and 0.19%, respectively. Another of the best subsets we obtained has nine selected features, with values of relevance, recall, precision, and redundancy of 79.00, 100, 99.80, and 0.40%, respectively, which are the same as
Reference(s)