Highly efficient neoteric histogram–entropy‐based rapid and automatic thresholding method for moving vehicles and pedestrians detection
2019; Institution of Engineering and Technology; Volume: 14; Issue: 2; Language: English
10.1049/iet-ipr.2018.5555
ISSN 1751-9667
Authors: K. Chandrasekar, Planisamy Geetha
Topic(s): Video Surveillance and Tracking Methods
IET Image Processing, Volume 14, Issue 2, pp. 354-365. Research Article. Free Access.
Karnam Silpaja Chandrasekar (corresponding author, silpajachandrasekar@gmail.com, orcid.org/0000-0003-3123-8057) and Planisamy Geetha, Department of Computer Science and Engineering, Anna University, Guindy, Chennai, Tamilnadu, India.
First published: 08 January 2020. https://doi.org/10.1049/iet-ipr.2018.5555. Citations: 3.

Abstract

Thresholding for segmentation is a key step and a necessary process in various applications. Estimating an accurate threshold value for a complex and coarse image is computationally expensive and often lacks accuracy and stability.
This study aims at developing a general histogram–entropy-based thresholding method, referred to as the HEBT method, for fast and efficient automatic threshold evaluation. In the proposed method, the probability density function and the Shannon entropy derived from the 1D bimodal histogram are used to find the optimal threshold values automatically. The proposed method, implemented with a three-frame differencing segmentation technique, has been tested on real-time datasets – change detection 2012, change detection 2014, and Wallflower – to identify pedestrians and vehicles. The performance of the HEBT method has been compared with six state-of-the-art automatic thresholding methods. The experimental segmented images confirm that the HEBT method is more adaptable and better suited for real-time systems under severely challenging conditions with great variations. Further, the HEBT method achieved the best segmentation results, with the highest values of several performance parameters, i.e. recall, precision, similarity, and f-measure. Interestingly, the computation time of the proposed method is the lowest among the methods compared, promising its application to fast and effective image segmentation.

1 Introduction

Thresholding is an important step in the segmentation of foreground pixels from the background region, and has been a popular technique in object segmentation of grey-scale images. The basic idea of thresholding is to select an optimal threshold value and to extract foreground objects from the grey-level image by turning all pixels above the selected threshold to one and all pixels below it to zero. The generation of a clean binary segmented image depends highly on the method used for precise and appropriate threshold selection. In most reported research on thresholding techniques, manually set threshold values are commonly used [1-3].
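The basic binarisation rule described above can be sketched in a few lines (a minimal illustration; the pixel values are invented for the example):

```python
import numpy as np

def binarize(gray, t):
    """Turn all pixels above threshold t to one and all others to zero."""
    return (gray > t).astype(np.uint8)

# toy 2x2 grey-scale patch (values invented for illustration)
patch = np.array([[10, 200], [120, 45]], dtype=np.uint8)
print(binarize(patch, 100))  # [[0 1]
                             #  [1 0]]
```

The whole difficulty of thresholding lies in choosing `t`; everything after that is this one comparison.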
The user-defined fixed threshold method sets the threshold manually from inspected pixel values, which may be irrelevant to the current task of segmenting foreground from background. Note that this method requires manual involvement, and the selected threshold can be suboptimal or even incorrect if there is a bias due to limited observations, unexpected changes in pixels, or large fluctuations in lighting. For this reason, automatic thresholding is more desirable for determining the optimal threshold for object segmentation. Image thresholding methods have been categorised into six groups by Sezgin and Sankur [4] on the basis of the information they use: histogram based (based on peaks and valleys), clustering based (where the grey levels are clustered into two groups) [5-7], entropy based (based on the entropy of the foreground and the background), object-attribute based (based on the similarity between the grey-level and binary images) [8-10], spatial (based on the correlation of the pixels), and local methods (adapting the threshold value at each pixel) [11, 12]. Among these, the histogram–entropy-based threshold (HEBT) methods have drawn the attention of several researchers for threshold-based detection applications, and many algorithms have been proposed. The hierarchy of threshold methods is shown in Fig. 1; the highlighted boxes indicate the thresholding method developed in the present work.

Fig. 1: Hierarchical representation of threshold methods

Histogram-based thresholding is quite an easy technique compared to other methods. A histogram is a graphical representation of intensity versus pixel count. The histogram-based thresholding method analyses histogram information, such as peaks and valleys, to estimate the threshold value.
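As a minimal sketch of the peak-and-valley idea (the valley-search heuristic below is an illustrative assumption, not the method proposed in this paper): locate the two most populated grey levels and take the deepest valley between them as the threshold.

```python
import numpy as np

def valley_threshold(gray):
    """Histogram-based threshold: the deepest valley between the two
    most populated grey levels (a simplified peak-and-valley heuristic)."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    # the two most populated grey levels act as the bimodal peaks
    peak_lo, peak_hi = np.sort(np.argsort(hist)[-2:])
    # deepest valley between the peaks
    return int(peak_lo + np.argmin(hist[peak_lo:peak_hi + 1]))
```

On a clean bimodal image this returns a value strictly between the two modes; on weakly bimodal histograms it inherits the instability this section discusses.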
Many approaches have been proposed to estimate the threshold value using the upper bound of the posterior entropy [13], the maximum posterior [14], and the deepest concavity point [15] of the 1D histogram. However, these methods are computationally expensive. Moreover, they impose uniformity constraints on the segmented regions that do not always hold. To overcome these problems, other methods estimate the threshold value from the exponential hull of the 1D histogram [16] and the probability mass function of the pixels [17]. However, these methods produce unstable threshold values because of weak bimodality when the class histograms overlap. A method has been proposed [18] to estimate the threshold value from the two homogeneous regions of foreground and background pixels, but it fails to deliver good accuracy for heterogeneous images. To address this problem, two-dimensional (2D) histogram thresholding methods have been proposed by various researchers. However, 2D histogram-based multi-level image thresholding segmentation methods are computationally expensive, since they involve a complex optimisation process [19]. To overcome this, meta-heuristic algorithms on 2D histograms have been proposed. Another attempt to estimate the threshold value from edge information using a grey-gradient 2D histogram showed high computational cost due to the exhaustive search for the optimal threshold value [20]. An approach based on a bounded search technique [21] and multi-level thresholding based on Otsu and a differential algorithm [22] have been proposed to reduce the cost. A 2D multi-level thresholding involving evolutionary algorithms has been proposed by Ishak [23]. The combination of a non-local means 2D histogram and a K-means gravitational search algorithm to determine the optimal threshold value has shown better performance than other 2D histogram-based segmentation methods [24].
The performance of 1D histogram methods is better than that of 2D histogram methods, as they consider all the grey-level pixels, which reduces the probability of pixel misclassification. With the advances in image processing, entropy-based thresholding algorithms have gained popularity for being less computationally expensive. Entropy is a cost function that measures the disorder, or randomness, of the image. In an image, a homogeneous region corresponds to minimum entropy, while a non-homogeneous region yields maximum entropy. Hence, high entropy of the image indicates better separation between regions, and entropy-based threshold methods serve as an efficient objective function for estimating threshold values. Moderate success has been achieved with several entropy-based methods, such as the Renyi entropy [25, 26], Shannon entropy [26, 27], and cross entropy [28-30]. Brink and Pendock [31] proposed an approach that calculates the threshold value by minimising the cross entropy of the pixel values. An adaptive particle swarm optimisation with the Shannon entropy has been proposed by Qi [32] to measure the information contained in the image. The 2D Tsallis–Havrda–Charvát entropy has been introduced for image thresholding segmentation [33]. A method using 2D Renyi's entropy has been proposed by Sahoo and Arora [34] to estimate the threshold value, and a 2D Renyi's entropy using the adaptive artificial fish-swarm algorithm [35] has also been proposed. Further, due to the complexity of 2D Renyi's entropy, an image thresholding segmentation method based on 2D Renyi's entropy and fuzzy clustering has been proposed [36]. Another colour image multi-level thresholding approach, proposed by Pare et al. [37], combines modified fuzzy entropy and entropic parameters.
Another colour image multi-level segmentation strategy is based on the Bat algorithm and Renyi's entropy [25]. However, when applied separately, both histogram-based and entropy-based methods lose some fine details of the image, such as points, lines, diagonal information, and edges. To overcome this, combined histogram–entropy-based methods have been widely developed [38-41]. Due to their simplicity and capacity to deal with the spatial correlation among pixels, the combined histogram–entropy methods are considered efficient [24]. Combined histogram–entropy-based thresholding methods are effective for most real-time videos, but a few shortcomings remain unsolved. The main reason is that the histogram thresholding method utilises only the distribution of grey-level pixels, and entropy thresholding utilises only the randomness of an image. As a result, different images with the same histogram may receive the same threshold, which is not correct. To overcome these drawbacks, many studies have developed new HEBT methods. Xiao et al. [38] proposed a grey-level spatial correlation (GLSC) histogram, constructed from the grey values of pixels and their similarity with neighbouring pixels. A 2D direction histogram proposed by Yimit et al. [42] estimates the threshold value from the local edge property. These methods show an exponential increase of computational time for large images. After analysing the drawbacks of the GLSC and 2D direction histograms, Xiao et al. [39] proposed an entropic-histogram-based thresholding method that estimates the grey-level occurrence probability and spatial distribution simultaneously, based on the grey-level and gradient-magnitude (GLGM) histogram. However, the GLGM method has a negative effect on recognising image edges. Further, a grey-level-local fuzzy entropy (GLLFE) histogram has been proposed by Zheng et al.
[40] to estimate the threshold value by transferring the pixels' grey levels to a fuzzy set through a fuzzy membership function and combining the local fuzzy entropy with the grey-level pixels. Another approach, combining the grey-level histogram and local-entropy information to estimate the threshold value, has been introduced by Chen et al. [41]. However, these methods perform poorly due to the lack of gradient orientation. To overcome this, Yimit and Hagihara [43] proposed Renyi's entropic multi-level thresholding based on the 2D histogram to compute the optimal threshold value; however, this method fails to handle illumination changes in the images. Borjigin and Sahoo [44] computed the threshold value for RGB colour images using grey-level and local-average histograms and the Tsallis–Havrda–Charvát entropy. Yi et al. [45] proposed an entropic thresholding method based on a 2D histogram constructed using a Gabor filter; however, this method ignores the diagonal information, which results in the loss of much useful information. Another generalised entropy-based thresholding method, based on the Masi entropy and dealing with additive and non-extensive information through the entropic parameter, has been proposed by Shubham and Bhandari [46]. Table 1 shows the methods and drawbacks of the state-of-the-art approaches.

Table 1. Comparison of state-of-the-art HEBT methods

Histogram based:
- [13] posterior entropy: computationally expensive
- [14] maximum posterior: inequality distribution of pixels
- [15] deepest concavity point, [16] exponential hull of 1D histogram, [17] probability mass function of the pixels: less bimodality in overlapping class histograms
- [18] two homogeneous regions of the foreground and background pixels: fails for heterogeneous images
- [20] grey-gradient 2D histogram: high computational cost
- [21] bounded search technique: low sensitivity
- [22] Otsu and differential algorithm, [23] evolutionary algorithms, [24] non-local means 2D histogram and K-means gravitational search algorithm: fails to solve the optimisation problems

Entropy based:
- [31] minimising the cross entropy of the pixel values
- [32] particle swarm optimisation with the Shannon entropy: loss of information
- [33] 2D Tsallis–Havrda–Charvát entropy: sensitive to noise
- [34] 2D Renyi's entropy: long processing time
- [35] 2D Renyi's entropy using the adaptive artificial fish-swarm algorithm
- [36] 2D Renyi's entropy and fuzzy clustering
- [37] modified fuzzy entropy and entropic parameters
- [25] Bat algorithm and Renyi's entropy

Histogram–entropy based:
- [38] image local property via the GLSC histogram: exponential increase of computational time
- [42] local edge property in a 2D direction histogram
- [39] grey-level occurrence probability and spatial distribution based on GLGM: high computational time for large images
- [40] GLLFE combining the local fuzzy property: not applicable for multi-level thresholding
- [41] grey-level histogram and local-entropy information: ignores the diagonal information

Thus, it may be realised from the above discussion that no efficient thresholding method has been designed for the background subtraction setting. Moreover, many threshold-based segmentation techniques do not consider images under illumination changes or noise.
In a recent work concerning the detection of moving targets, a threshold method utilising a manually set threshold value was used [2]. As the user-defined fixed threshold method has severe limitations [1-3], in the present work, we propose a new combined HEBT method to automatically and rapidly select an optimal threshold for effectively segmenting objects of interest in a grey-scale image. The overall contributions of this study include: (i) applying a 1D bimodal histogram to calculate the prior probabilities of the object and background pixels of a grey-scale image; (ii) applying the Gaussian distribution function to the 1D bimodal histogram to remove misclassification error and to calculate the probability density function; (iii) calculating the Shannon entropy of the grey-scale image and applying it to reduce the computational time in estimating the most probable and accurate threshold value. To validate the efficiency of the HEBT method, extensive experimental analysis is carried out on real-time moving vehicle and pedestrian datasets – the change detection 2012 dataset containing 25 videos [47] (hereafter referred to as CD-2012), the change detection 2014 dataset containing 22 videos [47] (hereafter referred to as CD-2014), and the Wallflower dataset containing 7 videos [48] – and the performance is compared with six state-of-the-art histogram–entropy-based automatic thresholding techniques [40, 41, 43-46]. In these comparison studies, all the thresholding methods, including our own HEBT method, are implemented with the three-frame differencing segmentation method (the one used in [2]) for object detection from complex and coarse images. In another set of experiments, the performance of the HEBT method (implemented with the three-frame differencing segmentation method) is compared with several published works [49-58], which used the same datasets, i.e.
CD-2012, CD-2014, and Wallflower, but employed different thresholding and object segmentation methods. The segmented results are illustrated as output images, and the evaluations are based on five performance parameters of segmentation: average recall, average precision, average similarity, average f-measure, and computational time. Qualitative and quantitative assessments indicate that the proposed method offers high efficiency, fast detection, and the best performance. The rest of the paper is organised as follows: Section 2 describes the objective of the paper, Section 3 describes the proposed method, and Section 4 presents the experimental results and analysis. Finally, Section 5 gives the conclusion and future work.

2 Objective of the proposed method

From the insights drawn from the thresholding field, the needs addressed by the proposed method are as follows. The number of misclassified pixels is reduced if some prior knowledge about the background and object pixels is available; hence, histogram-based thresholding is used in the proposed method to estimate the prior probabilities of the object and background regions. Thresholding methods are computationally expensive, so optimisation of the objective function is needed to reduce the computational time; hence, the highly efficient Shannon entropy is used here. It is also essential to preserve the information contained in the image, such as edges, diagonal information, and corners. To meet these prerequisites while reducing the computational complexity, a new combined HEBT method is proposed in this work that resolves these issues to some extent. The hierarchy of the issues and solutions of the proposed method is shown in Fig. 2.

Fig. 2: Hierarchy of the issues and solutions for the proposed work

3 Proposed method

This section presents the proposed combined HEBT method for estimating the optimum threshold value automatically and rapidly; the overall work flow is illustrated in Fig. 3. The two components of the HEBT method are calculating (i) the probability density function and (ii) the Shannon entropy of the object and background regions of the grey-scale image. To accomplish this, a bimodal histogram is first constructed to estimate the prior probabilities of the object and background pixels, followed by applying a Gaussian density function in each region to calculate the probability density functions of the individual regions and of the overall image. The probability information obtained from the histogram is then used to estimate the Shannon entropy of the object region, the background region, and the overall image. Finally, combining the probability density function and the entropy of the overall image yields an accurate threshold value for each individual image.

Fig. 3: Block diagram of the proposed HEBT method

3.1 Computing the probability density function of each grey-level image

As a grey-level image contains two intensity regions, namely object and background, a bimodal histogram is constructed from the intensity values of the two regions. The two peaks of the bimodal histogram represent the pixel values appearing most frequently in each region. The prior probabilities of the object ($P_o$) and the background ($P_b$) pixels are estimated from the histogram using the Otsu method [59]. Note that these prior probabilities, in general, include object pixels misclassified as background pixels and vice versa, which must be reduced to calculate an accurate threshold value.
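The Otsu step for the prior probabilities can be sketched as follows (an illustrative NumPy implementation of the standard between-class-variance search; the variable names are our own, and the paper's exact normalisation may differ):

```python
import numpy as np

def otsu_priors(gray):
    """Otsu threshold from the 1D histogram, plus the prior probabilities
    P_o (object mode) and P_b (background mode) of the two classes."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()                 # normalised histogram
    levels = np.arange(256)
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()  # class weights (= priors)
        if w0 == 0 or w1 == 0:
            continue
        m0 = (levels[:t] * p[:t]).sum() / w0  # class means
        m1 = (levels[t:] * p[t:]).sum() / w1
        var = w0 * w1 * (m0 - m1) ** 2        # between-class variance
        if var > best_var:
            best_t, best_var = t, var
    return best_t, p[:best_t].sum(), p[best_t:].sum()
```

The two returned sums are exactly the priors used in the mixture model of Section 3.1.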
This is achieved by applying the Gaussian distribution function to the pixel intensities of the object and background regions. The misclassification error can be further reduced by summing the products of the prior probabilities and the Gaussian distributions of the object and background regions, as shown below. Assuming that the intensity values can be modelled as a random variable $x$, the histogram of a frame is viewed as the probability density function $p(x)$ of $x$, given in (1):

$$p(x) = P_o\,p(x \mid o) + P_b\,p(x \mid b) \quad (1)$$

Equation (2) shows the overall probability density function of the image with the Gaussian distributions written out in detail:

$$p(x) = \frac{P_o}{\sqrt{2\pi}\,\sigma_o}\exp\!\left(-\frac{(x-\mu_o)^2}{2\sigma_o^2}\right) + \frac{P_b}{\sqrt{2\pi}\,\sigma_b}\exp\!\left(-\frac{(x-\mu_b)^2}{2\sigma_b^2}\right) \quad (2)$$

where $\mu_o$ and $\mu_b$ are the means, and $\sigma_o$ and $\sigma_b$ are the standard deviations, of the intensity values of the object and background regions, respectively.

3.2 Computing the Shannon entropy of each grey-level image

The Shannon entropies of the object region and of the overall image are calculated and used to reduce the computational time (Tables 2-4). Entropy uses uncertainty as the measure of the information in the image. Treating the histogram of the grey-scale image (with pixels ranging from 0 to 255) as an information source, the object pixels range from 0 to $m$ and the background pixels range from $m+1$ to 255. Let $x$ be the discrete random variable of the grey levels. Then the probability mass function of the object pixels is given in (3):

$$p_o(x) = \frac{p(x)}{P_o}, \quad x = 0, 1, \ldots, m \quad (3)$$

where $p(x)$ is the probability of each intensity value $x$ of the object region and $P_o$ is its total probability in the frame. Similarly, the probability mass function of the background pixels is given in (4):

$$p_b(x) = \frac{p(x)}{P_b}, \quad x = m+1, \ldots, 255 \quad (4)$$

where $p(x)$ is the probability of each intensity value $x$ of the background region.
Since $P_o$ and $P_b$ represent the total probabilities associated with partitioning the histogram into object and background pixels, they are given in (5):

$$P_o = \sum_{x=0}^{m} p(x), \qquad P_b = \sum_{x=m+1}^{255} p(x) \quad (5)$$

Further, the Shannon entropy of the object region is estimated as in (6):

$$H_o = -\sum_{x=0}^{m} p_o(x)\,\log p_o(x) \quad (6)$$

Similarly, the Shannon entropy of the background region is calculated as in (7):

$$H_b = -\sum_{x=m+1}^{255} p_b(x)\,\log p_b(x) \quad (7)$$

The Shannon entropies of the object and background pixels are combined to estimate the entropy of the entire image, $H$, as in (8):

$$H = H_o + H_b \quad (8)$$

Finally, to preserve the information contained in the images, such as points, corners, and edges, the probability density function and the entropy of the overall frame are combined to yield the exact threshold value $T$ for each individual frame, as in (9). The proposed HEBT algorithm for computing the threshold value is shown in Algorithm 1 (see Fig. 4). In the algorithm, the frame, its grey-scale conversion, and the frame number $n$ ($n = 1, 2, \ldots$) are used as inputs.

Table 2. Comparison of performance parameter values obtained with our HEBT and six other HEBT methods on the CD-2012 dataset (all implemented along with the three-frame differencing segmentation method)

Methods | Recall | Precision | Similarity | F-measure | Time, s
Yimit and Hagihara [43] | 0.7689 | 0.5563 | 0.632 | 0.6456 | 39.3
Chen et al. [41] | 0.4276 | 0.4667 | 0.354 | 0.1996 | 18.0
Zheng et al. [40] | 0.6135 | 0.4873 | 0.296 | 0.5432 | 12.6
Shubham and Bhandari [46] | 0.6027 | 0.5632 | 0.439 | 0.5823 | 10.01
Borjigin and Sahoo [44] | 0.3486 | 0.7980 | 0.427 | 0.4852 | 42.6
Yi et al. [45] | 0.7475 | 0.5893 | 0.296 | 0.6590 | 16.3
HEBT | 0.8942 | 0.8854 | 0.8675 | 0.8887 | 7.3

Table 3. Comparison of performance parameter values obtained with our HEBT and six other HEBT methods on the CD-2014 dataset (all implemented along with the three-frame differencing segmentation method)

Methods | Recall | Precision | Similarity | F-measure | Time, s
Yimit and Hagihara [43] | 0.7832 | 0.7259 | 0.551 | 0.7535 | 15.6
Chen et al. [41] | 0.6892 | 0.5935 | 0.413 | 0.6377 | 18.4
Zheng et al. [40] | 0.7256 | 0.6823 | 0.406 | 0.7032 | 10.2
Shubham and Bhandari [46] | 0.8392 | 0.5817 | 0.326 | 0.6871 | 23.4
Borjigin and Sahoo [44] | 0.7828 | 0.4102 | 0.212 | 0.5383 | 15.3
Yi et al. [45] | 0.9019 | 0.7182 | 0.329 | 0.799 | 22.1
HEBT | 0.9286 | 0.7656 | 0.7743 | 0.8393 | 8.3

Table 4. Comparison of performance parameter values obtained with our HEBT and six other HEBT methods on the Wallflower dataset (all implemented along with the three-frame differencing segmentation method)

Methods | Recall | Precision | Similarity | F-measure | Time, s
Yimit and Hagihara [43] | 0.4582 | 0.7756 | 0.6271 | 0.5761 | 61.2
Chen et al. [41] | 0.5364 | 0.7692 | 0.4312 | 0.6320 | 49.3
Zheng et al. [40] | 0.5514 | 0.8742 | 0.3541 | 0.6763 | 46.1
Shubham and Bhandari [46] | 1 | 0.5441 | 0.4276 | 0.7047 | 36.1
Borjigin and Sahoo [44] | 0.4215 | 0.5767 | 0.5532 | 0.4870 | 22.5
Yi et al. [45] | 0.6372 | 0.7612 | 0.4623 | 0.6937 | 11.6
HEBT | 0.8742 | 0.8965 | 0.632 | 0.8853 | 9.4

Fig. 4: Algorithm 1 (HEBT method)

4 Experimental results and analysis

To evaluate the effectiveness of the proposed HEBT-based background subtraction method, extensive videos are used in the experiments. The first dataset is the CD-2012 dataset [47], which provides 31 videos grouped into six challenging scenarios: baseline, dynamic background, intermittent object motion, camera jitter, thermal, and shadow. We have chosen five scenarios: thermal, shadow, camera jitter, dynamic background, and baseline. The next is the CD-2014 dataset [47], which consists of 22 videos grouped into five scenarios: bad weather, night videos, low frame rate, pan/tilt/zoom, and air turbulence. The Wallflower dataset [48] contains seven videos. All these datasets capture real scenarios, which provide an appropriate opportunity for a more critical and comprehensive evaluation of moving object detection. All the experiments run on a 64-bit Windows 8.1 platform with an Intel Core i7-4790 CPU @ 3.60 GHz and 14.0 GB RAM. The algorithms have been implemented in Matlab.
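The three-frame differencing segmentation used throughout the experiments can be sketched as follows (a minimal illustrative version; a fixed threshold argument stands in for the automatically estimated one, and the function name is our own):

```python
import numpy as np

def three_frame_diff(f1, f2, f3, t):
    """Segment moving pixels as the AND of two thresholded absolute
    differences over three consecutive frames."""
    # cast to int so uint8 subtraction cannot wrap around
    d1 = np.abs(f2.astype(int) - f1.astype(int)) > t
    d2 = np.abs(f3.astype(int) - f2.astype(int)) > t
    return (d1 & d2).astype(np.uint8)
```

Only pixels that change in both consecutive differences survive, which suppresses the "ghost" left behind by simple two-frame differencing.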
The efficiency of the proposed HEBT method has been assessed by comparing it with six other state-of-the-art histogram–entropy-based automatic threshold methods, where all the experiments have been implemented with the three-frame differencing segmentation algorithm. The six state-of-the-art methods studied are: the GLLFE histogram method [40], the grey-level-histogram and local-entropy information method [41], Renyi's entropic multi-level thresholding method based on a 2D histogram [43], the grey-level and local-average histogram with the Tsallis–Havrda–Charvát entropy [44], the entropic thresholding method based on a 2D histogram constructed using a Gabor filter [45], and the generalised entropy-based thresholding method based on the Masi entropy [46]. The comparisons between the HEBT method and the other state-of-the-art methods are made in terms of segmented images and performance parameters: average recall, average precision, average similarity, average f-measure, and computation time. Four basic counts underlie the performance parameters: True Positive (TP) is the number of pixels correctly classified as foreground; False Positive (FP) is the number of pixels wrongly classified as foreground; True Negative (TN) is the number of pixels correctly classified as background; False Negative (FN) is the number of pixels wrongly classified as background. The corresponding performance parameters are calculated as follows:

Average recall = TP/(TP + FN)
Average precision = TP/(TP + FP)
Average similarity = TP/(TP + FP + FN)
Average f-measure = (2 × Precision × Recall)/(Precision + Recall)

4.1 Experimental results on CD-2012 dataset

The CD-2012 dataset contains 31 videos grouped into six categories, of which we use five for comparison with the proposed HEBT method. The five challenges are described as follows. Thermal: the videos are captured using infrared cameras.
The main problem with these videos is that all the objects appear in the same colour. Shadow: these test videos contain shadows that block the light heavily and change the hue and saturation, which makes shadow suppression difficult. Camera jitter: images in these videos vibrate frequently and the camera shakes heavily, making it difficult to extract the foreground. Dynamic background: this category contains rippling water, fountains, canoes, and waving trees, describing a multimodal distribution of the background. Baseline: there are four videos in baseline, showing motion detection and some isolated shadows. Qualitative results: for evaluation on a qualitative level, we have randomly selected frame #3207 in Corridor, frame #2397 in Library, frame #318 in Park, frame #1172 in Dining room, frame #1172 in Lakeside, frame #1434 in Backdoor, frame #470 in Bungalow, frame #543 in Bus, frame #3558 in Cuboid, frame #862 in peopleInshade, frame #1115 in Copy machine, frame #864 in Badminton, frame #1981 in Boulevard, frame #800 in Sidewalk, frame #964 in Traffic, frame #7527 in Boats, frame #960 in Canoe, frame #1127 in Fountain 01, frame #7245 in Fountain 02, frame #2505 in Ov