Vehicle counting and traffic flow parameter estimation for dense traffic scenes
2020; Institution of Engineering and Technology; Volume 14, Issue 12; Language: English
DOI: 10.1049/iet-its.2019.0521
ISSN: 1751-9578
Authors: Shuang Li, Faliang Chang, Chunsheng Liu, Nanjun Li
Topic(s): Autonomous Vehicle Technology and Safety
IET Intelligent Transport Systems, Volume 14, Issue 12, pp. 1517-1523. Research Article. First published: 01 December 2020. Authors: Shuang Li, Faliang Chang, Chunsheng Liu (corresponding author, liuchunsheng@sdu.edu.cn) and Nanjun Li, School of Control Science and Engineering, Shandong University, Ji'nan, 250061, People's Republic of China.

Abstract

Vision-based traffic flow parameter estimation is a challenging problem, especially for dense traffic scenes, owing to occlusion, small vehicle sizes and high traffic density. Previous methods mainly use detection and tracking to count vehicles in non-dense traffic scenes, and few of them go on to estimate traffic flow parameters in dense scenes. A framework is proposed to count vehicles and estimate traffic flow parameters in dense traffic scenes. First, a pyramid-YOLO network is proposed for detecting vehicles in dense scenes, which can effectively detect small and occluded vehicles. Second, the authors design a line-of-interest counting method based on restricted multi-tracking, which counts vehicles crossing a counting line over a given time duration. The proposed tracking method tracks short-term vehicle trajectories near the counting line and analyses these trajectories, thus improving tracking and counting accuracy. Third, based on the detection and counting results, an estimation model is proposed to estimate the traffic flow parameters of volume, speed and density. Evaluation experiments on databases with dense traffic scenes show that the proposed framework can efficiently count vehicles, estimates traffic flow parameters with high accuracy, and outperforms the representative estimation methods in comparison.
1 Introduction

For traffic management and intelligent transport systems (ITSs), it is important to estimate traffic flow parameters, including volume, speed and density, both quickly and accurately [1]. Vehicle counting is a key task in traffic flow parameter estimation, and it is particularly hard in dense traffic scenes. Sensor-based methods [2, 3], which rely on sensing devices, can also perform counting and parameter estimation. However, installing enough sensing devices is expensive, and such devices may fail to capture slow-moving vehicles. Vision-based methods [4] avoid these disadvantages, but vision-based vehicle counting in dense traffic scenes still faces problems such as occlusion, small vehicle sizes and complex backgrounds.

In dense traffic scenes, counting vehicles within a detection region, known as region-of-interest (ROI) counting, is usually performed, and the detection result is used to estimate traffic density directly. Another existing vehicle-counting framework is line-of-interest (LOI) counting, which counts vehicles crossing a counting line over a given time duration and is usually used to estimate traffic volume directly. However, in existing vehicle-counting methods, LOI counting is almost always performed in free-flowing traffic. Of the three traffic parameters (volume, speed, density), most existing counting methods can estimate only one or two directly [5] and often infer the remaining parameters indirectly from the relationship among the three [6]. In dense traffic scenes, density can be estimated by detection methods [7, 8]. Yet these methods cannot directly estimate traffic volume, because volume estimation is more accurate when based on LOI counting. An LOI-counting method based on vehicle detection, tracking [9, 10] and counting is therefore suitable: with this approach, all three parameters can be estimated directly. We propose a framework based on vehicle detection, tracking, counting and parameter estimation, with three main contributions: accurate and fast vehicle detection, accurate tracking of dense vehicles and traffic flow parameter estimation.

Vehicle detection is usually the first step in counting. In dense traffic scenes, small and occluded vehicles make detection difficult. YOLOv3 [11] is a fast and accurate convolutional network, yet it may miss vehicles that are small or occluded. In this study, we propose a pyramid-YOLO (PM-YOLO) to address vehicle detection in dense traffic scenes. The original image is first scaled to different resolutions to obtain pyramid feature maps, and a YOLO detector is applied at each scale. After detection, a post-processing step merges the bounding boxes that PM-YOLO generates to improve detection performance.

Dense traffic scenes usually contain a relatively large number of vehicles, and tracking all of them is complex and error-prone. For the task of counting crossing vehicles, we propose a restricted multi-target tracking method that tracks only short-term vehicle trajectories near the counting line, improving tracking accuracy and reducing tracking time. We first set a counting line and select the vehicles to be tracked according to a priority value.
Then we design a multi-target tracking based vehicle-counting method using confidence-based multi-object tracking (CMOT) [12] and a trajectory processing method.

Speed, volume and density are the three most important traffic flow parameters, and most vehicle-counting methods fail to estimate all three in dense traffic scenes. We propose a traffic parameter estimation model that estimates all three parameters from the results of vehicle detection, tracking and LOI counting. The tracking and counting results directly yield the volume and speed estimates, which makes those estimates more reliable, and the density can be estimated from the detection results. The public UA-DETRAC data set [13] and our captured videos are used to evaluate the proposed detection and counting framework, which outperforms the representative vehicle-counting methods in comparison. The traffic parameter estimation model is evaluated on our captured videos and achieves good results with low mean absolute percentage error (MAPE).

2 Related work

Traffic flow parameters are important for ITSs. Vision-based traffic flow parameter estimation is mainly based on vehicle counting: density estimation is often achieved by vehicle detection or by estimating density maps; volume estimation by LOI counting and tracking; and speed estimation mostly by vehicle tracking. In this study, we divide the related work into three categories and review them in turn: counting based on detection and regression; counting based on tracking and time-spatial image (TSI) analysis; and traffic flow parameter estimation methods.

In recent years, many deep learning networks have been used for vehicle counting. In [8], a multi-channel, multi-task convolutional neural network (CNN) was proposed to count vehicles in still images. Faster R-CNN [14], YOLO and SSD [15] have also been used to detect vehicles before counting. In [7], Dai et al. presented a comprehensive comparison of vehicle detection methods, including You Only Look Once (YOLO), the Single Shot MultiBox Detector (SSD) and the Faster Region-based CNN (Faster R-CNN); all of these deep frameworks achieve good accuracy. Background subtraction (BS) methods can be used to extract and segment moving vehicles. Zhang et al. [16] proposed a Gaussian mixture model with confidence measurement for vehicle detection in complex urban traffic scenes. Chauhan et al. [17] proposed a linear quadratic estimation method for detecting and counting vehicles. In [18], a counting method based on a Gaussian mixture model and a Kalman filter was proposed for background modelling and counting. Yet, under serious occlusion, the performance of these methods can fall rapidly, because it is difficult to build a good background model in a real dense scene. Without detection, regression-based methods map image pixels to vehicle counts and obtain ROI-counting results directly. In [19], a cascaded regression over low-level foreground segmentation features was developed to count vehicles. Wang et al. [20] and Liu et al. [21] combined spatial regression with local temporal regression for vehicle counting. These regression methods need informative features and may fail in dense scenes. Most LOI-counting methods rely on vehicle tracking [6, 9, 10] after detection.
Some tracking-based LOI-counting methods use binary large object (blob) analysis [17, 22]: the blob method finds newly crossing vehicles by analysing changes in pixels or features. Zhao and Wang [10] and Ke et al. [6] performed LOI counting using the Kanade-Lucas-Tomasi tracker to obtain vehicle trajectories. In dense traffic scenes, however, tracking accuracy and speed both degrade. Other methods apply BS algorithms to the TSI for LOI counting: a virtual line (VL) is set on the image perpendicular to the road, and the traffic video is mapped to a TSI by stacking the pixels on the VL over time. In [23], the generated TSI was processed by a self-adaptive sample consensus background model before counting. In [24], a method based on an ROI accumulative curve and fuzzy constraint satisfaction propagation was designed to count vehicles in the TSI. These TSI approaches are also BS-based, so their performance may fall under serious occlusion or in cluttered scenes.

After vehicle counting, some methods [5, 6, 25] go on to estimate traffic flow parameters. A traffic flow estimation model based on ROI counting and tracking was proposed for aerial videos [6, 9], yet the volume was not estimated directly from LOI-counting results. Using the number and speed of crossing vehicles, Gao et al. [5] estimated traffic flow changes over the course of a week. Zhang et al. [25] performed only traffic flow analysis based on ROI results. Hence, although there are many excellent counting methods, few vision-based methods directly estimate all three traffic flow parameters.

3 Proposed methods

The overall framework is shown in Fig. 1. It consists of three parts: (i) PM-YOLO-based vehicle detection; (ii) restricted multi-tracking-based LOI counting; (iii) an estimation model for the traffic flow parameters. First, a PM-YOLO network is proposed to detect vehicles in complex, dense traffic scenes. Second, we design a multi-target tracking and vehicle-counting method based on CMOT and a trajectory processing step. Third, a parameter estimation model is proposed to estimate volume, density and speed in dense traffic scenes.

Fig. 1: Structure of the vehicle counting and traffic flow parameter estimation framework for dense traffic scenes

3.1 PM-YOLO-based vehicle detection

YOLOv3 is a fast and accurate convolutional network that simultaneously predicts bounding boxes and class probabilities. In this study, we propose PM-YOLO to address the problems of dense traffic scenes, such as small and occluded vehicles. We first scale the original image to different resolutions to obtain pyramid feature maps, apply YOLO detectors at each scale, and then merge the resulting bounding boxes in a post-processing step to improve detection performance. The original image is scaled into k (k = 2) additional scales. With the k pyramid scales and the original image, a pyramid network with k + 1 sub-nets is built, in which each sub-net is a YOLOv3 network for one scale. Each YOLOv3 predicts objects at three scales, meaning that the detector divides the input image into three grids of sizes $G_1 \times G_1$, $G_2 \times G_2$ and $G_3 \times G_3$.
Each grid cell is responsible for detecting objects whose centre falls in that cell, and each grid cell predicts three boxes. Hence, for each sub-net in our PM-YOLO, the number of YOLOv3 anchors is 3 × 3 = 9: the number of grid types multiplied by the number of boxes predicted per cell. The number of predicted bounding boxes is $(G_1 \times G_1 + G_2 \times G_2 + G_3 \times G_3) \times 3$, the total number of grid cells multiplied by the number of boxes per cell. These bounding boxes are then analysed and filtered to produce the final detection results. K-means clustering is used to determine the bounding box priors for the YOLO anchors; with 9 anchors, each cell at each scale predicts three boxes using three anchors. Although the YOLOv3 network at each pyramid scale has the same structure, the parameters differ: the input size of YOLOv3 net-1 equals the original resolution and its anchor values are the K-means clustering results, while YOLOv3 net-2, with half the input size, uses anchor values that are half those of net-1. The different YOLOv3 networks run in parallel on the differently scaled images, independently of one another, and their detections are mapped back to the original image to obtain the final results. Because one object may receive several bounding boxes, non-maximum suppression (NMS) is used to eliminate repeated boxes, keeping only the box with the highest score. After completing these steps, we obtain the final detection results.
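To make the multi-scale procedure concrete, the following Python sketch shows one way the pyramid detection and NMS merging described above could be organised. It is illustrative only, not the authors' implementation: `yolo_detect` is a placeholder for a trained YOLOv3 sub-net returning `(x, y, w, h, score)` boxes, and the assumption that every pyramid level halves the resolution extrapolates the net-1/net-2 relationship stated above.

```python
import cv2

def iou(a, b):
    """Intersection-over-union of two (x, y, w, h, score) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop overlapping duplicates."""
    kept = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) < iou_thresh for k in kept):
            kept.append(box)
    return kept

def pm_yolo_detect(image, base_anchors, yolo_detect, k=2):
    """Run a YOLOv3-style detector on the original image plus k downscaled
    copies, map every box back to original coordinates, then NMS-merge.
    `yolo_detect(image, anchors)` must return (x, y, w, h, score) boxes."""
    all_boxes = []
    for level in range(k + 1):
        scale = 0.5 ** level                       # assumed: each level halves resolution
        scaled = cv2.resize(image, None, fx=scale, fy=scale)
        anchors = [(w * scale, h * scale) for (w, h) in base_anchors]  # scaled anchors
        for (x, y, w, h, s) in yolo_detect(scaled, anchors):
            # map the detection back to original-image coordinates
            all_boxes.append((x / scale, y / scale, w / scale, h / scale, s))
    return nms(all_boxes)
```

Because each level's boxes are mapped back to original-image coordinates before merging, a vehicle found at several scales keeps only its highest-scoring box, mirroring the NMS post-processing step above.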
3.2 Restricted multi-tracking-based LOI counting

In dense traffic scenes, single-target tracking methods are unsuitable, and multi-target tracking methods are often relatively slow and can lose accuracy in difficult dense scenes, since such scenes usually contain a large number of vehicles and tracking all of them is complex and inaccurate. For the task of counting crossing vehicles, we propose restricted multi-target tracking based LOI counting: we track only short-term vehicle trajectories near the counting line, improving tracking accuracy and reducing tracking time. The restricted multi-tracking method considers only the vehicles that may cross the counting line. The whole method has two parts: (i) setting the counting line and selecting the vehicles to be tracked; (ii) restricted multi-tracking and vehicle counting based on CMOT tracking and trajectory processing.

3.2.1 Setting the counting line and selecting vehicles to be tracked

Most existing methods track the targets within an area near the counting line, which often requires prior knowledge of the road; in dense traffic scenes, tracking may still be complicated when the number of vehicles in this area is large. In our method, restricted tracking needs to track only a small number of vehicles in each frame, which we achieve by fixing the number of tracked vehicles according to a priority value. Before tracking, we set a horizontal counting line, at row $c_y$, in or near the middle of the image so that it covers all the roads to be counted. PM-YOLO detects $N_b$ vehicles in one frame, with bounding boxes $b_i = (x_i, y_i, w_i, h_i)$, $i = 1, 2, \ldots, N_b$, where $(x_i, y_i)$ are the coordinates of the top-left corner and $w_i$ and $h_i$ are the width and height of the $i$th box. We use a priority value to extract the vehicles that need to be tracked; vehicles with high priority are selected. The priority of the $i$th vehicle in the $j$th frame is defined as

$$p_i^j = \frac{1}{\left| y_i^j + h_i^j / 2 - c_y \right|}, \quad (1)$$

where $y_i^j$ and $h_i^j$ are the $y$-coordinate of the top-left corner and the height of the $i$th bounding box in the $j$th frame, respectively, and $c_y$ is the $y$-coordinate of the counting line. The priority values of all vehicles in the $j$th frame form the set $P^j = \{p_1^j, p_2^j, \ldots, p_{N_b}^j\}$. We sort the priorities in $P^j$ in descending order and select the top $N_v$ vehicles with the largest values; their bounding boxes are placed sequentially into the set $B^j = \{b_1^j, b_2^j, \ldots, b_{N_v}^j\}$, where $b_i^j = (x_i^j, y_i^j, w_i^j, h_i^j)$, $i = 1, 2, \ldots, N_v$.

In dense traffic flow there are many vehicles on the road, but it is unnecessary to track every vehicle in every frame for counting: only vehicles that are about to cross the counting line need to be tracked. Therefore, the top $N_v$ vehicles with the highest priority in $B^j$ are selected for tracking. A small $N_v$ reduces the number of frames over which each vehicle is tracked and lowers tracking accuracy; a large $N_v$ can improve tracking accuracy but increases complexity through unnecessary tracking. A suitable $N_v$ is therefore important for accurate and fast counting. We choose $N_v$ through the experiments described in Section 4.2; according to the best results, we choose $N_v = 5$. Fig. 2 shows the vehicles selected for tracking: the proposed selection method based on the priority value tracks the vehicles near the counting line while reducing the spatial density of the vehicles to be tracked.

Fig. 2: Two examples of the tracking results with selected vehicles. The green line is the counting line

3.2.2 CMOT and trajectory processing based restricted multi-tracking and vehicle counting

We choose the CMOT algorithm as the tracker in our counting framework. CMOT is a robust online multi-object tracking method based on tracklet confidence and online discriminative appearance learning. The CMOT tracker tracks the $N_v$ selected vehicles in every frame. After tracking, we obtain the trajectory set $Tr^j$ for the $j$th frame and merge similar trajectories. As CMOT uses the initial five frames to initialise trajectory information, we add the trajectories of these five frames based on the tracked trajectory data, ensuring that the trajectories of the selected vehicles are extracted completely. The final trajectories $T^j$ are then used for LOI counting. The pseudo-code of the whole restricted multi-tracking and vehicle-counting procedure is given in Algorithm 1 (Fig. 3).

Fig. 3: Algorithm 1: Restricted multi-tracking-based vehicle counting
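As a minimal illustration of the selection step in Algorithm 1, the priority computation of (1) and the top-$N_v$ selection reduce to a few lines of Python. The box format and the epsilon guard are our assumptions; the selected boxes would then be handed to the CMOT tracker, which is not reproduced here.

```python
def select_vehicles(boxes, c_y, n_v=5, eps=1e-6):
    """Select the N_v detections closest to the counting line, following (1).
    Each box is (x, y, w, h) with (x, y) the top-left corner; y + h/2 is the
    vertical centre of the box, so the priority 1/|y + h/2 - c_y| grows as a
    vehicle approaches the line. eps guards against division by zero when a
    box centre lies exactly on the line (our addition, not in the paper)."""
    def priority(box):
        x, y, w, h = box
        return 1.0 / (abs(y + h / 2.0 - c_y) + eps)
    return sorted(boxes, key=priority, reverse=True)[:n_v]
```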
This subsection has described the vehicle-counting method for dense traffic. In the next subsection, the traffic flow parameters (volume, density and speed) are estimated from the vehicle detection, tracking and counting results.

3.3 Estimation of different traffic flow parameters

The ultimate goal of vehicle counting is to estimate traffic parameters for traffic guidance. The vehicle detection results, tracking trajectories and counting results obtained by the proposed framework can directly yield the three important traffic parameters of density, speed and volume. In this part, we build the estimation model for the three parameters in dense traffic scenes.

Traffic volume is defined as the number of vehicles crossing the counting line per hour. We denote by $V_j$ the estimated volume at the $j$th frame, measured in vehicles/h. In the proposed estimation model, $F_r$ denotes the frame rate of the traffic video in frames per second (fps), and $F_h = F_r \times 3600$ is the conversion factor from frames to hours. The volume $V_j$ is estimated directly from the LOI-counting result $C_j$, the number of vehicles counted over the first $j$ frames:

$$V_j = \frac{C_j \cdot F_h}{j}. \quad (2)$$

Traffic speed is defined as the average speed of all tracked vehicles. It is calculated from all trajectories present in each frame and converted from pixels per frame to kilometres per hour. We denote by $S_j$ the estimated speed at the $j$th frame, measured in km/h. Let $r_p$ (in pixels) be the pixel length of the road in the image and $r_a$ (in kilometres) the corresponding actual road length in the real scene; then $R_c = r_a / r_p$ is the conversion factor from pixels to kilometres. The speed is estimated from the trajectory positions $(x_1^i, y_1^i), (x_2^i, y_2^i), \ldots$ in the trajectory set $T$:

$$S_j = \frac{F_h \cdot R_c}{n} \sum_{i=1}^{n} \frac{E(c_1^i, c_f^i)}{f_e^i - f_s^i}, \quad (3)$$

where $n$ is the number of trajectories in $T$, $f$ is the final frame of the $i$th trajectory, $E(c_1^i, c_f^i)$ is the Euclidean distance between $c_1^i = (x_1^i, y_1^i)$ and $c_f^i = (x_f^i, y_f^i)$, and $f_s^i$ and $f_e^i$ are the start and end frame numbers of the $i$th trajectory.

Traffic density is defined as the number of vehicles per kilometre of road. We denote by $D_j$ the estimated density at the $j$th frame, measured in vehicles/km. With $r$ the pixel length of the road in the image and $d_j$ the number of vehicles detected in the $j$th frame, the density is estimated as

$$D_j = \frac{d_j}{r \cdot R_c}. \quad (4)$$

With these steps, the traffic flow parameters of volume, speed and density can all be estimated.
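The estimation model of (2)-(4) translates directly into code. The sketch below is a plain transcription of the formulas above under an assumed trajectory format; it is not the authors' MATLAB implementation.

```python
import math

def estimate_volume(c_j, j, fps=25):
    """Eq. (2): volume V_j in vehicles/h from the LOI count c_j accumulated
    over the first j frames; F_h = fps * 3600 frames per hour."""
    return c_j * (fps * 3600) / j

def estimate_speed(trajectories, r_c, fps=25):
    """Eq. (3): mean speed S_j in km/h. Each trajectory is assumed to be
    ((x1, y1), (xf, yf), f_s, f_e): first/last positions in pixels plus the
    start/end frame numbers. r_c is the km-per-pixel conversion factor R_c."""
    total = 0.0
    for (x1, y1), (xf, yf), f_s, f_e in trajectories:
        dist_px = math.hypot(xf - x1, yf - y1)   # Euclidean distance E(c_1, c_f)
        total += dist_px / (f_e - f_s)           # pixels per frame
    return (fps * 3600) * r_c * total / len(trajectories)

def estimate_density(d_j, r_px, r_c):
    """Eq. (4): density D_j in vehicles/km; d_j vehicles detected on a road
    segment of r_px pixels, i.e. r_px * r_c kilometres of real road."""
    return d_j / (r_px * r_c)
```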
4 Experiments

4.1 Data sets

We use the public urban traffic data set UA-DETRAC [13] to analyse the performance of the proposed framework. UA-DETRAC contains 10 h of video captured with a Canon EOS 550D camera at 24 different locations in China, at 25 fps and 960 × 540 resolution. Our training set comprises all the UA-DETRAC training data and is used to train the PM-YOLO vehicle detector. Our test sets are selected randomly from the UA-DETRAC testing data; five dense traffic videos (MVI_39401, MVI_40772, MVI_40775, MVI_40793 and MVI_40852) are selected to evaluate the vehicle-counting method. To evaluate the traffic parameter estimation model, two additional traffic videos (video-A, video-B) of dense traffic scenes were captured and tested. The two videos are also at 25 fps and 960 × 540 resolution. PM-YOLO is trained from a model pre-trained on ImageNet [26] and implemented in Python; the restricted multi-tracking based on CMOT and the LOI counting are implemented in MATLAB. The platform has a quad-core Intel i7-7700 CPU and a single NVIDIA Titan X GPU.

4.2 Evaluation of the detection and restricted multi-tracking-based LOI-counting method

The experiments in this part evaluate the proposed PM-YOLO and the restricted multi-tracking-based LOI-counting method. Table 1 compares the performance of PM-YOLO and YOLO on four test videos, measured by precision (Pre.) and recall (Rec.). PM-YOLO achieves, on average, 0.85% higher precision and 1.21% higher recall than YOLO, and hence performs better.

Table 1. Performance comparison of PM-YOLO and YOLO

Data       Pre. (YOLO), %   Rec. (YOLO), %   Pre. (PM-YOLO), %   Rec. (PM-YOLO), %
MVI-39401  98.12            96.31            99.21               97.33
MVI-40852  98.87            96.21            99.53               97.82
video-A    98.88            95.51            99.43               96.63
video-B    98.53            94.53            99.63               95.63
average    98.60            95.64            99.45               96.85

Selecting the right number of highest-priority vehicles is key to improving tracking accuracy. We design an experiment to select a suitable $N_v$ in Algorithm 1 and then validate the performance of the whole counting method: different numbers of vehicles near the counting line are tracked under different $N_v$, and the final counting results after tracking are shown in Table 2. The proposed LOI-counting method performs differently for different $N_v$; as noted in Section 3.2.1, too small a value shortens the tracking window and lowers accuracy, while too large a value adds unnecessary tracking complexity. We therefore choose $N_v = 5$, which gives the highest counting accuracy of 96.10% at a processing speed of 9.21 fps.

Table 2. Performance of LOI counting with different values of $N_v$

N_v       Accuracy, %   Speed, fps
N_v = 2   93.69         10.76
N_v = 3   94.89         9.89
N_v = 4   95.50         9.54
N_v = 5   96.10         9.21
N_v = 6   95.80         8.89

4.3 Performance and comparison of the entire framework

The entire framework consists of three parts: (i) PM-YOLO-based vehicle detection; (ii) LOI counting based on restricted multi-target tracking; (iii) the traffic flow parameter estimation model. We design experiments to evaluate and compare the counting performance and to evaluate the estimation model for the three main traffic flow parameters. We test all seven videos in our test set with the proposed counting framework and list the LOI-counting results in Table 3.
Table 3. Performance of the proposed counting method

Data       Ground truth   Count   Difference   Accuracy, %   GEH
MVI-39401  114            110     4            96.49         3.04
MVI-40772  47             45      2            95.74         2.55
MVI-40775  37             35      2            94.59         3.20
MVI-40793  78             75      3            96.15         2.32
MVI-40852  57             55      2            96.49         2.36
video-A    196            188     8            95.92         3.50
video-B    242            233     9            96.28         3.75
ALL        771            741     30           96.10         2.97

In Table 3, 'ground truth' is the true number of crossing vehicles, 'count' is the counting result, 'difference' is the difference between the two, and 'accuracy' is calculated as

$$\text{Accuracy} = 1 - \frac{\text{Difference}}{\text{Ground truth}}. \quad (5)$$

We also use the GEH statistic (the name comes from its inventor, Geoffrey E. Havers) to evaluate the counting results:

$$\text{GEH} = \sqrt{\frac{2 \times \text{Difference}^2}{\text{Count} + \text{Ground truth}}}. \quad (6)$$

From the results in Table 3, our counting method achieves counting accuracies ranging from 94.59 to 96.49%, with an average of 96.10%, and obtains a GEH of 2.97 over all seven test videos (largest 3.75, smallest 2.32). Under our counting method, the seven dense videos perform similarly despite their different complexity.

The parameter $R_c$ in the proposed estimation model is available for the two captured videos (video-A and video-B); this conversion factor from pixels to kilometres is required by (3) and (4). The performance of the proposed traffic flow parameter estimation model on video-A and video-B is shown in Fig. 4 and Table 4.

Fig. 4: Estimation of the three traffic flow parameters (volume, speed and density) on video-A and video-B

Table 4. Traffic flow parameter estimation results on our lab videos (volume in vehicles/h, speed in km/h, density in vehicles/km)

Performance            Video-A   Video-B
GT (average volume)    7335      7653
ES (average volume)    7011      7272
MAPE (volume)          4.42%     4.98%
GT (average speed)     42.19     41.77
ES (average speed)     40.49     43.36
MAPE (speed)           4.03%     3.81%
GT (average density)   178       183
ES (average density)   172       175
MAPE (density)         3.37%     4.37%

The curves in Fig. 4 show the estimated volume, speed and density on the two videos; each curve plots the estimated parameter over 2173 frames. The estimation accuracy of the volume is determined mainly by the counting accuracy, because the volume is calculated directly from the LOI-counting results; the curves show that the volume estimates are accurate and relatively stable. The speed, estimated from the tracking results, is stable, with most values between 30 and 50 km/h. The density is defined as the estimated number of vehicles on 1 km of road; since the road length in the image is much shorter than a real 1-km road, a small deviation in the number of detected vehicles is magnified in the density estimate, which therefore varies significantly.

Table 4 lists the estimated average volume, speed and density for our two lab videos. We use MAPE to evaluate the estimates:

$$\text{MAPE} = \frac{\left| \text{GT} - \text{ES} \right|}{\text{GT}} \times 100\%, \quad (7)$$

where GT and ES are the ground-truth and estimated values of the average volume, speed or density. The MAPEs of the three parameter estimates in Table 4 lie between 3.37 and 4.98%; the highest, 4.98%, is the MAPE of the volume estimate on video-B, and the lowest, 3.37%, is the MAPE of the density estimate on video-A. These results show that our model estimates traffic flow parameters with low MAPE.
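For completeness, the three evaluation measures in (5)-(7) can be computed as follows; this is a direct transcription of the formulas, with MAPE returned as a percentage to match Table 4.

```python
import math

def counting_accuracy(count, ground_truth):
    """Eq. (5): 1 - |difference| / ground truth."""
    return 1.0 - abs(count - ground_truth) / ground_truth

def geh(count, ground_truth):
    """Eq. (6): GEH statistic; in traffic modelling, values below 5 are
    conventionally taken to indicate a good fit."""
    return math.sqrt(2.0 * (count - ground_truth) ** 2 / (count + ground_truth))

def mape(gt, es):
    """Eq. (7): absolute percentage error, returned in % to match Table 4."""
    return abs(gt - es) / gt * 100.0
```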
The LOI-counting results accurately estimate the volume directly; the speed is estimated from the trajectory information extracted by our restricted multi-tracking method; and the proposed PM-YOLO detection results give a good estimate of the density.

To validate the performance of the proposed LOI-counting framework, we compare our method with representative counting methods proposed by Bouvie et al. [27], Quesada and Rodriguez [28], Yang and Qu [29] and Abdelwahab [30]. Three videos with reported results are chosen for the comparison: two videos (M-30 and M-30-HD) from the GRAM data set and one video (highway) from the CDnet2014 data set. Table 5 lists the accuracy of each method on these three videos.

Table 5. Comparison with other representative counting methods on three public videos

Method                      M-30, %   M-30-HD, %   Highway, %
Bouvie et al. [27]          89.62     78.57        —
Quesada and Rodriguez [28]  97.41     92.86        —
Yang and Qu [29]            92.20     88.10        93.30
Abdelwahab [30]             98.70     100          92.31
proposed method             100       100          96.30

The results in Table 5 show that our method achieves the highest accuracy values, at least 1.30 and 3.99% better than those of the other methods on M-30 and highway, respectively, and reaches 100% on both M-30 and M-30-HD. Several factors contribute to this performance. First, the proposed PM-YOLO network effectively detects occluded and small vehicles, improving detection accuracy. Second, the restricted multi-target tracking tracks only the vehicles that matter for counting, avoiding redundant tracking. Lastly, the LOI-counting method based on restricted multi-target tracking analyses the tracking trajectories to obtain accurate counts. Hence, the proposed LOI-counting framework achieves better accuracy than the representative methods in comparison.

5 Conclusion and future work

In this paper, we propose a framework to count vehicles and estimate the three main traffic flow parameters; it comprises a PM-YOLO network-based vehicle detector, a restricted multi-tracking based LOI-counting method and a parameter estimation model. To address vehicle detection in dense traffic scenes, the PM-YOLO network effectively detects occluded and small vehicles. To avoid redundant tracking and improve tracking accuracy, the restricted multi-target tracking method tracks only the vehicles that are important for counting; after tracking, we analyse the trajectories to count vehicles. Based on the detection, tracking and counting results, the traffic parameter estimation model estimates the three main parameters of volume, speed and density. Evaluation experiments on public and lab databases show that the proposed framework efficiently counts vehicles, estimates the three traffic flow parameters with high accuracy, and outperforms the representative estimation methods in comparison. In future work, we aim to adapt our vehicle-counting method to more complex traffic scenes.
6 Acknowledgments

This work was supported by the National Key R&D Program of China (Grant no. 2018YFB1305300), the National Nature Science Foundation of China (Grant nos. 61673244 and 61703240) and the Key R&D Program of Shandong province of China (Grant nos. 2019JZZY010130 and 2018CXGC0907).

7 References

1. Zhang, N., Wang, F.Y., Zhu, F., et al.: 'DynaCAS: computational experiments and decision support for ITS', IEEE Intell. Syst., 2008, 23, (6), pp. 19-23
2. Wang, R., Zhang, L., Xiao, K., et al.: 'EasiSee: real-time vehicle classification and counting via low-cost collaborative sensing', IEEE Trans. Intell. Transp. Syst., 2014, 15, (1), pp. 414-424
3. Sifuentes, E., Casas, O., Pallas-Areny, R.: 'Wireless magnetic sensor node for vehicle detection with optical wake-up', IEEE Sens. J., 2011, 11, (8), pp. 1669-1676
4. Liu, C., Chang, F.: 'Hybrid cascade structure for license plate detection in large visual surveillance scenes', IEEE Trans. Intell. Transp. Syst., 2018, 20, (6), pp. 2122-2135
5. Gao, Z., Zhai, R., Wang, P., et al.: 'Synergizing appearance and motion with low rank representation for vehicle counting and traffic flow analysis', IEEE Trans. Intell. Transp. Syst., 2018, 19, (8), pp. 2675-2685
6. Ke, R., Li, Z., Kim, S., et al.: 'Real-time bidirectional traffic flow parameter estimation from aerial videos', IEEE Trans. Intell. Transp. Syst., 2017, 18, (4), pp. 890-901
7. Dai, Z., Song, H.S., Wang, X., et al.: 'Video-based vehicle counting framework', IEEE Access, 2019, 7, pp. 64460-64470
8. Sun, M., Wang, Y., Li, T., et al.: 'Vehicle counting in crowded scenes with multi-channel and multi-task convolutional neural networks', J. Vis. Commun. Image Represent., 2017, 49, pp. 412-419
9. Ke, R., Li, Z., Tang, J., et al.: 'Real-time traffic flow parameter estimation from UAV video based on ensemble classifier and optical flow', IEEE Trans. Intell. Transp. Syst., 2019, 20, (1), pp. 54-64
10. Zhao, R., Wang, X.: 'Counting vehicles from semantic regions', IEEE Trans. Intell. Transp. Syst., 2013, 14, (2), pp. 1016-1022
11. Redmon, J., Farhadi, A.: 'YOLOv3: an incremental improvement', arXiv preprint arXiv:1804.02767, 2018, pp. 1-6
12. Bae, S.H., Yoon, K.: 'Robust online multi-object tracking based on tracklet confidence and online discriminative appearance learning'. IEEE Conf. on Computer Vision and Pattern Recognition, Columbus, USA, 2014, pp. 1218-1225
13. Wen, L., Du, D., Cai, Z., et al.: 'UA-DETRAC: a new benchmark and protocol for multi-object tracking', arXiv preprint arXiv:1511.04136, 2015
14. Ren, S., He, K., Girshick, R., et al.: 'Faster R-CNN: towards real-time object detection with region proposal networks', IEEE Trans. Pattern Anal. Mach. Intell., 2017, 39, (6), pp. 1137-1149
15. Liu, W., Anguelov, D., Erhan, D., et al.: 'SSD: single shot MultiBox detector'. European Conf. on Computer Vision, Amsterdam, The Netherlands, 2016, pp. 21-37
16. Zhang, Y., Zhao, C., He, J., et al.: 'Vehicles detection in complex urban traffic scenes using Gaussian mixture model with confidence measurement', IET Intell. Transp. Syst., 2016, 10, (6), pp. 445-452
17. Chauhan, N.S., Rahman, F., Sarker, R., et al.: 'Vehicle detection, tracking and counting using linear quadratic estimation technique'. Int. Conf. on Inventive Systems and Control, Valencia, Spain, 2018, pp. 603-607
18. Maqbool, S., Khan, M., Tahir, J., et al.: 'Vehicle detection, tracking and counting'. IEEE Int. Conf. on Signal and Image Processing, Athens, Greece, 2018, pp. 126-132
19. Liang, M.P., Huang, X., Chen, C.H., et al.: 'Counting and classification of highway vehicles by regression analysis', IEEE Trans. Intell. Transp. Syst., 2015, 16, (5), pp. 2878-2888
20. Wang, Z.L., Liu, X., Feng, J.S., et al.: 'Compressed-domain highway vehicle counting by spatial and temporal regression', IEEE Trans. Circuits Syst. Video Technol., 2019, 29, (1), pp. 263-274
21. Liu, X., Wang, Z.L., Feng, J.S., et al.: 'Highway vehicle counting in compressed domain'. IEEE Conf. on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016, pp. 3016-3024
22. Seenouvong, N., Watchareeruetai, U., Nuthong, C., et al.: 'A computer vision based vehicle detection and counting system'. Int. Conf. on Knowledge and Smart Technology, Chiangmai, Thailand, 2016, pp. 224-227
23. Zhang, Y., Zhao, C., Zhang, Q.: 'Counting vehicles in urban traffic scenes using foreground time-spatial images', IET Intell. Transp. Syst., 2017, 11, (2), pp. 61-67
24. Chen, C.Y., Liang, Y.M., Chen, S.W.: 'Vehicle classification and counting system'. Int. Conf. on Audio, Language and Image Processing, Shanghai, China, 2015, pp. 485-490
25. Zhang, Z., Liu, K., Gao, F., et al.: 'Vision-based vehicle detecting and counting for traffic flow analysis'. Int. Joint Conf. on Neural Networks, Vancouver, Canada, 2016, pp. 2267-2273
26. Deng, J., Dong, W., Socher, R., et al.: 'ImageNet: a large-scale hierarchical image database'. IEEE Conf. on Computer Vision and Pattern Recognition, Florida, USA, 2009, pp. 248-255
27. Bouvie, C., Scharcanski, J., Barcellos, P., et al.: 'Tracking and counting vehicles in traffic video sequences using particle filtering'. IEEE Int. Instrumentation and Measurement Technology Conf., Minneapolis, USA, 2013, pp. 1-4
28. Quesada, J., Rodriguez, P.: 'Automatic vehicle counting method based on principal component pursuit background modeling'. Int. Conf. on Image Processing, Phoenix, USA, 2016, pp. 3822-3826
29. Yang, H., Qu, S.: 'Real-time vehicle detection and counting in complex traffic scenes using background subtraction model with low-rank decomposition', IET Intell. Transp. Syst., 2017, 12, (1), pp. 75-85
30. Abdelwahab, M.A.: 'Fast approach for efficient vehicle counting', Electron. Lett., 2019, 55, (1), pp. 20-22