Towards long‐term UAV object tracking via effective feature matching
2020; Institution of Engineering and Technology; Volume: 56; Issue: 20; Language: English
10.1049/el.2020.1096
ISSN: 1350-911X
Authors: Baojun Zhao, Hongshuo Wang, Linbo Tang, Yuqi Han
Topic(s): Infrared Target Detection Methodologies
Electronics Letters, Volume 56, Issue 20, pp. 1056-1059. Image and vision processing and display technology.
Baojun Zhao, Hongshuo Wang, Linbo Tang (corresponding author, tanglinbo@bit.edu.cn) and Yuqi Han: School of Information and Electronics, Beijing Institute of Technology, Beijing, 100081, People's Republic of China.
First published: 01 September 2020. https://doi.org/10.1049/el.2020.1096

Abstract
Object tracking based on unmanned aerial vehicles (UAVs) has attracted extensive research attention recently, since the UAV platform provides the inherent ability to continuously observe and track a target. However, occlusion is a crucial interference which may cause performance degradation in long-term UAV-based tracking. In this Letter, the authors propose a robust and efficient long-term tracker based upon local feature matching and density clustering. To be more specific, the authors propose a keypoint-matching-based confidence indicator to monitor the tracking condition and activate the re-detection module when occlusion is predicted. Once occlusion occurs, a novel density-based clustering method is utilised to re-locate the target with the collected local features. Extensive experiments demonstrate that the proposed algorithm performs favourably against other related trackers.

Introduction
Visual object tracking on an unmanned aerial vehicle (UAV) platform has extensive applications in military defence and civil security. The most crucial challenge is to overcome the appearance contamination and potential drift caused by occlusion interference.
Among the existing long-term tracking algorithms, tracking-learning-detection (TLD) [1] trains a random-fern classifier to correct the tracking result in every frame, at a heavy computational cost. The long-term correlation tracker (LCT) [2] activates a re-detection module based on an incremental support vector machine (SVM) when occlusion is predicted from the correlation response of the tracking module. However, it is hard to judge model drift accurately with a handcrafted response threshold. To address the above issues, in this Letter we propose a robust long-term tracker based upon local feature matching and density-based re-detection. Specifically, we collect local target features and evaluate the tracking condition with the collected oriented FAST and rotated BRIEF (ORB) feature points. Once occlusion or model drift is predicted, feature matching is activated to vote on the target centre as well as to estimate whether the target has re-appeared. Afterwards, we employ a density-based clustering method to prune the false votes and determine the target centre. Experimental results indicate that the proposed method outperforms most of the compared methods.

Proposed tracker
As mentioned above, the overall tracking framework can be decomposed into three parts: (i) local feature collecting, (ii) tracking condition judging and (iii) occluded target re-detecting. The pipeline of our tracking algorithm is illustrated in Fig. 1.

Fig. 1: Overall framework of the proposed tracker. The tracking module and re-detection module are shown in blue and red, respectively.

(i) Local feature collecting: Common local feature description methods include SIFT, SURF and ORB [3-5]. The computational complexity of SIFT and SURF is high, which may hinder their practical use in UAV tracking applications. Thus, we adopt the ORB method to construct the local feature database, taking advantage of its efficient binary operations. The key-point extraction procedure of the ORB module is illustrated in Fig. 1, where the red circles in each tracking result denote the collected ORB features. Each collected feature is described by its position, its binary feature vector, and the displacement r and clockwise deflection angle θ between the target centre and the extracted feature point. It can be seen from Fig. 1 that the local feature points remain stable when the tracking condition is ideal. Therefore, we construct a local feature library K by collecting the target's ORB features during tracking. Specifically, we extract the local features at the estimated tracking position in each frame and match them with the features in the library K, computing the Hamming distance to judge whether a matching succeeds. A successfully matched feature is updated, while a mismatched feature is initialised as a new target local feature. Furthermore, a linear confidence updating scheme is proposed to represent the confidence of the collected local features:

    conf_i ← conf_i + α   (successful matching)
    conf_i ← conf_i − β   (no matching)
    conf_i = conf_0       (newly initialised feature)   (1)

Here, conf_0 is the initialised local feature confidence and α denotes the increase of the confidence for a successful feature matching. Similarly, a corresponding exit mechanism is built for unpaired points to avoid unbounded expansion of the feature library: their confidence is decreased by β as the cost of no matching. A pre-defined Hamming-distance threshold τ separates successful matchings from failed ones.
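As a concrete illustration, the sketch below shows how such a feature library with the confidence update in (1) could be maintained using OpenCV's ORB implementation. This is a minimal sketch rather than the authors' code: the parameter values (HAMMING_T, ALPHA, BETA, CONF_INIT, CONF_MIN), the class and variable names, and the choice to store each keypoint's 2D offset from the target centre (standing in for the displacement r and deflection angle θ) are assumptions made for illustration only.

```python
import cv2
import numpy as np

HAMMING_T = 40              # Hamming-distance threshold τ for a successful match (assumed value)
ALPHA, BETA = 2, 1          # confidence addition / attenuation parameters in (1)
CONF_INIT, CONF_MIN = 1, 0  # initial confidence and deletion threshold (assumed)

orb = cv2.ORB_create(nfeatures=200)

class FeatureLibrary:
    """Local ORB feature library K with the linear confidence update of (1)."""
    def __init__(self):
        self.desc = []      # binary descriptors (32-byte uint8 vectors)
        self.offsets = []   # keypoint offset from the target centre (stand-in for (r, θ))
        self.conf = []      # per-feature confidence

    def update(self, frame_gray, box):
        """Collect features inside the tracked box; return the number of matches."""
        x, y, w, h = box    # integer pixel coordinates of the tracked box
        kp, des = orb.detectAndCompute(frame_gray[y:y + h, x:x + w], None)
        if des is None:
            return 0
        n0 = len(self.desc)
        used = [False] * n0
        matched = 0
        for k, d in zip(kp, des):
            # nearest existing library entry by Hamming distance
            dists = [cv2.norm(d, self.desc[i], cv2.NORM_HAMMING) for i in range(n0)]
            best = int(np.argmin(dists)) if dists else -1
            if best >= 0 and dists[best] <= HAMMING_T:     # successful matching
                self.conf[best] += ALPHA
                used[best] = True
                matched += 1
            else:                                          # initialise a new local feature
                self.desc.append(d)
                self.offsets.append((k.pt[0] - w / 2.0, k.pt[1] - h / 2.0))
                self.conf.append(CONF_INIT)
        for i in range(n0):                                # unmatched entries pay the cost
            if not used[i]:
                self.conf[i] -= BETA
        # prune transient features to keep the library capacity reasonable
        keep = [i for i, c in enumerate(self.conf) if c > CONF_MIN]
        self.desc = [self.desc[i] for i in keep]
        self.offsets = [self.offsets[i] for i in keep]
        self.conf = [self.conf[i] for i in keep]
        return matched
```

In a tracking loop, `lib.update(gray, box)` would be called each frame with the box estimated by the baseline tracker; its return value (the number of matching pairs) feeds the tracking-condition indicator described next.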
We select the features that appear steadily as members of the local feature library K and delete the transient features whose confidence falls below the threshold, so as to maintain a reasonable capacity for the library.

(ii) Tracking condition judging: Since tracking algorithms often suffer from model drift when occlusion occurs, it is necessary to monitor the tracking status and decide whether to start the re-detection module. Some existing works employ the response score or its variants [6, 7] as the tracking-condition indicator. We argue that the link between occlusion and the response value is weak, since the occluder may also produce a high response score. We therefore advocate a keypoint-based indicator to reveal the tracking condition: the matching pairs change smoothly when the appearance changes slowly, whereas once the target is temporarily occluded, its features are replaced by the background, which decreases the number of matching pairs. Based upon this phenomenon, after obtaining the target centre with the baseline tracker, we count the number of matching pairs N_t in the current frame and compute its ratio to the average matching number over the M historical frames,

    ρ_t = N_t / ((1/M) · Σ_{i=1..M} N_{t−i})   (2)

Once a drastic decrease (ρ_t less than a fixed threshold) is detected, we consider that the target is occluded and the re-detection module is activated immediately to search for the target.

(iii) Occluded target re-detecting: Occlusion is a crucial interference because the resulting template contamination may lead to model drift in long-term tracking. Some existing long-term trackers employ an incremental SVM or random ferns at a heavy computational cost, which hinders their application on the resource-constrained UAV platform. We therefore propose an effective and efficient re-detection method based on local feature matching and density-based clustering [8], which simplifies large-scale sliding-window detection into local ORB feature matching with low computational complexity and a flexible detection area. To increase the overall stability, we also apply a monitoring criterion to the obtained result: only if the cluster is detected in N consecutive frames with small displacement is it ultimately assigned as the re-detected target. To be specific, we extract candidate features in the search region and match them with the binary feature vectors in the feature library. With each successful matching, the corresponding displacement vector and angle vector vote for a possible target centre, and a density-based clustering approach is used to prune the mismatches and outliers. The clustering procedure is summarised in Algorithm 1, and a code sketch is given after it.

Algorithm 1: Density-based clustering
Input: the unassigned voting centres S.
Output: voting centres assigned to clusters.
1: Arbitrarily select an unassigned voting centre in S.
2: Collect the sample set in its neighbourhood of the image space and count its size.
3: if the neighbourhood contains enough samples then
4:   Create a new cluster with this core point.
5:   for each point in the neighbourhood do
6:     Mark it as a member of the cluster.
7:     if it was previously marked as an outlier then
8:       Re-mark it as a boundary point.
9:   Repeat steps 2, 3 and 5-8 to expand the cluster.
10: else
11:   Mark the selected centre as an outlier.
12: Repeat the above steps until every voting centre belongs to a cluster or is an outlier.

Generally, a cluster with 20 candidate samples is selected as the re-detected target centre.
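The following companion sketch covers the tracking-condition indicator of (2) and the voting-plus-clustering re-detection step. It reuses `orb`, `HAMMING_T` and the `FeatureLibrary` from the previous sketch, and it uses scikit-learn's DBSCAN in place of Algorithm 1 (both follow the density-based clustering of [8]). The neighbourhood radius `eps`, `min_samples`, the ratio threshold `RATIO_T` and the simple largest-cluster rule are illustrative assumptions; the Letter additionally weighs the maximum correlation response when several clusters appear, as in (3), and only confirms a cluster that persists for N consecutive frames.

```python
from collections import deque
import cv2
import numpy as np
from sklearn.cluster import DBSCAN

RATIO_T = 0.5                 # occlusion is declared when the match ratio drops below this (assumed)
history = deque(maxlen=100)   # match counts of the previous frames (window M = 100)

def occlusion_predicted(n_matches):
    """Keypoint-based tracking-condition indicator, cf. (2)."""
    avg = np.mean(history) if history else 0.0
    history.append(n_matches)
    return avg > 0 and n_matches / avg < RATIO_T

def redetect(frame_gray, search_box, lib):
    """Vote for candidate target centres with matched ORB features, then keep the
    densest cluster of votes (DBSCAN plays the role of Algorithm 1 / [8])."""
    x, y, w, h = search_box
    kp, des = orb.detectAndCompute(frame_gray[y:y + h, x:x + w], None)
    if des is None:
        return None
    votes = []
    for k, d in zip(kp, des):
        dists = [cv2.norm(d, ld, cv2.NORM_HAMMING) for ld in lib.desc]
        if dists and min(dists) <= HAMMING_T:
            off = lib.offsets[int(np.argmin(dists))]
            # each matched keypoint votes for the target centre it implies
            votes.append([x + k.pt[0] - off[0], y + k.pt[1] - off[1]])
    if len(votes) < 3:
        return None
    votes = np.array(votes)
    labels = DBSCAN(eps=8.0, min_samples=3).fit(votes).labels_
    valid = labels[labels >= 0]
    if valid.size == 0:
        return None                              # only outliers: target has not re-appeared
    best = np.bincount(valid).argmax()           # largest (densest) vote cluster
    return votes[labels == best].mean(axis=0)    # mean of coordinates = re-detected centre
```

When `occlusion_predicted` fires, `redetect` would be called on an enlarged search region each frame until a stable cluster is found, after which normal tracking resumes from the returned centre.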
If multiple clusters are obtained, we choose the cluster with the highest confidence Con, which considers both the number of matching pairs and the correlation with the target template: it combines the number of samples in each cluster with the maximum correlation response value within that cluster (3). Finally, the average of the horizontal and vertical coordinates of the selected cluster is assigned as the re-detected target centre if it appears for N consecutive frames, as described above.

Experimental results
In this section, we validate the proposed tracker on 32 videos with the occlusion attribute selected from the UAV-20L [9] and OTB100 [10] data sets. Several milestone trackers reported in recent journals and conferences, namely SAMF-CA [11], STAPLE-CA [11], MUSTER [12], CSR-DCF [13], BACF [14], STCA [15], LADCF [16], TLD [1], LCT [2] and ROT [17], are compared with our tracking algorithm to verify its behaviour. In our experiments, we employ the LADCF tracker [16] as our baseline. The crucial parameters of our tracker are as follows: the pre-defined matching threshold τ and the occlusion-judging threshold are fixed for all experiments; the feature confidence addition parameter α in (1) is set to 2, the attenuation parameter β to 1, and the initialised confidence conf_0 to 1; the average matching number in (2) is computed over the 100 previous frames; and the re-detected result is confirmed only after at least N consecutive frames.

We illustrate the qualitative comparison between our tracker and other cutting-edge trackers in Figs. 2 and 3. All the other trackers lose the target in Person7 and Person14 in Fig. 2, while only our tracker locates the target precisely throughout the entire video. In the other sequences (Fig. 3), short-term trackers such as BACF and CSR-DCF can barely handle the occlusion issue, whereas our method manages to address it, demonstrating favourable tracking and re-detection performance.

Fig. 2: Qualitative evaluation of our tracker and other state-of-the-art trackers on Person7 and Person14, UAV-20L sequences in which the targets undergo occlusion.

Fig. 3: Qualitative evaluation of our tracker and other state-of-the-art trackers on Bird2, Box and KiteSurf, OTB100 sequences in which the targets undergo occlusion.

Lastly, we report quantitative results to further demonstrate the effectiveness of our method. All trackers are evaluated with two criteria that are common in the tracking community [10]: precision and success rate. Interestingly, even though TLD, LCT and ROT are equipped with re-detection modules, they perform poorly on some long-term sequences; as illustrated in Fig. 4, even some short-term trackers (BACF and CSR-DCF) outperform them by a distinct margin. STCA and LADCF achieve favourable performance owing to the temporal constraints in their algorithm design. Our tracker achieves the best performance, with gains of 3.4 and 2.0.

Fig. 4: Overall precision and success-rate performance, with the score for each tracker reported in the legend.

Conclusion
In this Letter, we propose an effective re-detection module based on local feature matching to tackle the occlusion issue and achieve long-term UAV object tracking. Specifically, we construct a local feature descriptor for the target to judge the tracking condition.
Once occlusion occurs, a density-based clustering method is utilised to re-locate the target with the collected local features. Extensive experiments on public data sets have demonstrated that our algorithm performs favourably against other related trackers.

Acknowledgments
This work was supported by the 111 Project of China (grant no. B14010) and the Chang Jiang Scholars Programme (grant no. T2012122). H. Wang is grateful for the instructions given by Z. Zhang.

References
1. Kalal, Z., Mikolajczyk, K., Matas, J.: 'Tracking-learning-detection', IEEE Trans. Pattern Anal. Mach. Intell., 2012, 34, (7), pp. 1409-1422 (doi: 10.1109/TPAMI.2011.239)
2. Ma, C., Yang, X., Zhang, C., et al.: 'Long-term correlation tracking'. Computer Vision and Pattern Recognition, Boston, MA, USA, June 2015, pp. 5388-5396
3. Lowe, D.G.: 'Distinctive image features from scale-invariant keypoints', Int. J. Comput. Vision, 2004, 60, (2), pp. 91-110 (doi: 10.1023/B:VISI.0000029664.99615.94)
4. Bay, H., Tuytelaars, T., Van Gool, L., et al.: 'SURF: speeded up robust features'. European Conf. on Computer Vision, Graz, Austria, May 2006, pp. 404-417
5. Rublee, E., Rabaud, V., Konolige, K., et al.: 'ORB: an efficient alternative to SIFT or SURF'. Int. Conf. on Computer Vision, Barcelona, Spain, November 2011, pp. 2564-2571
6. Han, Y., Deng, C., Zhao, B., et al.: 'State-aware anti-drift object tracking', IEEE Trans. Image Process., 2019, 28, (8), pp. 4075-4086 (doi: 10.1109/TIP.2019.2905984)
7. Zhao, Z., Han, Y., Xu, T., et al.: 'A reliable and real-time tracking method with color distribution', Sensors, 2017, 17, (10), pp. 2303-2319 (doi: 10.3390/s17102303)
8. Ester, M., Kriegel, H.P., Sander, J., et al.: 'A density-based algorithm for discovering clusters in large spatial databases with noise'. Knowledge Discovery and Data Mining, Portland, OR, USA, 1996, pp. 226-231
9. Mueller, M., Smith, N., Ghanem, B.: 'A benchmark and simulator for UAV tracking'. European Conf. on Computer Vision, Amsterdam, Netherlands, October 2016, pp. 445-461
10. Wu, Y., Lim, J., Yang, M.: 'Object tracking benchmark', IEEE Trans. Pattern Anal. Mach. Intell., 2015, 37, (9), pp. 1834-1848 (doi: 10.1109/TPAMI.2014.2388226)
11. Mueller, M., Smith, N., Ghanem, B., et al.: 'Context-aware correlation filter tracking'. Computer Vision and Pattern Recognition, Hawaii, USA, July 2017, pp. 1396-1404
12. Hong, Z., Chen, Z., Wang, C., et al.: 'Multi-store tracker (MUSTer): a cognitive psychology inspired approach to object tracking'. Computer Vision and Pattern Recognition, Boston, MA, USA, June 2015, pp. 749-758
13. Lukezic, A., Vojir, T., Čehovin Zajc, L., et al.: 'Discriminative correlation filter with channel and spatial reliability'. Computer Vision and Pattern Recognition, Hawaii, USA, July 2017, pp. 4847-4856
14. Galoogahi, H.K., Fagg, A., Lucey, S.: 'Learning background-aware correlation filters for visual tracking'. Computer Vision and Pattern Recognition, Hawaii, USA, July 2017, pp. 21-26
15. Han, Y., Deng, C., Zhao, B., et al.: 'Spatial-temporal context-aware tracking', IEEE Signal Process. Lett., 2019, 26, (3), pp. 500-504 (doi: 10.1109/LSP.2019.2895962)
16. Xu, T., Feng, Z.H., Wu, X.J., et al.: 'Learning adaptive discriminative correlation filters via temporal consistency preserving spatial feature selection for robust visual tracking', IEEE Trans. Image Process., 2019, 28, pp. 5596-5609 (doi: 10.1109/TIP.2019.2919201)
17. Dong, X., Shen, J., Yu, D., et al.: 'Occlusion aware real-time object tracking', IEEE Trans. Multimedia, 2017, 19, (4), pp. 763-771 (doi: 10.1109/TMM.2016.2631884)