Peer-reviewed article

Motion and illumination defiant cut detection based on Weber features

2018; Institution of Engineering and Technology; Volume: 12; Issue: 10; Pages: 1903-1912; Language: English

10.1049/iet-ipr.2017.1237

ISSN

1751-9667

Authors

T. Kar, P. Kanungo

Topic(s)

Advanced Image and Video Retrieval Techniques

Abstract

The spontaneous proliferation of video data necessitates hierarchical structures for various content management applications, and temporal video segmentation is the key to such management. To address the problem of temporal segmentation, this paper exploits the psychological behaviour of the human visual system. Towards this goal, an abrupt cut detection scheme is proposed based on Weber's law, which provides a strong spatial correlation among neighbouring pixels. The authors thus provide a robust and unique solution for abrupt shot boundary detection when frames are affected, partially or fully, by flashlight, fire and flicker, or by high motion associated with an object or the camera. Further, they devise a model for generating an automatic threshold from the statistics of the feature vector, which adapts itself to variations in the content of the video. The effectiveness of the proposed framework is validated by exhaustive comparison with several contemporary and recent approaches on the benchmark datasets TRECVID 2001, TRECVID 2002 and TRECVID 2007, as well as some publicly available videos. The results demonstrate a remarkable improvement in performance while preserving a good trade-off between missed hits and false hits compared with state-of-the-art methods.

1 Introduction

The volume of video data has grown exponentially in the last few decades. However, effective use of these video resources is hampered by the lack of viable systems that facilitate systematic organisation, video database navigation and retrieval of information of interest [1-3]. One way to achieve this is manual indexing through sequential scanning, which is extremely cumbersome and highly unreliable. Video data are highly unstructured [4] in nature, carrying diverse and complex information.
This poses a major challenge to the video retrieval community and has become a research hotspot in the domain of human-computer communication. For the automation of video indexing, retrieval and content analysis, partitioning the video data into basic structural units is therefore essential. This basic unit, known as a shot [5], is a sequence of interrelated frames captured through a continuous camera operation and conveying a single unit of semantic information. Two consecutive shots are separated either by a single frame, known as a cut, or by several frames whose attributes change slowly, known as a gradual transition (GT). Cuts, also known as hard or abrupt cuts, arise naturally during the creation of any non-edited video, whereas gradual transitions result when videos are re-edited to include special effects that make them visually more appealing. Roughly speaking, abrupt cuts are far more prevalent than gradual transitions, accounting for >99% of all transitions in a video [6]. Despite the many cut detection algorithms proposed in the literature [7-10], limited effort has been devoted to detecting cuts in the presence of dramatic light variation, such as flashlight affecting one to several frames, fire effects ranging from a small area to the full frame, flicker with gradually varying intensity, and fast or shaky camera/object motion [11-13]. The primary challenge in any cut detection process is the selection of a feature and a threshold that can discriminate between variations caused by the above-mentioned effects and variations caused by an actual scene change, in a manner similar to the human visual system. Weber's law-based features are inspired by the psychological behaviour of the human visual system. These features are robust to rotation and scaling and are efficient in capturing both motion and textural information. In addition, they are simpler and faster to compute than most other textural features. These characteristics motivated us to use Weber's law-based features for the first time to detect abrupt cuts in videos and to overcome the above challenges in a unified approach.

The remainder of the paper is structured as follows. Section 2 reviews the literature in the area of shot boundary detection and presents the motivation behind the work. Section 3 illustrates the two basic Weber's features, i.e. quantised differential excitation (QDE) and quantised gradient orientation (QGO). Section 4 focuses on the proposed video content similarity parameter. The formulation of the adaptive threshold is presented in Section 5. Analysis of the experimental results, with validation of the strengths of the proposed method, is carried out in Section 6, followed by concluding remarks in Section 7.

2 Related work

To date, a substantial amount of research has been conducted in the area of video cut detection, and a good number of diversified approaches have been introduced in the literature [14-16]. In this section, we discuss a few hard cut detection approaches. The simplest way to sense a discontinuity in visual information is to measure the pixel intensity difference between consecutive frames [17]; in the classification stage, a cut is declared if this difference exceeds a threshold. Both stages are highly sensitive to noise and illumination variation as well as camera/object motion [11]. Many algorithms based on mutual information features have also been proposed in the literature.
Cernekova et al. [18] showed that mutual information and the joint histogram capture more salient frame-to-frame information and are thus good feature choices for making an algorithm independent of motion and light variation. Edge-based and gradient-based features [19] are also known to be potential candidates for analysing the visual content of frames. Zabih et al. [20] presented an edge feature-based algorithm, known as the edge change ratio method, for cut detection. Edge-based features are comparatively less sensitive to illumination variation but fail for camera operations such as zooming and scaling, as well as for the sudden introduction or withdrawal of an object from the scene. One of the simplest yet most popular features for measuring temporal discontinuity is the histogram [17]. Although it lacks spatial information, it is relatively insensitive to camera and/or object motion and illumination variation. In histogram-based methods, the absolute sum of bin-wise histogram differences (ASHD) between consecutive frames is first evaluated, and a cut is identified if it surpasses a threshold. A variant of the histogram feature, the block-based histogram difference [9], produces superior performance to the global histogram difference. However, histogram-based methods may produce false cuts because two different frames may have similar histograms, and they are very sensitive to sudden light variation and to high object and/or camera motion between consecutive frames. Kucuktunc et al. [21] proposed a fuzzy colour histogram (FCH)-based approach for video copy detection and showed that the FCH is more accurate and robust against illumination changes. Inspired by this, Dadashi and Kanan [22] proposed FCH-based automatic cut detection. Their method focuses on the feature frame difference rather than on the content of the frames, detecting cut transitions according to the temporal dependency of the frames modelled as a set of fuzzy rules. It is immune to intensity variation and performs well for videos involving high object and camera motion. However, since it uses 26 fuzzy rules for histogram formulation for each non-overlapping block of the frame and 15 fuzzy rules for classifying cut and intra-shot transitions, its time complexity is extremely high, making it unsuitable for real-time applications. Instead of a single feature, many authors have proposed combining multiple features [23] to detect abrupt cuts in a video. In all these shot boundary detection methods, the feature extraction phase is followed by the development of a suitable similarity metric to capture the temporal discontinuity. In many videos, sudden light variations such as fire, flicker and explosion scenes occur frequently and tend to increase the false detection error. Warhade et al. [11] proposed a cut detection scheme to compensate for the errors caused by fire- and flicker-type noise. Towards this goal, the authors applied post-processing with the stationary wavelet transform to the cross-correlation coefficient sequence obtained between consecutive frames, which is then matched against a local and adaptive threshold to identify the candidate cuts.
Kar and Kanungo [13] proposed centre-symmetric local binary pattern (CSLBP)-based cut detection to address illumination variation in video. They used a block-based CSLBP feature and compared the Euclidean distance between the feature vectors of consecutive frames with a threshold value to detect hard cuts.

2.1 Objectives and motivation

A boundary between shots is detected when a continuity value rises above or falls below a threshold. Most algorithms use thresholds that are not generalised and are sensitive to the video content. The literature survey reveals that the following issues need to be addressed when developing an efficient shot boundary detection algorithm. (i) The visual content of the frames must be represented by a suitable feature that is invariant to object motion, camera motion and unusual variations in lighting conditions; moreover, the feature should remain approximately constant within a shot and should capture the variation in content during a transition between two shots. (ii) Optimum threshold selection is vital for maintaining a good trade-off between the recall and precision measures. (iii) Most cut detection methods are unable to differentiate between shots having correlated visual content, such as a gradual transition, resulting in false positives. To ameliorate these three major problems, we exploit the spatial correlation among the pixels in a frame as well as the temporal similarity between two consecutive frames. Recently, Chen et al. [24] developed a feature based on Weber's law, known as Weber's law descriptor (WLD), which is inspired by the psychological behaviour of the human visual system. The WLD is insensitive to illumination, robust to rotation and scaling, and capable of capturing both motion and edge information [24]. In addition, it has a low computational cost in comparison with SIFT and LBP features. In the literature, the WLD feature has been applied efficiently to face detection and texture classification, but it has not previously been applied to video segmentation. These characteristics of the Weber's law-based feature motivated us to use it to model the visual content of a scene and to develop a similarity index for detecting hard cuts in a video that handles light variation and object/camera motion more efficiently than conventional histogram or edge features.

3 Weber's feature

Weber's law [25] states that the ratio of the just-perceptible change of an excitement (which may be light or sound) to the background intensity is a constant, i.e.

ΔI / I = k    (1)

where ΔI is the change in excitement and I is the background intensity. If the change is smaller than this constant fraction of the original excitement, it is identified as background noise rather than an authentic signal. In essence, the law reveals that the human sense of identifying a pattern is a function of both the original intensity of excitation and the change of intensity against the background. The two basic Weber's features developed by Chen et al. [24] are named 'differential excitation' and 'gradient orientation'. We use these two features for the first time to model the visual content of a scene in a manner that is invariant to light variations as well as object and camera motion.
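As a short worked example of (1), assuming (purely for illustration) a Weber fraction of k = 0.02:

```latex
% Worked example of Weber's law (1) with an assumed Weber fraction k = 0.02.
\[
  I = 100, \quad k = 0.02 \;\Rightarrow\; \Delta I_{\min} = k\, I = 2,
\]
\[
  \text{so } \frac{\Delta I}{I} = \frac{1}{100} = 0.01 < k
  \;\Rightarrow\; \text{a change of 1 unit is perceived as background noise, not a pattern.}
\]
```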
3.1 Quantised differential excitation

In the differential excitation proposed by Chen et al. [24], the low-, medium- and high-frequency components are captured by quantising the differential excitation value into six levels. Chen et al. considered 2000 images to plot the distribution of differential excitation, from which it is observed that two valleys divide the entire distribution into three equal parts of π/3 radians each. Therefore, to capture the low-, medium- and high-frequency components, we quantise the differential excitation value into three levels instead of six. The differential excitation component ξ(x, y) at pixel coordinates (x, y) is defined by the proportionate change of intensity of the neighbouring pixels with respect to the centre pixel of a 3 × 3 mask; it reproduces the sense of a noticeable change of pattern in an observer and is defined as

ξ(x, y) = arctan( Δv / I(x, y) )    (2)

where Δv is the sum of the differences between each neighbouring pixel and the centre pixel,

Δv = Σ_{i=0}^{7} ( I_i − I(x, y) )    (3)

where I(x, y) is the grey value at the pixel coordinates (x, y), I_i denotes the grey value of the ith neighbour, and the values of ξ span an interval of π radians. The QDE component is the response of a three-level quantiser to the differential excitation component, defined as

QDE(x, y) = k, if ξ(x, y) falls in the kth of the three equal sub-intervals of its range    (4)

where k = 0, 1, 2.

3.2 Quantised gradient orientation

The gradient orientation component θ(x, y) at pixel coordinates (x, y) is defined as the fractional change of intensity in the x direction with respect to that in the y direction, considering only the first-order neighbours of the centre pixel in a 3 × 3 mask. Let d_x and d_y denote the first-order intensity differences along the x and y directions around the centre pixel. Then θ is defined as

θ(x, y) = arctan( d_x / d_y )    (5)

The range of θ is between −π/2 and π/2, which is mapped to the range 0 to 2π. The QGO is the response of an eight-level quantiser to the mapped orientation, defined as

QGO(x, y) = l, if the mapped orientation falls in the lth of the eight equal sub-intervals of [0, 2π)    (6)

where l = 0, 1, 2, 3, …, 7. Considering these two features, QDE and QGO, we formulate a content similarity parameter between two consecutive frames, which is discussed in the following section.

4 Content similarity parameter

The content similarity parameter is a distance measure between two consecutive frames. In this section, each frame is converted into a unique feature vector, known as the Weber feature vector (WFV), which is invariant to light variation, camera effects and object motion. The two common norms used to measure the similarity between two feature vectors (or for image matching) are the L1 and L2 norms. The L1 metric is also known as the mean absolute error (MAE), and the L2 metric as the mean square error (MSE). Sinha and Russell [26] observed that MAE-based matching is consistently preferred over MSE-based matching. The MSE metric has the advantage of being continuously differentiable, but the MAE metric is computationally cheaper, as no multiplication is needed for its calculation. Hence, using the MAE metric and the WFV, we develop a content similarity feature, known as the absolute sum Weber feature difference (ASWD), which generates a similarity index between two consecutive frames that is very close to the ideal case.

4.1 Generation of WFV

For two different scenes, the individual distributions of QDE or QGO may match closely, whereas their joint distribution will differ. Therefore, we first evaluate the joint histogram of QDE and QGO and then map this joint histogram into a 24-element one-dimensional histogram, the Weber feature vector (WFV). The WFV represents the unique visual content of a particular scene.
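To make the construction of Sections 3.1, 3.2 and 4.1 concrete, the following minimal NumPy sketch computes the QDE and QGO maps of a greyscale frame and builds the 24-bin WFV from their normalised joint histogram. It is an illustrative reading of the description above rather than the authors' implementation; the function names, the handling of border pixels and the exact range shifts applied before quantisation are our own assumptions.

```python
import numpy as np

def compute_qde(frame, eps=1e-6):
    """Quantised differential excitation (three levels), cf. Section 3.1.
    Returns an (H-2) x (W-2) map of levels {0, 1, 2} for the interior pixels."""
    f = np.asarray(frame, dtype=np.float64)
    c = f[1:-1, 1:-1]                               # centre pixel of each 3x3 mask
    # sum of differences between the eight neighbours and the centre (eq. (3))
    dv = (f[:-2, :-2] + f[:-2, 1:-1] + f[:-2, 2:] +
          f[1:-1, :-2]               + f[1:-1, 2:] +
          f[2:, :-2]  + f[2:, 1:-1]  + f[2:, 2:]) - 8.0 * c
    xi = np.arctan(dv / (c + eps))                  # differential excitation (eq. (2)), in (-pi/2, pi/2)
    # three equal sub-intervals of width pi/3 -> levels 0, 1, 2 (eq. (4))
    return np.clip(((xi + np.pi / 2) // (np.pi / 3)).astype(int), 0, 2)

def compute_qgo(frame):
    """Quantised gradient orientation (eight levels), cf. Section 3.2."""
    f = np.asarray(frame, dtype=np.float64)
    dx = f[1:-1, 2:] - f[1:-1, :-2]                 # first-order difference along x
    dy = f[2:, 1:-1] - f[:-2, 1:-1]                 # first-order difference along y
    theta = np.mod(np.arctan2(dx, dy), 2 * np.pi)   # orientation mapped to [0, 2*pi) (eq. (5))
    return np.minimum((theta // (np.pi / 4)).astype(int), 7)   # eight levels (eq. (6))

def compute_wfv(frame):
    """24-element Weber feature vector: normalised joint histogram of
    QDE (3 levels) x QGO (8 levels), rows concatenated (eq. (7))."""
    qde, qgo = compute_qde(frame), compute_qgo(frame)
    joint = np.zeros((3, 8), dtype=np.float64)
    np.add.at(joint, (qde.ravel(), qgo.ravel()), 1.0)
    return (joint / joint.sum()).ravel()
```

compute_wfv can then be applied to every decoded frame, and consecutive WFVs compared as described in Section 4.2.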
To illustrate the process of generating the WFV, we consider the 5314th frame of the video NAD 58. The QDE and QGO feature frames of the original frame are shown in Figs. 1a and b, respectively. Using these feature images, a 2D joint histogram H is first generated, shown in Fig. 1c. H has 24 cells, and the value of the cell at coordinates (k, l) represents the joint probability of QDE level k and QGO level l. H is mapped into the WFV by concatenating its three rows, as shown in Fig. 1d:

WFV_k = [ H_k(0, 0), …, H_k(0, 7), H_k(1, 0), …, H_k(1, 7), H_k(2, 0), …, H_k(2, 7) ]    (7)

where H_k is the 2D joint histogram of the kth frame, WFV_k is the corresponding WFV of length 24, k represents the frame number and N represents the total number of frames in the video.

Fig. 1 Illustration of WFV: (a) QDE feature image, (b) QGO feature image, (c) 2D feature histogram, (d) 1D feature histogram (WFV)

4.2 Absolute sum WFV difference

Each grey-scale frame is mapped into its 24-element WFV (8). The ASWD values between the kth and (k + 1)th frames and between the kth and (k + 2)th frames are denoted D1(k) and D2(k), respectively, and are evaluated as

D1(k) = Σ_{j=1}^{24} | WFV_k(j) − WFV_{k+1}(j) |    (9)

D2(k) = Σ_{j=1}^{24} | WFV_k(j) − WFV_{k+2}(j) |    (10)

A sudden flashlight typically affects a single frame lying between two frames of the same scene. To address this problem, we consider both D1(k) and D2(k) for the evaluation of a hard cut: if D1(k) and D2(k) are both greater than a threshold TH, then an abrupt cut exists at the kth time instant. The video segmentation string is defined as

B(k) = 1, if D1(k) > TH and D2(k) > TH; B(k) = 0, otherwise    (11)

where TH is a threshold and B is a binary string of length N; a binary one represents an inter-shot boundary and a binary zero represents intra-shot frames. A constant threshold may fail to track the variation of D1 and D2 within a video and may not be optimal across different videos. Therefore, we propose an adaptive threshold in Section 5, which is fully automatic and efficient in handling all kinds of videos.

5 Adaptive threshold

The content of a video changes dramatically from shot to shot and also from video to video. Therefore, the value of the distance metric between two frames of the same scene varies from one shot to another, and a similar conclusion can be drawn for the distance metric of the boundary frames between different shots. The distribution of the ASWD values (scaled to 255), denoted 'S', for the entire video 'anni 005' is plotted in Fig. 2a and resembles a unimodal Gaussian distribution; a global threshold is therefore not suitable for partitioning this distribution into the two classes (cut and non-cut). According to the study of Zhang et al. [17], fewer than 15% of the frames in a video are shot boundaries, which is very small in comparison with the number of non-cut frames. Fig. 2b shows the distributions of non-cut ASWD and cut ASWD on a logarithmic scale for the video 'anni 005'; the dotted line shows the distribution of non-cuts and the solid line that of cuts. The two distributions clearly overlap between 20 and 45 in the scaled ASWD values. This overlap can be reduced by considering local statistical features instead of a global statistical feature. Hence, we propose an adaptive threshold based on the local statistics of the ASWD values for the automatic detection of shot boundaries. The adaptive threshold TH(k) at the kth frame (eqs (12)-(15)) is computed from the local mean and standard deviation of the ASWD values within a window of M frames around the kth frame, scaled by a constant; for the last M frames of the video, the window is formed analogously from the available neighbouring frames (eq (16)).

Fig. 2 Analysis of adaptive threshold: (a) distribution of the ASWD for the entire video anni 005, (b) distributions of non-cut and cut ASWD separately for the video anni 005, (c) study of the window length M

A hard cut is declared at the kth frame if D1(k) > TH(k) and D2(k) > TH(k). To restrict variations, we use the same value of the constant for all the videos throughout our experiments.
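The flow from WFVs to detected cuts in Sections 4.2 and 5 can be summarised in the short sketch below. The exact analytical form of the adaptive threshold in (12)-(16) is not recoverable from the text as extracted here, so the sliding-window mean-plus-scaled-standard-deviation threshold, the constant c and the function names are our own assumptions, intended only to illustrate the decision logic.

```python
import numpy as np

def aswd(wfv_a, wfv_b):
    """Absolute sum of WFV differences (L1 / MAE-style distance), cf. eqs (9)-(10)."""
    return float(np.abs(np.asarray(wfv_a) - np.asarray(wfv_b)).sum())

def detect_cuts(wfvs, M=16, c=3.0):
    """Declare a cut at frame k when both ASWD(k, k+1) and ASWD(k, k+2) exceed a
    locally computed adaptive threshold (cf. eqs (11)-(16)).
    wfvs: sequence of per-frame WFVs; M: local window length; c: assumed scaling constant."""
    n = len(wfvs)
    d1 = np.array([aswd(wfvs[k], wfvs[k + 1]) for k in range(n - 1)])
    d2 = np.array([aswd(wfvs[k], wfvs[k + 2]) for k in range(n - 2)])
    cuts = []
    for k in range(n - 2):
        lo = max(0, k - M)
        window = d1[lo:k] if k > 0 else d1[:M]      # preceding ASWD values; fall back at the start
        th = window.mean() + c * window.std()       # assumed mean + c*std local threshold form
        if d1[k] > th and d2[k] > th:
            cuts.append(k)                          # abrupt cut declared at frame k
    return cuts
```

Fed with the per-frame WFVs of a video, detect_cuts returns the candidate positions of abrupt cuts, which correspond to the binary ones of the segmentation string B in (11).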
In our experiments, we found that M = 16 gives the optimum adaptive threshold in the sense of a high F1 measure, as demonstrated in Fig. 2c, which plots the F1 measure against different values of M for four videos selected from different classes. Since F1 is highest at M = 16 for all these videos, M = 16 is used for the evaluation of the adaptive threshold in our simulations.

6 Experimental results and discussions

The proposed method for abrupt shot boundary detection is applied to 34 different videos consisting of 863,362 frames, taken from the well-known databases TRECVID 2001 [27], TRECVID 2002 [28], TRECVID 2007 [29], the VIVA Research Lab [30] and the internet. These videos are divided into two groups: the first group consists of 16 videos and the second group of 18 videos. The first group is further divided into seven subgroups: documentary, movie, sitcom, cartoon, commercial, sports and news videos. The first group videos (FGVs) and second group videos (SGVs) are described in Tables 1 and 2, respectively. The literature study shows that a few recent methods, namely automatic video cut detection using a fuzzy rule-based approach (AVCDFRA) [22], CSLBP [13] and post-processing of the stationary wavelet transform (PPSWT) [11], perform strongly in addressing flashlight (sudden light variation), and the AVCDFRA method also handles object and camera motion well. Therefore, the performance of the proposed method is validated by comparison with these three state-of-the-art methods along with one traditional method, ASHD [17].

Table 1. Description of the FGV and their ground truth data

Category     ID   Video name                           Frames   Cuts  GT/FI/FO  Source
documentary  D1   anni 004                             3895     13    4         Open Video Project [27]
documentary  D2   anni 005                             11,200   38    26        TRECVID 2001 dataset
documentary  D3   anni 006                             8000     36    8         NASA 25th Anniversary
documentary  D4   NAD 57                               10,428   43    17        Airline Safety and Economy
documentary  D5   NAD 58                               6500     21    24
documentary  D6   Nature Wild Life Documentary Africa  3280     31    9         BBC
movie        M1   VideoAbstract                        5132     39    0         VIVA Research Lab [30]
movie        M2   Transformer                          4921     55    0         Transformer
sitcom       Sit  The Big Bang Theory                  14,200   153   3         The Big Bang Theory
cartoon      C1   Lisa                                 650      7     1         VIVA Research Lab Carleton
cartoon      C2   Kawa Soneka haar                     3703     25    0         Youtube
commercial   Co1  COMMERCIAL                           499      18    1         VIVA Research Lab [30]
commercial   Co2  Commercial2                          235      0     3
sports       S    Soccer                               4159     14    0         UEFA champions league
news         N1   Anderson Cooper reads                6280     8     8         CNN
news         N2   Woman jumps in front of car          1530     8     2         CNN
total             16 videos                            84,612   509   106

Table 2. Description of the SGV and their ground truth data

Video name           ID   Frames    Cuts   GT/FI/FO  Source
BG37998              V1   51,200    194    0         TRECVID 2007 [29]
BG37795              V2   22,050    58     1
BG38422              V3   49,900    96     1
BG37721              V4   42,170    112    1
BG37613              V5   22,830    123    2
BG38438              V6   77,400    244    19
BG37399              V7   35,950    198    15
BG37808              V8   22,000    76     4
BG37968              V9   56,200    193    3
BG38174              V10  54,500    199    3
BG38655              V11  82,400    377    24
BG38117              V12  52,200    227    2
BG38903              V13  52,200    132    0
BG38183              V14  52,800    130    3
BG37796              V15  21,800    57     13
BG38423              V16  53,440    134    11
Annie Abandon Ghost  V17  25,500    137    2         TRECVID 2002 [28]
Fiscal Cliff         V18  4210      23     4
total                     18 videos: 778,750 frames, 2710 cuts, 108 GT/FI/FO

Recall (R), precision (P) and the F1 measure are the three performance measures used to evaluate the strength of each method. Apart from these, we also introduce a new measure, the percentage change in mean feature value, to evaluate the strength of the similarity index feature of each method.
The R, P and F1 measures are defined as

R = Hit / (Hit + Miss)    (17)
P = Hit / (Hit + Fhit)    (18)
F1 = 2RP / (R + P)    (19)

where Hit is the number of true shot boundaries detected, Miss is the number of missed cuts and Fhit is the number of false cuts detected by the algorithm; all three measures are reported as percentages. First, the performance measures R, P and F1 of the proposed method on the FGV of Table 1 are evaluated with a global threshold and with the local threshold separately, and tabulated in Table 3. Table 3 clearly shows that the performance with the local threshold is better than that with the global threshold, with higher average recall, precision and F1: there is an improvement of 25.31, 3.76 and 19.89% in the average R, P and F1 measures, respectively, when local statistical parameters are used instead of global statistical parameters.

Table 3. Performance measures of the proposed ASWD feature on FGV with a global threshold and a local threshold

Video ID    Global threshold            Local threshold
            R       P       F1          R       P       F1
D1          84.62   100     91.67       100     100     100
D2          66.67   96.30   78.79       97.44   86.36   91.57
D3          88.89   96.97   92.75       97.22   97.22   97.22
D4          90.48   92.68   91.57       97.62   91.11   94.25
D5          90.48   79.17   84.44       100     91.30   95.45
D6          70.97   100     83.02       88.57   93.94   91.18
M1          60      100     75          90      100     94.74
M2          12.73   43.75   19.72       74.55   93.18   82.84
Sit         26.80   82      40.39       100     95.63   97.76
C1          33.33   100     50          100     100     100
C2          80      100     88.89       100     96.15   98.04
Co1         27.78   100     43.48       61.11   91.67   73.33
Co2         100     100     100         100     100     100
S           100     100     100         100     100     100
N1          100     53.33   69.57       100     72.73   84.21
N2          50      100     66.67       87.50   87.50   87.50
average     67.67   90.26   73.49       92.98   94.02   93.38

For the documentary videos (D1-D6), the lowest and highest F1 values are 91.18 and 100, respectively. Similarly, the F1 is 94.74 for the M1 video, whereas it is 82.84 for M2. The F1 is lower for M2 because of its lower R, which indicates several missed cuts, although it performs well in terms of false cuts, with P = 93.18. For the video Sit, R = 100, P = 95.63 and F1 = 97.76. Although the performance of the proposed method is appreciable for most of the videos, it is not up to the mark for 4 of the 16 FGV test videos (Co1, M2, N1 and N2). Overall, however, it achieves average R, P and F1 values of 92.98, 94.02 and 93.38, respectively. The group-wise average R, P and F1 values for ASHD, AVCDFRA, CSLBP, PPSWT and the proposed method are tabulated in Table 4. Table 4 shows that the proposed method has the highest average R, P and F1 for the sitcom, cartoon, sports and news groups. For the commercial group, AVCDFRA has the highest average R, P and F1, and for the movie group, AVCDFRA has the highest R and F1. For the documentary group, PPSWT has the highest average R, and for the movie group, PPSWT has the highest P. In terms of the total average, the proposed method has the highest R, P and F1 measures. Moreover, in our method the parameters used for the evaluation of the automatic threshold are the same for all kinds of videos, whereas in the PPSWT method the parameter values given by the authors are not optimum for all kinds of videos. Therefore, to make a fair comparison, we selected the optimum parameter values empirically for the individual videos. We found that PPSWT produces many false cuts, while the proposed method produces very few missed cuts and few false cuts, resulting in a high F1 measure.
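For completeness, a minimal sketch of the evaluation measures in (17)-(19), computed from the counts of correctly detected, missed and falsely detected cuts; the function name and the percentage scaling are illustrative assumptions.

```python
def evaluation_measures(hit, miss, fhit):
    """Recall, precision and F1 from detection counts, cf. eqs (17)-(19).
    hit: true boundaries detected; miss: boundaries missed; fhit: false detections.
    Values are returned as percentages, matching Tables 3 and 4."""
    recall = 100.0 * hit / (hit + miss) if hit + miss else 0.0
    precision = 100.0 * hit / (hit + fhit) if hit + fhit else 0.0
    f1 = 2 * recall * precision / (recall + precision) if recall + precision else 0.0
    return recall, precision, f1

# Example with illustrative counts: 37 detected out of 38 true cuts, 6 false detections
print(evaluation_measures(hit=37, miss=1, fhit=6))
```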
Table 4. Average performance measures of different methods on FGV

Category       ASHD [17]              AVCDFRA [22]           CSLBP [13]             PPSWT [11]             Proposed ASWD
               R      P      F1       R      P      F1       R      P      F1       R      P      F1       R      P      F1
documentary    64.24  73.72  66.47    77.12  70.26  72.42    86.15  91.20  87.37    99.53  82.07  89.58    97.26  94.27  95.67
movie          22.68  48.48  30.44    91.70  97.17  94.34    47.61  63.81  53.21    89.38  99.05  93.86    82.28  96.59  91.42
sitcom         27.45  84     41.38    96.08  90.18  93.04    76.47  92.13  83.57    94.12  94.74  94.43    100    95.63  97.76
cartoon        81.33  91.38  84.44    83.34  88.07  85.39    52.34  69     56.50    90     86.15  88       100    98.1   99.02
commercial     63.89  100    71.74    88.89  100    94.1     61.11  100    68.18    64.70  100    72.72    80.55  95.83  86.66
sports         85.71  100    92.31    85.71  85.71  85.71    92.86  81.25  86.67    92.86  86.67  89.66    100    100    100
news           68.75  30.52  37.80    68.75  73.21  70.83    68.75  57.50  53.57    91.66  51.10  61.99    93.75  80.12  85.86
total average  59.15  75.45  60.65    84.51  86.37  85.12    69.32  79.27  69.87    88.89  85.68  84.32    93.4   94.37  93.77

Further, to demonstrate the strength of the proposed method, we consider the ASWD values and the adaptive threshold values of video D1, shown in Fig. 3a. The sharp peaks in the ASWD values represent the hard cuts in video D1, the regions marked inside the ellipses show the range of values of the gradual transition frames, and the adaptive threshold values are plotted as the blue line. It is clearly observed from Fig. 3a that the proposed threshold is capable of detecting the true hard cuts while rejecting the gradual transitions. Similarly, Fig. 3b shows the final detected hard cut positions and their values based on
