An image denoising algorithm based on adaptive clustering and singular value decomposition
2021; Institution of Engineering and Technology; Volume: 15; Issue: 3 Linguagem: Inglês
10.1049/ipr2.12017
ISSN1751-9667
AutoresPing Li, Hua Wang, Xuemei Li, Caiming Zhang,
Tópico(s)Advanced Image Processing Techniques
ResumoIET Image ProcessingVolume 15, Issue 3 p. 598-614 ORIGINAL RESEARCH PAPEROpen Access An image denoising algorithm based on adaptive clustering and singular value decomposition Ping Li, Ping Li School of Computer Science and Technology, Shandong University, Jinan, ChinaSearch for more papers by this authorHua Wang, Hua Wang School of Computer Science and Technology, Shandong University, Jinan, ChinaSearch for more papers by this authorXuemei Li, Xuemei Li School of Information and Electrical Engineering, Ludong University, Yantai, ChinaSearch for more papers by this authorCaiming Zhang, Corresponding Author Caiming Zhang czhang@sdu.edu.cn School of Software, Shandong University, Jinan, China Shandong Co-Innovation Center of Future Intelligent Computing, Yantai, China Digital Media Technology Key Lab of Shandong Province, Jinan, China Correspondence Caiming Zhang, School of Software, Shandong University, Jinan 250101, China. Email: czhang@sdu.edu.cnSearch for more papers by this author Ping Li, Ping Li School of Computer Science and Technology, Shandong University, Jinan, ChinaSearch for more papers by this authorHua Wang, Hua Wang School of Computer Science and Technology, Shandong University, Jinan, ChinaSearch for more papers by this authorXuemei Li, Xuemei Li School of Information and Electrical Engineering, Ludong University, Yantai, ChinaSearch for more papers by this authorCaiming Zhang, Corresponding Author Caiming Zhang czhang@sdu.edu.cn School of Software, Shandong University, Jinan, China Shandong Co-Innovation Center of Future Intelligent Computing, Yantai, China Digital Media Technology Key Lab of Shandong Province, Jinan, China Correspondence Caiming Zhang, School of Software, Shandong University, Jinan 250101, China. Email: czhang@sdu.edu.cnSearch for more papers by this author First published: 20 January 2021 https://doi.org/10.1049/ipr2.12017AboutSectionsPDF ToolsRequest permissionExport citationAdd to favoritesTrack citation ShareShare Give accessShare full text accessShare full-text accessPlease review our Terms and Conditions of Use and check box below to share full-text version of article.I have read and accept the Wiley Online Library Terms and Conditions of UseShareable LinkUse the link below to share a full-text version of this article with your friends and colleagues. Learn more.Copy URL Share a linkShare onFacebookTwitterLinked InRedditWechat Abstract Self-similarity, a prior of natural images, has attracted much attention. The attribute means that low-rank group matrices can be constructed from similar image patches. For low-rank approximation denoising methods based on singular value decomposition (SVD) the ability to accurately construct group matrices with noise and handle singular values are keys. Here, combining image priors, a two-stage clustering method to adaptively construct group matrices is designed. The method is anti-noise, that is, when noise levels are high, these matrices are more accurate than that constructed by other algorithms. Then, according to the significance of singular values and singular vectors, singular vectors of the low-rank estimations are corrected so that the residual noise in the low-rank estimations is further suppressed. For back projection , the authors use the original noise level and the residual image to adaptively determine projection parameters and new noise levels . So, authors' back projection can provide a good foundation for authors' two-stage denoising methods, better remove noise and preserve image details. Experimental results show that compared with the existing state-of-the-art denoising algorithms, the proposed algorithm achieves competitive denoising performances in terms of quantitative metrics and preserving details. Especially with the increase of noise, the competitiveness of authors' algorithms is gradually enhanced. 1 INTRODUCTION Image denoising is an important pre-processing process in the fields of computer vision and image processing. It has been widely studied in the past decade [1-14]. Many applications have high requirements for image denoising, such as visual enhancement, medical diagnosis [15] and target recognition [16, 17]. In general, image denoising algorithms can be divided into two categories. The first type of algorithms exploit the inherent prior attributes of images in the spatial domain, such as [8, 18, 19]. The removal of salt and pepper noise is also a typical application in the spatial domain. Thanh et al. [20] propose an adaptive total variation (TV) regularization model. Erkan and Gökrem [21] propose a based on pixel density filter (BPDF) and Erkan et al. [22] propose a different applied median filter (DAMF) to remove salt and pepper noise. Enginoğlu et al. [23] propose a adaptive Riesz mean filter (ARmF), by operationalizing pixel similarity for salt and pepper noise removal. Erkan et al. [24] propose an iterative mean filter (IMF) for salt and pepper denoising. Unlike other non-linear filters, IMF does not enlarge the window size and it only uses a window with a size of 3 × 3 . Mamaev et al. [25] propose a new method to find the parameters of a non-linear diffusion denoising method by ridge analysis. Jin et al. [26] establish convergence theorems for the non-local means (NLM) filter in removing the additive Gaussian noise.Cheng et al. [27] propose to model the gradient distribution of natural images as spatially variant hyper-Laplacian and the model is free from tedious tuning trade-off parameters. Following the NLM approach, Jin et al. [28] propose an adaptive estimator based on the weighted average of observations taken in a neighbourhood with weights depending on the similarity of local patches. The second type of algorithms are based on transform domain, including wavelet transform, dictionary learning (DL), principal component analysis (PCA) and so on. Bayesian least squares with Gaussian scale-mixture (BLS-GSM) [29] is a typical case of wavelet algorithms. However, fixed global wavelet bases cannot accurately represent image structure and often introduce pseudo textures into the denoised images. To overcome the disadvantages, denoising algorithms based on DL use a set of adaptive redundant representation bases called a dictionary to make image representations more flexible and sparse. The most typical dictionary-based denoising algorithm is K-SVD [2], which is a generalization of the k-means algorithm. PCA is another dimension reduction technique that is widely used in denoising algorithms. Muresan and Parks [30] use PCA and linear minimum mean square error (LMMSE) to denoise overlapping patches. Using Marchenko–Pastur law (MP-law) based on random matrix theory, Zhao et al. [31] proposed the adaptive patch clustering with progressive PCA thresholding (ACPT) algorithm to improve the performance of clusterwise PCA threshold denoising algorithm. Zhang et al. [32] further improved the performance by adding a patch selection mechanism. Advanced denoising algorithms often combine several denoising techniques. There are some advanced denoising algorithms based on non-local image patches [1-3, 5, 29]. Prior to this, there have been various neighbourhood filtering techniques based on variational and partial differential equations, but their performances are not as good as [1-3, 5, 29]. Some algorithms use the prior information, such as the redundancy and sparse representation of images in the transform domain, to improve the algorithms [1-3, 5, 29]. It turns out that these priors can effectively improve the performance of denoising algorithms. Variational regularization methods are widely used for removing noise without destroying edges that are important visual cues. Pang et al. [33] propose an adaptive weighted T V p regularization-based denoising model. Liu et al. [34] present a general Retinex model to effectively and robustly restore images degenerated by both illusion and noises and proposes a novel variational model by incorporating appropriate regularization technique for the reflectance component and illumination component accordingly. Thanh et al. [35] propose an adaptive image restoration method based on a combination of first- and second-order TV regularizations with an inverse-gradient-based adaptive parameter. Due to the adaptive parameter estimation based on the inverse gradient, it avoids the staircasing artefacts associated with TV regularization and its variant models. Prasath and Thanh [36] provide an adaptive version of the TV regularization model that incorporates structure tensor eigenvalues for better edge preservation without creating blocky artefacts associated with gradient-based approaches. Block-matching and 3D filtering (BM3D) [3] is widely regarded as one of the most effective denoising algorithms. Since the geometric features of natural images can be sparsely decomposed in some suitable transform domains, BM3D uses a pre-set dictionary to cooperatively filter three-dimensional image groups composed of non-local similar image patches and achieves enhancement of sparse expression. Although the quantitative metrics of BM3D achieve good results and the execution time of the algorithm is extremely short, it uses a pre-set dictionary to make its sparse expression limited. Many pseudo textures visible to the naked eyes are often introduced in the denoised results of BM3D. LSSC [5] avoids the drawback of using a pre-set dictionary. An improved BM3D filter [37] applies PCA to the shape-adaptive patch groups in the spectral contracted 3D transform domain an improved BM3D filter which exploits adaptive-shape patches and principle component analysis and improves the BM3D's detail retention capability. However, these algorithms do not protect the irregular textures in the image and introduce pseudo textures. Shi et al. [38] enhances structural details in image restoration by regarding global boundary context and residual context as complimentary information. Except redundant priors and sparse priors, low-rank priors are also effectively utilized in a variety of computer vision applications, such as image segmentation, facial recognition and especially image denoising [7, 8, 18, 19, 39]. One of the low-rank priors is to construct matrices by stacking non-local similar patches in a given noisy image, which are in a low-dimensional subspace of a high-dimensional space that satisfies the low-rank criterion. Gu et al. [8] use the weighted kernel norm minimum (WNNM) to exploit the low-rank prior. They combine the meaning and importance of singular values and treat them differently according to their respective values. Guo et al. [19] proposed the LRA-SVD (low-rank approximation, LRA) algorithm, which finds L image patches that are most similar to the target image patch in the neighbourhood window centred on the target image patch to construct similar patch matrices and uses the way of singular value truncation to obtain the low-rank estimations of the similar patch matrices. LRA-SVD achieves effective denoising performances, but it also lacks protection for image textures. Based on the above problems, we propose an image denoising algorithm based on adaptive clustering and SVD. The main contributions are as follows: (1) We propose an adaptive clustering method with better anti-noise ability. It can classify image patches and construct low-rank similar patch matrices adaptively and accurately. The size of image patches is the only parameter that can affect the number of clusters in the clustering method. Based on the idea of divide and conquer, this paper first divides the image patches into four categories according to the colour features, spatial features, shape features and texture features of the image features, and it further subdivides the image patches in each category to obtain low-rank matrices to be denoised. (2) In this paper, we consider the assumption of the singular value truncation is not rigorous, that is, the assumption that image information is concentrated on large singular values and noise information is concentrated on the remaining small singular values is not rigorous. After performing SVD low-rank approximation on similar patch matrices, we correct the singular vectors of similar patch matrices to further suppress the residual noise in the low-rank approximation matrices. (3) Based on the intuitive information and the implicit information contained in residual images and noise levels, we reasonably define a calculation method of the projection parameter and the noise level to be estimated in the back projection, so that they can be adaptive solutions based on the image information. Experimental results show that the proposed algorithm is superior to some existing denoising techniques. The rest of the paper is structured as follows. In the second part, we briefly review and introduce some basic knowledge involved in our algorithm. In the third part, the proposed algorithm is introduced in detail. In the fourth part, the effectiveness of the proposed algorithm is verified by comparing the proposed denoising algorithm with other advanced denoising algorithms. Finally, we summarize the proposed algorithm and indicate the direction of future work. 2 DEPENDENT WORK Here, we introduce some previous theories or algorithms that our method depend on. In particular, we introduce more about LRA-SVD [19], because our method is mainly improved on the basis of LRA-SVD. Our method is to remove additive white Gaussian noise (AWGN). The AWGN is usually introduced during image acquisition. Restoring images corrupted by AWGN has been a hot research in the field of image denoising and this kind of denoised problems are always described as Y = X + E , (1)where Y is the noisy observation image, X is the original noise-free image and E is the white Gaussian noise with a mean of 0 and a standard deviation of τ. For restoring the original images from the noisy versions of the image as accurately as possible and preserving the edges, details and other important information of images, it is important to choose a noise model and prior information in natural images. 2.1 Singular value decomposition For the matrix Y, the singular value decomposition theorem [40] shows that the singular value decomposition of Y can be expressed as Y = U Y Σ Y V Y T = U Y 1 U Y 2 Σ Y 1 0 0 Σ Y 2 V Y 1 T V Y 2 T , (2)where U Y 1 ∈ R m × k , U Y 2 ∈ R m × ( m − k ) , V Y 1 ∈ R n × k , V Y 2 ∈ R n × ( n − k ) . Both U Y = ( u 1 , … , u m ) ∈ R m × m and V Y = ( v 1 , … v n ) ∈ R n × n are orthogonal columns, i.e. U Y U Y T = V Y V Y T = I . U Y and V Y are called left singular matrix and right singular matrix of Y, respectively. The diagonal elements of Σ Y are non-negative and descending, Σ Y 1 = d i a g ( σ 1 , … , σ k ) ∈ R k × k and Σ Y 2 = d i a g ( σ k + 1 , … , σ n ) ∈ R ( m − k ) × ( n − k ) . If Y is low rank and k is the magnitude of its rank, then k < n , Σ Y 2 = d i a g ( 0 , 0 , … , 0 ) . When Y is a matrix composed of similar image patches, it is generally considered to be low rank. 2.2 Image patch grouping Non-local self-similarity characterizes the non-local textures or structures of images with a recurring characteristic. Y, including AWGN with a mean of 0 and a standard deviation of τ, is a noisy image. Image patches of size d × d are extracted from Y with a certain step size and each image patch is vectorized to obtain p i . In [19], the similarity between two patches is measured by the simplest Euclidean distance in the spatial domain. The spatial Euclidean distance between two patches is defined by s p i , p c = p i − p c 2 2 , (3)where ∥ • ∥ 2 represents the Euclidean distance, p i is the candidate patch, p c is the reference patch. Subscripts i and c are both the indexes of the image patches. p i denotes ith image patch and p c denotes cth image patch. The smaller s ( p i , p c ) is, the more similar p i and p c are. The L image patches most similar with p c , denoted as { p c , j } j = 1 L , construct the low-rank similar patch matrix P c . Each image patch p c , j is a column of P c . The corresponding similar patch matrix P c can be expressed as P c = p c , 1 , p c , 2 , … , p c , L . (4)In general, the number L of similar patches in a similar patch matrix is too small, which means that there are not enough patches in each similar patch matrix and the robustness of the SVD denoising algorithm is not good. While a too large L causes the dissimilar patches to be grouped together, so that the correlation between the columns in the similar patch matrix is low, thereby making the low-rank estimation inaccurate. So we improve the process of image patch grouping by our two-stage clustering, where the parameter L is adaptively decided based on image information rather than artificially set. 2.3 Eckart–Young–Mirsky theorem As shown in (4), since P c is composed of noisy image patches, it can be expressed as P c = Q c + N c , (5)where Q c represents a noise-free similar patch matrix and N c represents the noise contained in P c . P c is the object to be denoised. In the latter part of the paper, for simplicity of description, P c , Q c and N c are replaced by P, Q and N, respectively. The task of denoising the similar patch matrix P is to estimate the noise-free similar patch matrix Q ′ as accurately as possible from P. Ideally, the estimated Q ′ should satisfy P − Q ′ F 2 = t τ 2 , (6)where ∥ • ∥ F is the Frobenius norm, τ is the standard deviation of the noise and t is the number of pixels in the matrix P. The similarity between image patches in the noise-free image X gives a high correlation between these patches, that is, Q is a low-rank matrix, which has been confirmed in many algorithms, such as [19, 41]. In the least squares sense, the estimation of Q can be obtained by low-rank approximation, that is, by solving the optimal problem of (7), Q ′ can be estimated from P. Q ′ = arg min B P − B F 2 s . t . r a n k B = k , (7)where rank ( • ) represents the rank of matrix B. P can be expressed as (8) after singular value decomposition. P = U Σ V T . (8)Let P k = U Σ ′ V T , (9)where Σ ′ is obtained from the matrix Σ by setting the diagonal elements to zeros but the first k singular values, i.e. Σ ′ = d i a g σ 1 , σ 2 , … , σ k , 0 , … , 0 . (10) The Eckart-Young-Mirsky theorem [42, 43] gives that (9) is the solution of (7). The Eckart-Young-Mirsky theorem also states that for any real matrix P, if the rank of the matrix Q is k, then P − Q F 2 ≥ ∑ i = k + 1 n σ i 2 , (11)where σ i ( i = 1 , … , n ) are the singular values of P, and equality is attained when Q = P k is defined by (9). From the theorem, we can get Q ′ = P k . (12)In particular, the Q ′ obtained when ∑ i = k + 1 n σ i 2 = t τ 2 is the most ideal estimation [19]. By using the above theories and conclusions, Guo et al. derive an effective singular value threshold k of singular value truncation. The threshold k is calculated as follows: ∑ i = k n σ i 2 > τ 2 > ∑ i = k + 1 n σ i 2 . (13) Our method continues to use this conclusion. But, the way of singular value truncation assumes that the signal energy is completely concentrated on the first few large singular values, while the noise energy is concentrated on the rest of singular values. Because the energy of noise is also distributed over large singular values and the energy of signal can be distributed over small singular values, the assumption is not rigorous. Therefore, the performances of these denoising algorithms are deficient when the noise level is high. So in our method, we propose a process of singular vector correction after singular value truncation. It can make up for the lack of the assumption to some extent. 2.4 Back projection Usually, back projection mechanism [44, 45] is used to iterate the basic processes to enhance the denoising performance of algorithms. Its basic idea is to generate a new noisy image by adding filtered noise back to the denoised image, i.e. Y ̂ = X ̂ + δ Y − X ̂ , (14)where δ ∈ ( 0 , 1 ) is a constant projection parameter and X ̂ is the denoised image produced by the previous stage. In LRA-SVD [19], δ = 0.5 is artificially assigned and fixed. While the parameter δ in our method is adaptively determined based on image information. Then, the noise standard deviation τ ̂ in the new noisy image should be updated in the next denoising stage. In LRA-SVD, τ ̂ is written as τ ̂ = γ τ 2 − ∥ Y − X ̂ ∥ F 2 , (15)where γ is a scaling factor. In our method, the calculation of τ ̂ is improved and more use of image information. It makes our method perform better in detail protection. 2.5 Quantitative metrics and test images Peak signal to noise ratio (PSNR) is often inconsistent with the human eye perception, but it is the mostly widely used quality measurement in the literature. The calculation of PSNR is as formula (16), P S N R ( x , x ̂ ) = 10 log 10 m n [ m a x ( x ( i , j ) ) ] 2 Σ i = 1 m Σ j = 1 n [ x ̂ ( i , j ) − x ( i , j ) ] 2 , (16)where m is the number of rows and n is the number of columns in the image. x ( i , j ) represents the value of the pixel at position ( i , j ) in an image without noise. x ̂ ( i , j ) represents the value of the pixel at position ( i , j ) in a denoised image. However, image details (low-level features) are very important when the visual system acquires information. Feature-Similarity (FSIM) index [46] is a new image quality metric based on low-level features in images, and it measures the similarity between two images by phase congruency (PC) and gradient magnitude (GM). FSIM demonstrates a high degree of consistency with human vision through subjective visual assessment. Here, we do a brief introduction to the calculation of FSIM. Let f 1 and f 2 denote the two images to be compared. P C 1 ( x ) and P C 2 ( x ) are the PC of f 1 ( x ) and f 2 ( x ) at position x, respectively. The similarity measure for P C 1 ( x ) and P C 2 ( x ) is defined as follow: S P C ( x ) = 2 P C 1 ( x ) · P C 2 ( x ) + T 1 P C 1 2 ( x ) + P C 2 2 ( x ) + T 1 , (17)where T 1 is a positive constant to increase the stability of S P C . G 1 ( x ) and G 2 ( x ) are the GM of f 1 ( x ) and f 2 ( x ) , respectively. The similarity of G 1 ( x ) and G 2 ( x ) is defined as follows: S G ( x ) = 2 G 1 ( x ) · G 2 ( x ) + T 2 G 1 2 ( x ) + G 2 2 ( x ) + T 2 , (18)where T 2 is a positive constant depending on the dynamic range of GM values. Then, S P C ( x ) and S G ( x ) are combined to get the similarity S L ( x ) of f 1 ( x ) and f 2 ( x ) . The S L ( x ) is defined as follows: S L ( x ) = [ S P C ( x ) ] α · [ S G ( x ) ] β , (19)where α and β are parameters used to adjust the relative importance of PC and GM features. The FSIM index between f 1 and f 2 is defined as follows: F S I M = Σ x ∈ Ω S L ( x ) · P C m ( x ) Σ x ∈ Ω P C m ( x ) , (20)where Ω means the whole image spatial domain. P C m ( x ) = m a x ( P C 1 ( x ) , P C 2 ( x ) ) , is used for weighting the importance of S L ( x ) in the overall similarity between f 1 and f 2 . More details can be found in [46]. In this paper, we use PSNR index and FSIM index to fully reflect the performance of the denoising algorithm. And the parameters used to calculate FSIM are the same as the settings in [46]. In our experiment, natural greyscale images with size 256 × 256 are used for simulation. These images have been commonly used to verify the effectiveness of many state-of-the-art denoising algorithms. The noisy images are generated by adding zero mean white Gaussian noise with different standard deviations to the test images. The noise standard deviation τ ranges from 30 to 150. Greyscale images used in the experiment are shown in Figures 1 and 2. To prevent accidental data, we also test each algorithm on the BSD100 database. Since the BSD100 is a colour image set and our method depend on colour features, we directly add the same level of noise to the RGB channels and remove them, respectively. FIGURE 1Open in figure viewerPowerPoint Greyscale images used in the experiment FIGURE 2Open in figure viewerPowerPoint Greyscale images used in the experiment 3 INTRODUCTION OF THE METHOD PROPOSED IN THIS PAPER Non-local self-similarity is a very important feature of natural images. It characterizes the non-local textures or structures with a repetitive feature, which can be used to effectively maintain edges and details of images. According to the non-local self-similarity of images, we propose an image denoising algorithm that effectively removes AWGN and achieves better denoising performances. The basic process of the denoising algorithm is to divide the image into patches firstly. Next step, it uses the method of clustering and the similarity between image patches to adaptively construct similar patch matrices. Then singular values and singular vectors obtained by the singular value decomposition of the similar patch matrices are processed to remove the noise. Finally, the denoised similar patch matrices are reconstructed to obtain the denoised image. Combined with back projection technology, the algorithm iterates the basic process to achieve two-stage denoising of images and further improves the performances of the algorithm. The complete algorithm flowchart is shown in Figure 3, where the basic processes of a single iteration are shown in the dashed box in Figure 3. FIGURE 3Open in figure viewerPowerPoint Algorithm flowchart 3.1 Adaptive clustering The essence of utilizing the non-local self-similarity is to find similar image patches and construct low-rank similar patch matrices. Cluster analysis is a multi-variate statistical method for classifying research objects according to certain characteristics, and it has been successfully applied to economics, medicine, meteorology and other fields. It does not care about the causality relationship between features and variables. The result of clustering shows that the individual differences between categories are large, while the individual differences of the same category are relatively small. The difference between clustering and classification is that the number of classes divided by the clustering method is unknown. This unknown provides a feasible space for adaptively dividing image patches according to image content. An existing classical method of constructing similar patch matrices is to find L image patches that are most similar to the target image patch in the neighbourhood window centred on the target image patch, as described in LRA-SVD [19]. The size of L is crucial for ranks of the similar patch matrices and denoised results. The similar image patches are searched in a fixed-size search window centred on the target image patch. When search windows of different target patches overlap, the same image patch can be similar to multiple image patches and appear in multiple similar patch matrices. About other algorithms, such as the traditional k-means clustering algorithm, it cannot adaptively determine the number of clusters. However, a fixed number of clusters results in multiple image features in the same cluster, which makes it difficult to identify some features after noise removal. According to the divide-and-conquer k-means algorithm [47], ACPT [31] uses the over-clustering-and-iterative-merging strategy to obtain the final clustering results and improves the clustering method. Usually, when a cluster corresponds to an image feature in a small range, the cluster is considered to be a better and more accurate partition. However, the iterative-merging in [31] overly merges some subtle different clusters representing different details into one class, so that the low-rank feature of the similar patch matrices is not guaranteed and the image details are lost in the denoising phase. At the same time, the initial state of the cluster is also crucial for the clustering results. Inspired by the above methods, we propose an adaptive two-stage clustering algorithm to classify image patches in the whole image range, so that the low-rank feature of similar patch matrices is guaranteed and improved. The adaptive two-stage clustering algorithm makes the features of image patches contained in each cluster as consistent as possible. In the first stage, s
Referência(s)