Forensic detection of image manipulation using the Zernike moments and pixel‐pair histogram
2013; Institution of Engineering and Technology; Volume 7, Issue 9. Language: English
DOI: 10.1049/iet-ipr.2012.0717
ISSN: 1751-9667
Authors: Mahmood Shabanifard, Mahrokh G. Shayesteh, Mohammad Ali Akhaee
Topic(s): Image Processing Techniques and Applications
IET Image Processing, Volume 7, Issue 9, pp. 817-828. First published: 01 December 2013.

Mahmood Shabanifard (Department of Electrical Engineering, Urmia University, Urmia, Iran); Mahrokh G. Shayesteh, corresponding author, m.shayesteh@urmia.ac.ir (Department of Electrical Engineering, Urmia University, Urmia, Iran; Wireless Research Laboratory, ACRI, Electrical Engineering Department, Sharif University of Technology, Tehran, Iran); Mohammad Ali Akhaee (School of Electrical and Computer Engineering, Faculty of Engineering, University of Tehran, Tehran, Iran)

Abstract

Integrity verification or forgery detection of an image is a difficult procedure, since forgers use various transformations to create an altered image. Pixel mapping transforms, such as contrast enhancement, histogram equalisation, gamma correction and so on, are the most popular methods for improving the objective quality of an altered image. In addition, fabricators add Gaussian noise to the altered image in order to remove the statistical traces produced by pixel mapping transforms. A new method is introduced to detect and classify four categories of images: original, contrast-modified, histogram-equalised and noisy. In the proposed method, the absolute values of the first 36 Zernike moments of the pixel-pair histogram and its binary form are calculated for each image in polar coordinates, and the features that yield the maximum between-class separation are selected. Additional features obtained from the Fourier transform are also utilised for further separation. Finally, a support vector machine classifier is used to assign the input image to one of the four categories. The experimental results show that the proposed method achieves a high classification rate and considerably outperforms previously presented methods.

1 Introduction

In recent decades, with the growth of digital imaging devices, digital images have been widely used in different aspects of society such as government, journalism, the legal system and so on.
One of the major problems of using digital images is that they can be easily modified with powerful image editing applications such as Photoshop; hence it is difficult to distinguish fake images from genuine ones. The goals of forensic detection fall into the following categories [1]: (i) source classification, (ii) device identification, (iii) device linking, which groups objects according to their common source, (iv) integrity verification or forgery detection, (v) processing history recovery and (vi) anomaly investigation. In this paper, we focus on forgery detection and identify the process applied to the input image, such as contrast enhancement, histogram equalisation and noise addition. Most of the earlier works in this field have been devoted to disclosing digital forgeries based on incongruity in chromatic aberration [2], by estimating the parameters of a two-dimensional (2D) aberration-based model, and on inconsistencies in the lighting angle of local regions in the images [3]. In [1], a new method was proposed based on photo response non-uniformity (PRNU) noise as the intrinsic fingerprint of imaging sensors, along with a maximum likelihood criterion, to discriminate original images from altered ones. In [4], the authors introduced a new method based on the absence or presence of the correlations produced by colour filter array interpolation in test images. These correlations are destroyed when the original or genuine images are modified. Another method was presented in [5] to estimate the rotation angle and rescaling factor of local regions that are copy-moved in the altered image, based on the interpolation effect of the geometric operations. In [6], a new complex Zernike phase-based descriptor for region matching and local image representation was introduced; however, this descriptor cannot identify pixel value mapping-based transformations. The works in [7, 8] proposed a new scheme to detect copy-move forgery in digital images using scale-invariant feature transform (SIFT) descriptors. The SIFT-based descriptors of an image are invariant to changes in scaling, rotation, illumination and so on. In [9], a new scheme was proposed based on the statistical traces produced by pixel value mapping operations, called intrinsic fingerprints; however, the approach has two major drawbacks: (i) its performance degrades when the size of the sub-images decreases; and (ii) the work is organised only as a two-class method, such as original against contrast-enhanced images or original against histogram-equalised images. The study in [10] exploited pattern noise, which is the combination of PRNU and fixed pattern noise; a camera identification algorithm was introduced based on extracting a reference pattern for each camera using a correlation detector. For each camera under investigation, the authors first determine its reference pattern noise, which serves as a unique identification fingerprint. This is achieved by averaging the noise obtained from multiple images using a denoising filter. To identify the camera from a given image, the reference pattern noise is treated as a spread spectrum watermark, whose presence in the image is established using a correlation detector. In [11], the author utilised the PRNU parameter for forensic purposes. The work in [12] used double compression detection in JPEG images to detect manipulated images in a set of given images.
In [13], a new method was introduced to detect image forgery based on JPEG header information, such as the quantisation table, Huffman codes, thumbnails and exchangeable image file (EXIF) data, as the camera signature. The authors in [14] calculated the second derivative of the image in the frequency domain and used the normalised energy density criterion within different window sizes to detect resampled images from original ones. They generated 19-D feature vectors that were employed to train a support vector machine (SVM) classifier. In [15], a new scheme was proposed to detect copy-move forgery within an image. The performance of previously proposed feature sets and of the 15 most important selected feature sets was evaluated. The authors used a variety of operations in the post-processing step, such as matching, filtering, outlier detection, affine transformation estimation and so on. In [16], singular value decomposition (SVD) was exploited to decompose the RGB layers of an image into three rotation-invariant orthogonal matrices. In order to protect the image against forgery, a 1D cellular automaton was used to generate a robust secret key. The key was then embedded into the spatial domain of another RGB layer to authenticate the original image.

In this paper, we present a new method to detect and distinguish pixel mapping transforms, such as contrast enhancement, histogram equalisation and Gaussian noise addition, applied to images. In the first step, we divide an image into sub-images and remove low-detail sub-images using an entropy criterion to attain more accuracy in the classification. Next, the pixel-pair histogram of the R, G and B components of each sub-image is calculated using different ordering patterns. The total pixel-pair histogram of each sub-image is obtained by element-wise summation of the components of the obtained pixel-pair histograms. We also define a binary pixel-pair histogram for each sub-image. Moreover, we calculate the absolute values of the complex Zernike moments (ZMs) of the pixel-pair histograms of each sub-image in polar coordinates. Finally, an SVM classifier is applied to classify the input feature set. In this study, we use the Photoshop application to generate contrast-modified sub-images and MATLAB to generate noisy and histogram-equalised ones. The results demonstrate the efficiency of the proposed method.

The rest of this paper is organised as follows. In Section 2, we explain the pixel-pair histogram of an image and the complex ZM transform. In Section 3, we present the proposed method. The experimental results are provided in Section 4. Finally, Section 5 gives the concluding remarks.

2 Preliminaries

In this section, we briefly describe the pixel-pair histogram and its corresponding binary form for an image. Then, we explain the FLAF function of an image used in this paper and, finally, the ZMs are described.

2.1 Pixel-pair histogram and its binary form

In [9], the authors used the histogram of sub-images and proved that if an image is changed by a pixel mapping transform, the energy of the high-frequency components of its histogram increases. One disadvantage of using the histogram, a first-order statistic, is its low sensitivity to pixel value manipulation. For example, suppose that one of the grey levels of an 8-bit image is removed as a result of applying a transform function. Only one component of the conventional histogram changes in the altered image, whereas in the pixel-pair histogram the effect is seen in 255 components.
On the other hand, the pixel-pair histogram is frequently employed in steganography [17] and steganalysis [18] applications. Therefore, in the proposed method we use the pixel-pair histogram instead of the conventional histogram to obtain better performance in comparison with previous works on the same subject. In order to generate the pixel-pair histogram of an image, we convert the image into a 1D vector using different ordering patterns such as row ordering, column ordering, zigzag ordering, shoe ordering and rotation ordering. The pixel-pair histogram is an image of size 256 × 256 (for grey-level images) in which the intensity at location (i, j) represents the number of times that a pixel pair with intensities i and j occurs in the generated vector. An example of the pixel-pair histogram of a typical image using the column ordering pattern is illustrated in Fig. 1.

Fig. 1: Example of the pixel-pair histogram of a typical image using the column ordering pattern. (a) Typical image; (b) 1D vector; (c) pixel-pair histogram.

Another type of pixel-pair histogram used in this article is referred to as the binary pixel-pair histogram. To calculate this histogram, the pixel-pair histogram values greater than one are clipped to one, while the values equal to zero remain unchanged. Fig. 2 shows the pixel-pair histogram and its binary form for an original image and its contrast-modified counterpart. We also employ the binary pixel-pair histogram to identify the contrast enhancement fingerprint, seen in Fig. 2e, in an altered image.

Fig. 2: Pixel-pair histogram and its binary form. (a) Original image; (b) pixel-pair histogram; (c) binary form of the pixel-pair histogram; (d) image altered by changing the contrast to 20; (e) pixel-pair histogram; (f) binary form of the pixel-pair histogram; (g) mask used to reject noise-like points.

As observed from Figs. 2c and f, there exist many noise-like points located away from the main diagonal and its neighbourhood in the binary form of the pixel-pair histogram. Hence, we use a mask to remove such noisy points in the binary form of the pixel-pair histogram. The experimental results show that the performance of the SVM classifier improves when the mask is applied. The mask is in fact a white strip along the diagonal direction, as shown in Fig. 2g. We selected 100 sub-images randomly and calculated their pixel-pair histograms. Then, we used white strips of different widths to remove the noisy points. Noting that the dimension of the pixel-pair histogram is 256 × 256, the results show that when the width of the white strip is set to \sqrt{(256/4)^2 + (256/4)^2} ≈ 91 pixels, the noisy points are well removed.

2.2 Logarithm of the absolute value of the image fast Fourier transform (FFT) (FLAF)

The 2D discrete Fourier transform (DFT) of an image function f(x, y) of size M × N pixels is defined as

F(u, v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, e^{-j 2\pi (ux/M + vy/N)}   (1)

In order to extract features from each pixel-pair histogram, we define the FLAF of the pixel-pair histogram as

\mathrm{FLAF}(f) = \log\big(1 + \mathrm{abs}(\mathrm{fft}(f))\big)   (2)

where fft(·) and abs(·) are the 2D Fourier transform and absolute value operators, respectively. That is, first the magnitude of the FFT of the pixel-pair histogram is computed to obtain information about its frequency content. Then, the logarithm is applied to reduce the range of the FFT values; for example, if the absolute FFT is in the range [10^-5, 10^5], taking the logarithm maps it to [-5, 5]. The value one is added to the magnitude to avoid log(0). The results show that the information in the FLAF can distinguish original images from altered ones well.
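As an illustration of these definitions, the following NumPy sketch (not the authors' code) computes a pixel-pair histogram, its binary form and the FLAF of (2) for an 8-bit grey-level array; only row- and column-ordering scans are shown, and the other ordering patterns and the diagonal mask are omitted.

```python
import numpy as np

def pixel_pair_histogram(gray, ordering="column"):
    """Pixel-pair histogram of an 8-bit grey-level image (sketch; only
    row/column ordering shown, the other orderings in the paper are omitted)."""
    if ordering == "column":
        vec = gray.flatten(order="F")   # column-major scan into a 1D vector
    else:
        vec = gray.flatten(order="C")   # row-major scan
    hist = np.zeros((256, 256), dtype=np.int64)
    # count occurrences of each consecutive intensity pair (i, j)
    np.add.at(hist, (vec[:-1], vec[1:]), 1)
    return hist

def binary_pair_histogram(hist):
    """Binary form: values greater than one are clipped to one, zeros stay zero."""
    return (hist > 0).astype(np.uint8)

def flaf(hist):
    """FLAF of a pixel-pair histogram, eq. (2): log(1 + |FFT2(hist)|)."""
    return np.log(1.0 + np.abs(np.fft.fft2(hist)))
```

For a 60 × 60 sub-image the resulting 256 × 256 histogram is very sparse, which is one reason the binary form and the diagonal mask described above are useful.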
2.3 Zernike moments

In most works, statistical moments, such as geometric moments, Hermite moments, Gaussian-Hermite moments, ZMs and so on, are used as discriminating features. In recent decades, ZMs and their family have been widely employed in different applications such as object recognition [19], image retrieval [20], edge detection [21], image coding [22], image processing and analysis [23-25] and region descriptors [6, 26, 27], because they analyse test images irrespective of changes in position, size, viewing angle and orientation. The ZM coefficients are the outputs of the expansion of an image function into a complete orthogonal set of complex basis functions. Among many moment-based shape descriptors, the ZM magnitude components are rotationally invariant and are therefore well suited to shape description. To calculate the ZMs, the image (or region of interest) is first mapped to the unit disc using polar coordinates, where the centre of the image is the origin of the unit disc. Pixels falling outside the unit disc are not used in the calculation. The mapping from Cartesian to polar coordinates is

f(x, y) \rightarrow f_p(\rho, \theta), \qquad x = \rho\cos\theta, \; y = \rho\sin\theta   (3)

where ρ is the radius and θ is the angle

\rho = \sqrt{x^2 + y^2}, \qquad \theta = \tan^{-1}(y/x)   (4)

The Zernike basis function V_nm(ρ, θ) is defined in polar coordinates over the unit circle as

V_{nm}(\rho, \theta) = R_{nm}(\rho)\, e^{jm\theta}   (5)

where n is the order and m is the repetition of V_nm(ρ, θ). Note that n and m are non-negative integers with |m| ≤ n and n − |m| even. R_nm(ρ) is the radial polynomial given by

R_{nm}(\rho) = \sum_{s=0}^{(n-|m|)/2} \frac{(-1)^s (n-s)!}{s!\, \left(\frac{n+|m|}{2}-s\right)!\, \left(\frac{n-|m|}{2}-s\right)!}\, \rho^{\,n-2s}   (6)

The set of basis functions is orthogonal, that is

\int_0^{2\pi} \int_0^1 V_{nm}^{*}(\rho, \theta)\, V_{pq}(\rho, \theta)\, \rho\, d\rho\, d\theta = \frac{\pi}{n+1}\, \delta_{np}\, \delta_{mq}   (7)

where \delta_{ab} = 1 if a = b and 0 otherwise. For a digital image function, the 2D ZMs in polar coordinates are given by

Z_{nm} = \frac{n+1}{\pi} \sum_{\rho} \sum_{\theta} f_p(\rho, \theta)\, V_{nm}^{*}(\rho, \theta), \qquad \rho \le 1   (8)

where f_p(ρ, θ) is the image function in polar coordinates. The ZMs can be viewed as the responses of the image function to a set of quadrature-pair filters. The repetition m indicates the number of sector cycles of the function values along the azimuth angle, while n and m jointly specify the number of annular patterns of the function.
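For concreteness, the following is a minimal NumPy sketch of (5)-(8), not the authors' implementation: the input array is mapped onto the unit disc and a single complex moment Z_nm is computed as a discrete approximation of (8). The proposed method uses the magnitudes |Z_nm| of the FLAF image as features.

```python
import numpy as np
from math import factorial

def radial_poly(n, m, rho):
    """Zernike radial polynomial R_nm(rho), eq. (6)."""
    m = abs(m)
    R = np.zeros_like(rho)
    for s in range((n - m) // 2 + 1):
        c = ((-1) ** s * factorial(n - s) /
             (factorial(s) * factorial((n + m) // 2 - s) * factorial((n - m) // 2 - s)))
        R += c * rho ** (n - 2 * s)
    return R

def zernike_moment(img, n, m):
    """Complex Zernike moment Z_nm of a square 2D array mapped onto the unit disc, eq. (8)."""
    N = img.shape[0]                          # assume a square input, e.g. a 256 x 256 FLAF image
    y, x = np.mgrid[-1:1:complex(0, N), -1:1:complex(0, N)]
    rho = np.sqrt(x ** 2 + y ** 2)
    theta = np.arctan2(y, x)
    mask = rho <= 1.0                         # pixels outside the unit disc are ignored
    V = radial_poly(n, m, rho) * np.exp(1j * m * theta)
    # Riemann-sum approximation of (8); (2/N)^2 is the area of one sample in the unit square
    Z = (n + 1) / np.pi * np.sum(img[mask] * np.conj(V[mask])) * (2.0 / N) ** 2
    return Z
```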
3 Proposed method

Fig. 3 shows the schematic flowchart of the proposed algorithm. In the following, we explain the different steps of the proposed method.

Fig. 3: Block diagram of the proposed method.

Step 1: First, the R, G and B components of each image are divided into sub-images of size 60 × 60 pixels. Then, only the blocks with higher entropy are considered for further processing; for example, sub-images showing the sky or a smooth wall with a constant colour are rejected to improve performance. We use the entropy criterion defined as

E = -\sum_{i} P(f_i) \log_2 P(f_i)   (9)

where M × N is the size of the test sub-image and P(f_i) is the probability of occurrence of the intensity f_i. In our experiments, blocks with entropy values equal to or greater than six are considered for feature extraction. It is worth mentioning that with entropy thresholds of 4.5 and 5, the SVM performance showed slight degradation; this is why the threshold is set to six.

Step 2: We generate our database, which includes original, noisy, contrast-enhanced and histogram-equalised sub-images. In order to compute the pixel-pair histograms of the sub-images, the R, G and B components of each sub-image are considered separately.

Step 3: The different ordering patterns explained before are exploited to calculate the pixel-pair histograms of each sub-image. We estimate the total pixel-pair histogram by element-wise summation of the components of the calculated pixel-pair histograms. We then compute the binary form of the resulting pixel-pair histogram.

Step 4: The FLAF function of the pixel-pair histogram and of its binary form of each sub-image is computed using (2). Then, we convert the results from Cartesian to polar coordinates.

Step 5: As our main features, specific ZMs of the FLAF function are calculated using (8). The other features are obtained from the distribution of the absolute value of the Fourier transform of the pixel-pair histogram and its binary form.

Step 6: In the final step, we apply a C-SVM (standard soft-margin SVM) classifier with an RBF kernel to classify the original images and their altered versions.

In the following, more detail about feature selection (Step 5) is presented.

3.1 Dataset generation and feature selection

We use the UCID image database [28] as the database of unaltered or original images. We extract the R, G and B components of each original image as grey-scale images. Next, the three components are divided into sub-images of size 60 × 60 pixels and smooth sub-images are rejected according to our entropy criterion. We randomly select 1000 high-entropy sub-images as the original signals. We generate contrast-enhanced sub-images using the Adobe Photoshop CS5 application by changing the contrast value of the original sub-images to +40. Further, we use MATLAB to add zero-mean white Gaussian noise with variance 0.01 to the original sub-images. Finally, we generate the histogram-equalised sub-images in MATLAB using the following equation

H(i) = \mathrm{round}\left(\frac{255}{MN} \sum_{n=0}^{i} h(n)\right)   (10)

where H(i) is the equalised intensity corresponding to intensity i, h(n) is the histogram value of the input sub-image at intensity n, and M × N is the total number of pixels in the original sub-image. Consequently, our dataset has 4000 sub-images in total (1000 from each of the four categories). This set is used to find the proper features of each category which result in enough separation and high performance.

As stated in Step 5, one of the features employed in the proposed method is the distribution of the absolute value of the Fourier transform of the pixel-pair histogram and its binary form in polar coordinates, as a function of angle in the range 0° to 180° and of a maximum radius determined by the size of the pixel-pair histogram. For example, for an image of size M × N, the radius is min{M/2, N/2}. Here, the maximum radius is set to 128. If we denote the absolute value of the Fourier transform of an image by S(ρ, θ), then we calculate the distribution S_ang as

S_{\mathrm{ang}}(\theta) = \sum_{r=1}^{R_o} S_r(\theta)   (11)

where S_r(θ) is the 1D function of S(ρ, θ) for each frequency r and R_o is the radius of the circle centred at the origin, which depends upon the size of the pixel-pair histogram (more details are given in [29]). In the following, we discuss the relevant features of each class.
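A possible NumPy sketch of (11) is given below. The exact sampling scheme is an assumption, since the paper follows [29]: the centred FFT magnitude of the pixel-pair histogram is sampled along rays at angles 0° to 180° (nearest-neighbour sampling) and summed over the radius up to R_o.

```python
import numpy as np

def s_ang(hist, R0=128, angles=np.arange(0, 181)):
    """Angular distribution S_ang(theta) of |FFT| of a pixel-pair histogram, eq. (11)."""
    S = np.abs(np.fft.fftshift(np.fft.fft2(hist)))   # centred FFT magnitude, S(rho, theta)
    cy, cx = np.array(S.shape) // 2
    out = np.zeros(len(angles))
    for k, deg in enumerate(angles):
        t = np.deg2rad(deg)
        r = np.arange(1, R0)
        xs = np.clip(np.round(cx + r * np.cos(t)).astype(int), 0, S.shape[1] - 1)
        ys = np.clip(np.round(cy - r * np.sin(t)).astype(int), 0, S.shape[0] - 1)
        out[k] = S[ys, xs].sum()                     # sum over rho = 1 .. R0 - 1 for this angle
    return out
```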
3.2 Specific features of contrast-enhanced sub-images

In order to select features for detecting contrast-enhanced images, we calculate 36 ZMs of the FLAF function (2) of the pixel-pair histogram and of its binary form over the generated dataset. We then build feature vectors in which the first 36 elements are calculated from the pixel-pair histogram and the next 36 elements are computed from its binary form; therefore, our primary feature vectors have 72 dimensions. Since the values of many dimensions of the feature vectors are about 10^5 while the range of other dimensions is far smaller, we normalise the elements of each feature vector. Our experimental results show that the PDF of each dimension is Gaussian; therefore, the following normalisation can be used

D_{\mathrm{norm}}(i) = \frac{D(i) - \mu_{D(i)}}{\delta_{D(i)}}   (12)

where D(i) is the ith dimension of the feature vector D, μ_{D(i)} and δ_{D(i)} are the mean and standard deviation of D(i), respectively, and D_norm(i) is the normalised value of D(i). Finally, we apply a mutual information technique to reduce the dimension of the feature vectors. For this purpose, we apply the minimum redundancy-maximum relevance (mRMR) method [30] to the feature vectors. Fig. 4 shows the SVM performance against the number of dimensions selected by the mRMR method. It is observed that the SVM performance is more than 98.5%. Considering the figure, we select 10 dimensions, which achieves an accuracy of 99.5%. Table 1 shows the 10 best moment indices and the equivalent ZMs of the contrast-enhanced class obtained by the mRMR method. The notations 'dir' and 'bin' in the tables stand for using the pixel-pair histogram directly and using its binary form, respectively.

Table 1. Ten best moment indices and equivalent ZMs for contrast-enhanced images
Moment index:  58   24   70   59   54   60   57   62   22   38
n:              6   10   10    8    5   10    4    7    6    2
m:              4    4    8    4    3    4    4    5    4    0
Histogram:    bin  dir  bin  bin  bin  bin  bin  bin  dir  bin

Fig. 4: SVM performance (%) against the number of dimensions selected by the mRMR method for contrast-enhanced sub-images.

The other features are obtained from the S_ang curve (11) of the contrast-enhanced sub-images, calculated from the absolute value of the Fourier transform of the binary form of the pixel-pair histogram. To obtain these features, we computed the S_ang curves of 50 randomly selected images. Figs. 5a and b show the S_ang curves of 15 original sub-images and their contrast-modified counterparts for the pixel-pair histogram and its binary form, respectively. As seen in Fig. 5a, there is no difference between the S_ang curves of the original and altered versions in the case of the pixel-pair histogram. However, from Fig. 5b, we observe three noticeable peaks around θ = 0°, 90° and 180° in the binary pixel-pair histogram of the altered sub-images. Therefore, considering Fig. 5b, we define two new features: the sum of the peak values of S_ang at θ = 0°, 90° and 180°, and the sum of the variances of S_ang in the intervals [0°, 10°], [80°, 100°] and [170°, 180°]. Note that we cannot use the peak around θ = 45° to extract a feature, because at θ = 45° the S_ang curve behaves similarly for the original sub-images and their contrast-modified counterparts.

Fig. 5: Plot of S_ang curves for 15 original sub-images and their contrast-modified versions: (a) pixel-pair histogram; (b) binary form of the pixel-pair histogram.

As a consequence, we use 12 features for this category (ten features belong to the ZMs of the pixel-pair histogram and its binary form, and two features are obtained from the S_ang curve of the binary form of the pixel-pair histogram). These features are able to separate original sub-images from contrast-enhanced ones.
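The two S_ang-based features and the normalisation of (12) are simple to express. The sketch below is illustrative only; it assumes an S_ang curve sampled at integer degrees from 0 to 180 and a feature matrix with one row per sub-image.

```python
import numpy as np

def sang_features(sang):
    """Two contrast-enhancement features from an S_ang curve sampled at 0..180 degrees:
    the sum of the peaks at 0, 90 and 180 degrees, and the sum of the variances over
    [0, 10], [80, 100] and [170, 180] degrees (Section 3.2)."""
    peak_sum = sang[0] + sang[90] + sang[180]
    var_sum = np.var(sang[0:11]) + np.var(sang[80:101]) + np.var(sang[170:181])
    return peak_sum, var_sum

def zscore_normalise(features):
    """Per-dimension normalisation of eq. (12) over the dataset:
    `features` is an (n_samples, n_dims) array."""
    mu = features.mean(axis=0)
    sigma = features.std(axis=0)
    return (features - mu) / sigma
```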
3.3 Specific features of histogram-equalised sub-images

The features extracted in Section 3.2 do not separate histogram-equalised images from the other classes. Thus, using the same procedure as in Section 3.2, we select a new set of ZMs, listed in Table 2, based on the maximum class separation they provide. Our results showed that the classification performance is always 100%, even with one feature. Thus, we select one dimension to distinguish the histogram-equalised dataset from the other classes. We did not use the S_ang curve to define a new feature set for this category, since the ZM is sufficient to distinguish this class from the genuine one.

Table 2. Ten best moment indices and equivalent ZMs for histogram-equalised images
Moment index:  55   17   41   68   62   40   53    6   61   59
n:              7    3    8    9    7    6    3   10    5    8
m:              3    3    0    7    5    0    3    0    5    4
Histogram:    bin  dir  bin  bin  bin  bin  bin  dir  bin  bin

3.4 Specific features of noisy sub-images

Fabricators may add noise to remove the traces of pixel mapping transforms. Moreover, the features introduced in the preceding subsections are not able to isolate noisy sub-images from the other categories. This leads us to find a new feature set. For this purpose, we define a set of ZMs which provides enough separation between the noisy dataset and the original one. In this experiment, we added Gaussian noise with zero mean and variance 0.01 to the original images. The number of dimensions selected by the mRMR method is four. Table 3 shows the four best ZMs for distinguishing this category from the original class.

Table 3. Four best moment indices and equivalent ZMs for noisy images
Moment index:   5   58   16    6
n:              8    6   10   10
m:              0    4    2    0
Histogram:    dir  bin  dir  dir

Another feature is obtained by computing the Fourier transform of the pixel-pair histogram and then calculating the absolute value of the summation of the normalised real parts of its Fourier coefficients. Fig. 6 shows this feature for 100 original and noisy sub-images with a noise variance of 0.02. Thus, we have five features in total for the noisy sub-image dataset. Note that we did not use the S_ang curve to define a new feature for the noisy category, because the original and noisy S_ang curves show similar behaviour.

Fig. 6: Values of the feature obtained from the Fourier transform of the pixel-pair histograms of 100 original sub-images and their noisy versions.

3.5 Final feature vector

To generate the final feature vector, we extract the features described in Sections 3.2, 3.3 and 3.4 from the pixel-pair histogram and its binary form of all sub-images. The total feature vector therefore has 18 dimensions, where 12 features correspond to those calculated for the contrast-enhanced dataset, one feature is computed as for the histogram-equalised dataset, and the last five dimensions are those used for the noisy dataset. As stated, we have defined two kinds of features, based on the ZMs and on the Fourier transform. It will be shown in the next section that the performance increases when these two kinds of features are used together.

4 Experimental results

In this section, we evaluate the performance of the proposed method. The dataset is generated in the same way as described in the previous section. However, we randomly select 2500 sub-images as the original dataset and, using the Adobe Photoshop CS5 application, change the contrast values of the original sub-images to −50, −40, −20, −10, −5, +5, +10, +20, +40, +60 and +100. Moreover, we use MATLAB to add zero-mean white Gaussian noise with variances 0.001, 0.002, 0.005, 0.01 and 0.02 to the original sub-images and the contrast-enhanced ones. We also generate histogram-equalised sub-images according to (10) in the MATLAB environment. Finally, we generate the feature vectors of each class as described in Section 3.
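A rough Python sketch of the dataset generation follows. The noise addition and histogram equalisation mirror the MATLAB operations described above; the contrast adjustment is only a linear stand-in, since the exact mapping applied by Photoshop's contrast control is not published.

```python
import numpy as np

def add_gaussian_noise(img, variance=0.01):
    # zero-mean white Gaussian noise on the image scaled to [0, 1], as in MATLAB's imnoise
    f = img.astype(np.float64) / 255.0
    noisy = f + np.random.normal(0.0, np.sqrt(variance), f.shape)
    return np.clip(noisy * 255.0, 0, 255).astype(np.uint8)

def histogram_equalise(img):
    # histogram equalisation following eq. (10): each intensity is mapped through the scaled CDF
    h, _ = np.histogram(img, bins=256, range=(0, 256))
    cdf = np.cumsum(h) / img.size
    mapping = np.round(255.0 * cdf).astype(np.uint8)
    return mapping[img]

def adjust_contrast(img, amount=40):
    # hypothetical linear stretch about mid-grey; NOT the Photoshop CS5 algorithm
    factor = 1.0 + amount / 100.0
    out = (img.astype(np.float64) - 128.0) * factor + 128.0
    return np.clip(out, 0, 255).astype(np.uint8)
```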
4.1 Classification results

In order to evaluate the performance of the proposed method, we apply the SVM classifier. Although SVMs were originally designed as binary classifiers, various extensions exist that enable SVMs to handle more than two classes. Multiclass SVM classifiers can be roughly divided into two groups: all-together methods and methods based on binary classifiers. In this work, for multiclass classification, we use the one-against-one (OvO) method, which is widely used in the literature. This method employs Nc(Nc − 1)/2 binary SVMs, one for each pair of classes, where Nc is the number of classes. During classification, the feature vectors are presented to all binary classifiers and the histogram of their outputs is evaluated. The class corresponding to the maximum value of the histogram is selected as the target class. If two or more classes receive the same number of votes, one of them is chosen randomly. For the multiclass SVM, we use C-SVM with the radial basis function (RBF) kernel, which is a commonly used kernel, as the binary classifier. Here, we employ the LIBSVM library with the RBF kernel

K(x_i, x_j) = \exp(-\lambda \lVert x_i - x_j \rVert^2)

as the kernel function, where λ > 0 is the Gaussian kernel width. The developers of the LIBSVM library suggest that users try the RBF kernel first; if RBF is used with model selection, there is no need to consider the linear kernel. The kernel matrix obtained with the sigmoid kernel may not be positive definite and, in general, its accuracy is not better than that of the RBF kernel.
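As an illustrative sketch (not the authors' experimental setup), the classification stage can be reproduced with scikit-learn's SVC, which wraps LIBSVM and applies the one-against-one strategy for multiclass problems. The feature file names, the C value and the kernel width below are placeholders, since the paper does not report them.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# X: (n_samples, 18) feature matrix, y: labels 0..3 for
# original / contrast-enhanced / histogram-equalised / noisy sub-images.
X = np.load("features.npy")   # hypothetical file names
y = np.load("labels.npy")

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

scaler = StandardScaler().fit(X_train)   # per-dimension normalisation, as in eq. (12)

# C-SVM with an RBF kernel; SVC wraps LIBSVM and uses one-against-one internally.
clf = SVC(kernel="rbf", C=10.0, gamma="scale")
clf.fit(scaler.transform(X_train), y_train)

print("accuracy:", clf.score(scaler.transform(X_test), y_test))
```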