Privacy preserving search index for image databases based on SURF and order preserving encryption
2019; Institution of Engineering and Technology; Volume: 14; Issue: 5 Linguagem: Inglês
10.1049/iet-ipr.2019.0575
ISSN1751-9667
AutoresSafaa Magdy, Yasmine Abouelseoud, Mervat Mikhail,
Tópico(s)Data Management and Algorithms
ResumoIET Image ProcessingVolume 14, Issue 5 p. 874-881 Research ArticleFree Access Privacy preserving search index for image databases based on SURF and order preserving encryption Safaa Magdy, Corresponding Author Safaa Magdy safaa_mgdy@yahoo.com Faculty of Engineering, Computer Science, Alexandria University, Alexandria, EgyptSearch for more papers by this authorYasmine Abouelseoud, Yasmine Abouelseoud Faculty of Engineering, Mathematics and Physics, Alexandria University, Alexandria, EgyptSearch for more papers by this authorMervat Mikhail, Mervat Mikhail Faculty of Engineering, Mathematics and Physics, Alexandria University, Alexandria, EgyptSearch for more papers by this author Safaa Magdy, Corresponding Author Safaa Magdy safaa_mgdy@yahoo.com Faculty of Engineering, Computer Science, Alexandria University, Alexandria, EgyptSearch for more papers by this authorYasmine Abouelseoud, Yasmine Abouelseoud Faculty of Engineering, Mathematics and Physics, Alexandria University, Alexandria, EgyptSearch for more papers by this authorMervat Mikhail, Mervat Mikhail Faculty of Engineering, Mathematics and Physics, Alexandria University, Alexandria, EgyptSearch for more papers by this author First published: 04 March 2020 https://doi.org/10.1049/iet-ipr.2019.0575AboutSectionsPDF ToolsRequest permissionExport citationAdd to favoritesTrack citation ShareShare Give accessShare full text accessShare full-text accessPlease review our Terms and Conditions of Use and check box below to share full-text version of article.I have read and accept the Wiley Online Library Terms and Conditions of UseShareable LinkUse the link below to share a full-text version of this article with your friends and colleagues. Learn more.Copy URL Share a linkShare onFacebookTwitterLinked InRedditWechat Abstract Managing large personal image databases requires efficient privacy preserving indexing methods to allow for their outsourcing to possibly curious cloud servers. To construct a secure inverted index in this paper, first, visual words are extracted from stored images based on the Speeded-Up and Robust Features (SURF). Next, Order Preserving Encryption (OPE) is used to encipher the frequencies of occurrence of the extracted visual words. Another scale and rotation invariant feature, which is the local HSV histogram, is included for comparison. From the obtained results, it is apparent that SURF achieves more precise results. Aggregation of both features is considered to further improve the accuracy. The effects of the weighting scheme of the visual words and their number on the performance are investigated. Weighted term frequency inverse document frequency (tf-idf) together with the Jaccard similarity measure yield the best performance. OPE encryption is shown to have minor impact on the retrieval accuracy. To reduce encryption time, a lookup table is constructed. The inverted index reduces the search time significantly compared to a sequential search scheme as apparent from the results. A comparative study with recent related schemes demonstrates the competitiveness of the implemented system in terms of computational efficiency and accuracy. 1 Introduction Index structures are commonly used in text documents retrieval systems to avoid slow sequential search. In its simplest form, the index structure consists of keywords (collectively referred to as a dictionary) associated with pointers to the documents in which they are included. When a query is presented, the documents that share the largest number of keywords with the query are considered to be relevant and are returned in response to the query. A similar concept has been applied successfully to image retrieval systems. In [1], the authors annotated descriptive keywords to an image in their system. However, such systems are less common. In the more common content-based image retrieval (CBIR) systems, instead of keywords, image patches or key points descriptors – called visual words – are generated from a sample of images in the database. A suitable clustering algorithm is then employed to quantise the extracted image features into centroids of clusters constituting what is referred to as bag of visual words (BOVW). This quantisation step is essential to reduce the number of extracted visual words and thus increase the efficiency of the search process. The number of occurrences of each visual word in an image is counted and this results in a histogram for each image. Suzuki et al. [2] introduced an image retrieval scheme that employs the term frequency-inverse document frequency (tf-idf) weighting method. They considered rare areas in images, in contrast to the scheme implemented in [3] that ignores the rare regions and focuses only on most frequent parts of an image during features extraction. They used Earth mover's distance (EMD) measure and the cosine measure in the matching process, where EMD is defined as the minimum cost incurred to transform the histogram of one image into the histogram of the other [4]. Multiple term or soft assignment in the quantisation step is shown to achieve improved accuracy in [5]. Further enhancements to the BOVW model are introduced in [6] using the n-BOVW model and binary-based compression. The authors used the distance from a key point to a visual word to get the best match between a key point of an image and representative visual words in the quantisation step. Post-query processing is also used for accuracy enhancement. In their rank-aggregation approach, the query is performed several times using a separate BOVW each time. Then, the results are aggregated. This approach requires more space for storing multiple indexes and consumes more search time [7]. Exhaustive pairwise image matching using connectivity graphs is implemented in [8]. However, a more efficient approach for image pairs similarity assessment is developed by Lou et al. [9] based on tf-idf with the aid of a BOVW model. More accurate results are achieved in [10], by considering the intersection of the matching candidate images resulting from the traditional Jaccard similarity measure and the tf-idf weighted one. Resource constrained devices need to securely outsource their multimedia data to powerful servers. However, they need to ensure the privacy or confidentiality of their data, while enabling search by the remote servers. Both the images and the index structure should be encrypted. Traditional encryption schemes that ensure confidentiality prohibit search capabilities except by the data owner and thus the remote server can no longer answer search queries over the encrypted index without knowledge of the secret keys involved in the decryption process. Lu et al. introduced a privacy preserving search technique over an inverted index based on order preserving encryption (OPE) in [11, 12]. OPE finds application in wide column store databases and thus efficient implementations are available that suit real-time applications [13]. In their construction of the index, Lu et al. used the local HSV (hue-saturation-value) histogram of small image patches as a descriptive feature and they employed Jaccard similarity measure in the matching process. In [14], a scaling factor related to inverse document frequency is used in weighting visual words frequencies. It skews the frequency values in a way that visual words having less common or more distinctive occurrence are assigned higher weights. This scaling factor of a visual word depends on the total number of images in the database and the number of images containing that visual word. This approach enhances the performance by increasing the precision. On the other hand, semi-homomorphic encryption is used to encrypt the tf-idf weighted frequency in [10]. Another encryption technique that allows to answer queries and perform equality check operations on encrypted data is the so-called asymmetric scalar-product preserving encryption (ASPE) scheme [15]. Liang et al. proposed a CBIR system based on a balanced binary tree of visual words (vocabulary tree) to accelerate the search. They used ASPE for encryption [16]. In their approach, they used a linear combination of the HSV histogram and the discrete cosine transform (DCT) histogram as image descriptors in constructing the vocabulary tree. In [17], the authors employed EMD as a complicated and accurate measure to improve the performance of the search. They used SIFT local features and local sensitive hashing (LSH) in pre-filtering of images to increase efficiency. In [18], another secure CBIR system of multi-users in IoT-cloud is introduced. Their scheme is lightweight such that it can be employed in smart devices. They represent images in terms of SURF descriptors and used LSH again to encrypt the indexes. Though the building blocks of the proposed method are not new, however, the combination considered succeeded in yielding improved results compared to those already available in literature. Our contributions in this paper can be summarised as follows: • We extend the work in [12] by considering the more discriminative SURF feature descriptor instead of the local HSV histogram in the construction of the BOVW. We demonstrate the superiority of the SURF-based search index over the HSV histogram based index in terms of the accuracy of the retrieved images. Moreover, the k-means++ algorithm is used instead of the hierarchical k-means clustering algorithm, where the latter suffers from less flexibility compared to the former. • Enhancements to the implementation of OPE of the frequencies of the visual words are achieved through the use of a look-up table and ignoring too large and too small frequency values. • Experiments using different weighting schemes of visual words in the matching process are conducted and recommendations are thus provided. • Elaborating on the effect of varying the size of the BOVW on both the search time and the accuracy of the retrieved results. • Comparing the performance of a sequential search scheme to the proposed inverted index scheme and savings in the search time are discussed. Moreover, the tradeoff between precision and speed of search is demonstrated for both alternatives. • Parallel aggregation of the results of more than one BOVW is investigated and its impact on system performance is examined. • A comparative study between the proposed privacy preserving search scheme and existing schemes in literature is conducted. The rest of the paper is organised as follows. The necessary background is provided in Sections 2 and 3. In Section 2, the procedure of building an inverted index for an image database is reviewed together with the description of the features used in this paper. Order preserving encryption (OPE) is described in Section 3 and its security related issues are examined. In Section 4, our secure CBIR system is presented. In this section, the parties involved and the interaction between them are described in detail. Moreover, the similarity measures employed are defined. Section 5 includes the parameters settings used in our simulations to evaluate the performance of our system. In Section 6, the results obtained are summarised, powered by a comparative study with recent papers in literature. Finally, we conclude our work in Section 7. 2 Inverted index construction In this section, the three steps involved in the construction of an inverted index structure for efficient image retrieval are described. These steps are (i) features extraction, (ii) quantisation of feature vectors to generate a BOVW model and (iii) counting the frequencies of occurrence of each visual word as well as linking each visual with the containing image files [7]. 2.1 Features extraction Local features are extracted from the images either by dividing the image into small patches using a grid or by identifying key points (interest points). Next, feature descriptors are assigned to each patch or key point. Two kinds of features are considered in this paper. The first one is the local HSV histogram computed for small image patches, for the sake of comparison with the results in [12]. The second is the speeded-up and robust features (SURF), which is a local feature detector/descriptor that is relatively fast and more robust compared to other scale invariant feature detectors/descriptors like SIFT [19]. Interest points are identified and then a unique descriptor is assigned to each point. The major stages involved in the extraction process are scale-space extrema detection, key point localisation, orientation assignment and key point description. Interest areas around key points must be specified. Each interest area is divided into sub-areas and described by the values of wavelet responses. The SURF descriptor is based on sums of 2D Haar wavelet responses [20]. 2.2 Bag of visual words Quantisation of features allow to gather visually similar features together, which in turn enhances the performance of the system [21]. Quantisation is achieved by clustering the feature vectors (descriptors), associated with image patches or interest points, obtained in the previous step from a sample of the database images. The centroids of the clusters are referred to as visual words. The set of centroids is collectively referred to as a BOVW. The choice of the clustering algorithm has a significant impact on the performance of the system in terms of computational efficiency and retrieval accuracy. Clustering algorithms can be classified into hierarchical and partitioning algorithms. Hierarchical algorithms construct clusters by recursively partitioning the space of the feature vectors. The major disadvantage of these algorithms is that no back-tracking ability is available such that no correction can be made to the centroids and clusters after they are created in each step. On the other hand, partitioning algorithms depend on relocating points by moving them between clusters. The time complexity is linear with the dataset size. k-means and k-means++ are examples of partitioning algorithms. k-means clustering algorithm has a problem of convergence to a local minimum as it is sensitive to the initial seed set. On the other hand, k-means++ clustering algorithm has an advantage of making a careful choice of the initial centroids of the clusters [22]. Any clustering algorithm employs a distance metric in order to assign a feature vector to a proper cluster. The distance between the centroid of a cluster and the feature vector is used to include a feature vector or exclude it from the cluster. There are two types of feature vector assignments in the quantisation step, hard and soft assignments. In hard assignment, a feature vector is assigned to one visual word. On the contrary, in the soft assignment, each feature vector can be assigned to l nearest visual words. It has been shown that if , no additional information is gained [23]. Soft assignment has the advantage that the quantisation error is smaller compared to hard assignment. Jégou et al. showed that multiple assignment causes a modest increase in the accuracy of retrieval at the cost of longer search time [24]. 2.3 Building the search index An inverted index is a tool that is widely used in information retrieval in order to reduce the search time. The search time becomes independent of the database size. This tool has been successfully adopted in retrieval of text documents as in [25]. For each image file in the database, extract its visual features and quantise each of them to the closest visual word. The inverted index consists of the visual words in the BOVW and pointers to the image files containing such a visual word together with its frequency of occurrence in each of them as depicted in Fig. 1. Fig. 1Open in figure viewerPowerPoint Inverted index structure An index structure is a more compact representation of an image database compared to visual features representation, where each image is described by a high-dimensional vector like HSV histograms and SURF features. Comparing pairs of visual features is very time consuming. For instance, the SURF feature involves about 300 key points and each is described using 64 components. Thus, at query time, calculating the distance between such a huge vector and the corresponding ones for all images in the database consumes a large amount of time. On the contrary, a visual word is a 64D vector in case of using SURF descriptor and quantisation of the extracted feature vectors involves calculating the distance between the set of visual words and the descriptors of the interest points. The matching process thus reduces to finding the image files that share as many visual words as possible with the query image. It is noteworthy that the number of visual words is an adjustable parameter that can be tuned to achieve the best performance in terms of the accuracy of retrieval, the search time and the quantisation time. 3 Order preserving encryption To ensure the privacy of users, the images in the database should be encrypted before sending to the hosting server. Moreover, the frequencies of the visual words stored in the inverted index need also to be encrypted before transmission. The frequencies of occurrence of the different visual words reveal significant information about the image contents. However, traditional encryption techniques do not allow searching without decrypting the data, which in turn violates the confidentiality requirement we are concerned about in this paper. OPE is one well-known solution to this problem [26]. OPE is a deterministic encryption scheme which preserves the numerical order of the plaintexts in the ciphertext domain [27]. It has been successfully adopted to answer range queries over encrypted relational databases. Moreover, in [14], OPE is used to obscure frequencies of visual words in the index structure. OPE aims to make the distribution of encrypted frequency values close to a uniform distribution to reduce information leakage to the server. Mathematically, OPE has the property that for two values x and y, if , then after applying encryption , the order relation is still valid such that . To satisfy this property, each frequency value w is mapped to an integer value uniformly selected from the interval [L,M]. During the insertion or encryption step of the frequencies one after the other, it may happen that there is no gap between the ciphertexts to allow the insertion of a new frequency value in its proper order. In this case, an update procedure is invoked to relocate the ciphertexts to allow the insertion of the new frequency value. The proper selection of the maximum ciphertext value (M) can help reduce the update probability. For a plaintext space of size , a ciphertext space of size is recommended, where is an appropriate choice as verified experimentally in [27]. Using OPE can prevent adversarial attacks that exploit statistical information of visual word frequency values. On the other hand, encryption with OPE may reveal some information about variance of plaintext items [28]. Thus, it may be considered a light-weight encryption scheme. Additional security guarantees can be provided by encrypting the visual words using a suitable distance preserving scheme and thus the information leaked is rendered useless. For more sensitive data, public key encryption mechanisms can be employed instead. However, public key encryption is known to be more computationally intensive compared to OPE as demonstrated in the Pallier-based scheme examined in [12]. Moreover, using a public key cyptosystem in a privacy preserving retrieval system calls for the interaction between the server and the data owner to complete the matching process. 4 Our secure CBIR scheme 4.1 System model In general, the model for a privacy preserving image retrieval system involves three parties: the data owner, the server and the data user. The data owner creates the images database and the associated search index. It is the data owner who encrypts the images and the visual words frequencies before uploading them to the server. The data users submit encrypted queries to the server. The server, in turn, retrieves relevant images to the submitted query. These relevant images are those showing greatest similarity to the submitted query in the ciphertext domain. In this paper, we consider a simplified model. In our model, it is assumed that the data user is the data owner itself. Another assumption in our system model is that the server is honest but curious. Our secure CBIR scheme proceeds in two phases: an off-line preparation phase and an on-line searching phase. The off-line phase takes place on the content owner's side and its output is sent out to the remote server. On the other hand, the on-line phase is an interactive phase, where the owner submits its encrypted query to the server. The server in turn calculates an appropriate similarity measure between the encrypted query and the images that share visual words with it. The top ranked images are returned in response to the query. The owner finally decrypts the retrieved images. The two phases are described in Algorithms 1 and 2 (see Figs. 2 and 3). Fig. 2Open in figure viewerPowerPoint Algorithm 1: off-line phase Fig. 3Open in figure viewerPowerPoint Algorithm 2: on-line phase In our scheme, SURF is employed in the feature extraction step and OPE encryption is used to achieve confidentiality of the remotely stored inverted index. 4.2 Similarity measures Clustering involved in BOVW creation and term assignment step at query time require choosing a distance measure to assess how similar two feature vectors are. Manhattan distance (), Euclidean distance () and Mahalanobis distance are widely used. In this paper, the Euclidean distance is used. Another distance measure is also needed in the query phase for measuring the similarity between a query image and the images stored in the database according to the BOVW model [7]. To calculate the similarity between images, visual words frequencies are used. Three weighting schemes for the visual words are considered in our experiments. • Boolean frequencies: These are obtained simply by replacing the frequency of occurrence of a visual word by 1, if it is and otherwise by 0. This technique causes a drop in the accuracy, as it just reflects the existence or absence of a visual word in an image as will be demonstrated in the results section. • Ordinary frequencies: The frequency of occurrence of a visual word generated according to the BOVW model. • tf-idf [10]: It is a statistical measure that reflects the relevance score of a visual word. The standard tf-idf weight of a visual word is computed as follows: (1)where is the number of occurrences of a visual word i in the image d, denotes the number of visual words included in image d, N refers to the total number of images in the database and indicates the number of images that contain the visual word i in the images database. Thus, mathematically, we can express the weight of a visual word V in an image I, for binary, ordinary and tf-idf schemes as in (2)–(4), respectively (2) (3) (4)In terms of the above weighting schemes, the Jaccard similarity measure is computed as (5)where v is the number of visual words in the BOVW. Q and D are the query image and stored image, respectively. is the encrypted value of w. According to Jaccard's measure, the similarity is preserved in the ciphertext domain. This is because the used encryption module is order preserving. 5 Implementation details Our privacy-preserving indexing scheme is tested on a database of 1000 variable size images that includes 10 categories: african, architectures, beach, buses, dinosaurs, elephants, flowers, food, horses and mountains [29]. The block width used in SURF from which the descriptors are extracted is selected to be 32, 64, 96, 128 pixels for the different scales. In case of using the local HSV histograms in creating the BOVW, we extracted feature vectors from image patches. In the clustering step, a training set of 200 images (20% of 1000 images) is used. The initial number of extracted SURF descriptors for interest points is 319,577, which is reduced to 96,170 features after elimination of weak interest points. The number of clusters k is taken to be 1000, where the centroids of these clusters have converged and have become stable after 23 iterations. The criterion for stabilisation is that there is no significant difference between each two succeeding iterated centroids. The threshold value used for the total distance between the centroids of two consecutive iterations was set to be . When the total distance drops below this value, the clusters were considered to have become stable. The maximum number of iterations was taken to be 100. To avoid storing too large numbers, the visual words are normalised with respect to the largest frequency value occurring in the set. The ‘word frequency range’ is taken to be between 0.01 and 0.9; that is, visual words to be included in the final BOVW must have relative frequencies falling in this range. The purpose of this restriction is to ignore very common words or rare ones. The words of very high frequencies are constructed from repeated patterns in an image. We used a ‘lookup table’ to store the encrypted values of the distinct frequencies of visual words (within the acceptable range) appearing in all images in the database. The number of elements in this lookup table is much smaller than the original number of visual word frequency-image pairs. This lookup table is constructed after the BOVW is extracted in the setup phase and thus is used to save time of encryption. It can also be used during the on-line query phase. If the query image is included in the database, then all related frequencies encrypted values can be readily fetched from the table. On the other hand, if the query image does not belong to the database, the OPE routine is called to encrypt the frequencies of the visual words not included in the lookup table. In the OPE routine, we set the ciphertext range to be between and to avoid collisions between ciphertext values that cause the time-consuming re-encryption procedure. This encryption module consumes about s for a value to be encrypted. In case of using SURF, the visual word-image pairs with non-zero frequencies are only . Collecting the distinct frequencies, by eliminating the repeated values, we end-up with 4651 unique frequencies that need 2790 s for encryption of all dataset frequencies. Similarly, using the local HSV histogram feature, there are only 7159 unique frequencies that need encryption. 6 Results The implementation of our secure CBIR scheme is done in MATLAB 2017b using a computer with Intel Core i3 Processor M380 (2.53 GHz) and 3 GB RAM. The OPE module is implemented in Java (using NetBeans IDE V.8.0.2), which is called from the MATLAB environment after generating the inverted index. We used a sample of five images randomly selected from each category as query images. A hit or a match occurs when the retrieved image is of the same category as that of the query image. This image is referred to as a positive image. The average accuracy of the retrieval system is measured in terms of the precision and recall, where the precision and recall are defined as (6) (7) 6.1 Performance evaluation of our scheme Figs. 4a and b show the top-ranked 20 images retrieved in response to a query using a SURF-based tf-idf weighted index and Jaccard similarity before and after encryption, respectively. It is remarkable that these candidates sets are approximately the same, and all of the retrieved images are of the same category as that of the original query image. This means that our secure CBIR scheme achieves high precision in both the plaintext and ciphertext domains. On the other hand, visually comparing Figs. 4b and c reveals that only 17 images of the top-ranked 20 images are of the same category as that of the query image in case of using the local HSV histogram in constructing the inverted index. Consequently, it is clear that the HSV-based tf-idf weighted index has slightly less precision compared to the SURF-based index. Fig. 4Open in figure viewerPowerPoint Query results of (a) SURF-based tf-idf weighted index before encryption, (b) SURF-based tf-idf weighted index after encryption, (c) HSV-based tf-idf weighted index in encrypted domain, the left most image is the query image In Fig. 5, we show the average precision and recall values per category for a SURF-based search index in the ciphertext domain. The threshold value for the normalised weighted Jaccard similarity measure is taken to be in order to decide whether the database image is relevant to the query one or not. In this figure, we aim to investigate the performance of our scheme for the different categories in the database under study. It is noteworthy that the ‘flowers’ category achieved the highest average precision (0.27), while other categories have precision values around . This may be attributed to the nature of the images falling in this category having repetitive patterns that make the use of the BOVW model most suitable for handling them. In context of recall, most categories achieved high recall values (with an average recall equal to 0.9086). Fig. 5Open in figure viewerPowerPoint Average precision and average recall performance per category for SURF-based index using weighted frequencies, OPE and Jaccard similarity measure Figs. 6a and b illustrate the accuracy of the search results based on using the three different weighting schemes of the frequencies of visual words: boolean, ordinary and tf-idf weighted frequencies before and after encryption, respectivel
Referência(s)