Peer-Reviewed Article

Method for using visible ocular vasculature for mobile biometrics

2015; Institution of Engineering and Technology; Volume: 5; Issue: 1; Pages: 3-12; Language: English

10.1049/iet-bmt.2014.0059

ISSN

2047-4946

Authors

Vikas Gottemukkula (EyeVerify, Kansas City, USA), Sashi K. Saripalle, Sriram Pavan Tankasala, Reza Derakhshani (Department of Computer Science and Electrical Engineering, University of Missouri at Kansas City, USA)

Topic(s)

Biometric Identification and Security

Abstract

Securing personal information on handheld devices, especially smartphones, has gained significant interest in recent years. Yet, most of the popular biometric modalities require additional hardware. To overcome this difficulty, the authors propose utilising the existing visible-light cameras in mobile devices. Leveraging the visible vascular patterns on the whites of the eye, they develop a method for biometric authentication suitable for smartphones. They start their process by imaging and segmenting the whites of the eyes, followed by image quality assessment. The authors' stage 1 matcher is a three-step process that entails extracting interest points [Harris–Stephens, features from accelerated segment test, and speeded up robust features (SURF)], building features (SURF and fast retina keypoint) around those points, and match score generation using random sample consensus-based registration. The stage 2 matcher uses registered Gabor phase-filtered images to generate orientation of local binary pattern features for its correlation-based match metric. A fusion of stage 1 and stage 2 match scores is calculated for the final decision.
Using a dataset of 226 users, the authors' results show equal error rates as low as 0.04% for long-term verification tests. The success of their framework is further validated on the UBIRIS v1 database.

1 Introduction

Today's smartphones are powerful mobile computing platforms with an array of different sensors. Of the 425 million phones sold worldwide in the first quarter of 2013, 49.32% were smartphones, compared with 34.76% of the 422 million phones sold during the same period in 2012 [1]. Unlike traditional phones, smartphones are used for banking, e-mail, entertainment, gaming, health-related applications, photography, e-commerce, social networking, and much more, to the point that they are replacing PCs in many of the aforesaid application areas [2–4].

Given the above, there is a growing need for securing smartphones. The prevalent password-based identity solutions (something you know), or those based on the possession of physical tokens (something you have), are all vulnerable to a variety of attack vectors. Remembering an ever-changing list of multiple complex passwords is impractical, leading to a broken authentication model that frequently results in mass security breaches. Moreover, entering passwords on small form factors and touch screens is challenging and time consuming. Biometrics, on the other hand, is the most natural method of authentication because it is based on who you actually are. Biometric modalities are based on matching features derived from unique, stable, universal, and conveniently acquired biological traits of an individual [5]. In this work, we provide a convenient biometric solution that leverages smartphones' existing hardware to scan the user's whites of the eyes and match their visible vascular patterns.

The majority of today's biometric market is focused on two modalities: fingerprints, which are among the oldest, and iris, which is often quoted as one of the most accurate of all biometric modalities [5, 6]. Gait, voice, face, retinal vasculature, the vasculature seen on the whites of the eyes, and hand geometry are examples of other existing or emerging biometric authentication methods [5–10]. Typically, optical or capacitive sensors are used for capturing fingerprints, near-infrared cameras for scanning the iris, non-mydriatic three-CCD cameras for imaging retinal vasculature, near-infrared sub-dermal scanners for hand and finger veins, and regular digital cameras, such as those found in smartphones, for face and the vascular patterns seen on the white of the eye [6, 9]. Among all the aforementioned biometric modalities, only voice, face, and eye surface vasculature can readily work with the sensors already found in all smartphones, namely microphones and visible-light cameras. However, face and voice recognition do not have the consistency and accuracy required for many applications [5]. Furthermore, during many smartphone biometric use cases, face recognition is adversely affected by lighting conditions, and voice recognition is not considered discreet given the need to utter phrases for recognition. Iris recognition in the visible spectrum is mostly ruled out considering the challenges of imaging dark irides without specialised infrared cameras and light sources [11, 12]. Thus, our biometric of choice for smartphones is narrowed down to the vasculature seen on the white of the eye, henceforth referred to as 'eyeprints'. Derakhshani et al. introduced this new biometric modality in 2006 [10, 13].

Early experiments showed that the whites of the eye provide the same rich, dense, and unique vascular patterns found in other vascular biometrics [6]. Our studies show that this is true even when using the available built-in cameras of a smartphone (e.g. see Fig. 6). Thus, we expect eyeprints to achieve the same level of robustness, accuracy, and security provided by other vascular biometric modalities, albeit with a scanner that is already available to a significant and ever-increasing number of users across the globe, by virtue of smartphones' built-in cameras.

As described in [5], a biometric trait needs to satisfy universality, distinctiveness, permanence, collectability, performance, acceptability, and non-circumvention criteria. In [13], Derakhshani et al. describe how eyeprints satisfy these properties. The distinct and consistent patterns seen on the white of the eye are mostly confined to the blood vessels present in the episcleral and especially the conjunctival layers of the eye. The sclera, the mostly white fibrous layer of the eye, is a thin opaque avascular body, which is sometimes mistaken as the structure embodying the visible vasculature seen on the whites of the eyes [14, 15]. However, since most of the visible vasculature in the white of the eye is usually attributed to the conjunctival layer, this emerging modality has also been referred to as conjunctival vasculature biometrics. Since its introduction by Derakhshani and Ross, other researchers have introduced various segmentation, image enhancement, feature extraction, and classification algorithms for conjunctival vasculature biometrics. So far, eyeprint biometric studies have been performed either by utilising the UBIRIS v1 RGB dataset or researchers' in-house ocular images [12, 13, 16–29]. Some studies have also reported using multispectral captures [30, 31].

Given the mobile platform focus of this study, we used a dataset collected with an iPhone 5 (Apple® Inc., CA). The data collection app uses a Viola–Jones-based eye detection algorithm to find the ocular region. Next, the app automatically sets the focus and white balance metering points on the white of the eye before capturing and cropping the ocular region of interest (ROI). The first contribution of this work is a robust two-stage classifier that solves both the registration and vein-matching issues faced when a mobile phone is used to capture eyeprints. Our other contributions include a robust intensity- and morphology-based scleral segmentation method, and a no-reference image quality metric that measures the vascularity of the eyeprint (see Fig. 1 for a complete verification flow diagram).

The rest of this paper is organised as follows. Section 2 summarises the related work. Section 3 describes the data collection procedure. Section 4 describes the image quality metric algorithms, scleral image segmentation, image pre-processing, feature extraction, and matching methods. Results are reported in Section 5, and the conclusions and future work are given in Section 6.

Figure 1: Verification procedure

2 Related work

Visible ocular vasculature has been reported in the literature under different designations, namely, conjunctival vasculature, ocular surface vasculature, sclera, whites of the eye, and most recently, eyeprint. The initial work on eyeprint recognition used minutia-based template matching on a six-subject in-house database [13] acquired using a Sony CyberShot® camera.
An in-house database, acquired using a Canon 20D dSLR camera, was used in [16] to classify wavelet decomposition features. The initial enhancement and registration schemes were reported in [17] using a 50-subject in-house database. The performance of grey-level co-occurrence matrix (GLCM) [18] and pyramidal directional filter [19] features was investigated using dSLR captures. In related work [21–29], several methods were evaluated on the UBIRIS v1 database [20] as well as on in-house datasets. In [30, 31], the authors examine eyeprint recognition using images captured by a multispectral camera. To the best of our knowledge, this is the first study to report on eyeprint authentication using images acquired by a smartphone.

3 Data acquisition

To maintain data collection integrity, the study data was independently collected by the Biometric, Standards, Performance and Assurance (BSPA) Laboratory at Purdue University. The data was acquired with the rear-facing camera of an iPhone 5 in an indoor environment, using the built-in light-emitting diode torch for lighting, from a four to eight inch distance. Two hundred and fifty participants were invited for two sessions, which were 7–20 days apart. BSPA instructed all subjects on how to use the mobile phone's data collection application. Each session included two to three 'runs'. In each run, the participants were asked to look up and to their left for 2–3 s, and then look up and to their right for another 2–3 s. The phone delivered the related voice and haptic prompts. Glancing to the side exposes more of the white of the eye, and thus facilitates the ensuing vein pattern matching stages. During each run, the phone captured four to five images, refocusing and re-metering every time. We retained the best images using a quality metric that will be explained in Section 4.2. Henceforth, we will refer to the four eyeprints of each participant as left eye looking left (LL), left eye looking right (LR), right eye looking left (RL), and right eye looking right (RR), as shown in Fig. 2.

Figure 2: Left – cropped eye images from a cellular phone camera; right – typical segmented whites of the eye for the respective images, including a failed segmentation for the third capture

As stated earlier, 250 participants were initially scanned. Owing to irrecoverable imaging errors caused by a bug in the data collection application or by operator errors, we could eventually process records from 235 participants that had at least one run from each of their data collection sessions.

4 Image pre-processing and vein pattern recognition

Our goal was to design robust, accurate, and consistent image segmentation, enhancement, feature extraction, and feature matching routines that, at the same time, were computationally light enough for near real-time operation on a smartphone. The aforesaid processes should also be resilient to the imaging aberrations typical of mobile device use cases. An outline of the entire procedure is given in Fig. 1 and further described in this section. The green layer of an RGB image is considered to be the optimal imaging channel for extracting the vascular structures seen in the whites of the eyes [32]. However, for segmentation, we used the blue layer of the RGB image given its better skin-sclera contrast. All the methods described hereafter were initially prototyped in Matlab.

4.1 Segmentation

As an initial step, we extract the scleral ROI from the cropped whole-eye image.
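
As a small, hedged illustration of the channel choice just described (not the authors' Matlab code; the file name is a placeholder), the following Python/OpenCV snippet pulls out the blue channel used for segmentation and the green channel used later for vessel matching.

```python
import cv2

def ocular_channels(bgr_eye_crop):
    """Split a cropped eye image into the channels used by the pipeline:
    blue for scleral segmentation (better skin-sclera contrast) and
    green for vascular feature extraction and matching."""
    blue, green, _red = cv2.split(bgr_eye_crop)
    return blue, green

# Example usage (file name is illustrative):
# eye = cv2.imread("eye_crop.png")        # OpenCV loads images in BGR order
# i_blue, i_green = ocular_channels(eye)
```
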
Robust and accurate segmentation of the scleral ROI is crucial, as we need to handle variations in the delivered eye crops, lighting conditions, skin tones, gaze angles, droopy eyelids, eyelash occlusions, glare, specular reflections, shadows, and make-up. The inclusion of extra regions such as skin, eyelids, and eyelashes, or the exclusion of vasculature in the whites of the eye during segmentation, will adversely affect the accuracy of the whole system. We use the blue channel (IB) of the RGB images for segmentation. The following steps describe the segmentation routine (Fig. 3).

4.1.1 Median filtering

Since we want to delineate the white of the eye, the first step in segmentation is to reduce the imprints left by the vasculature and the eyelashes, because they could be mistaken for eyelid boundaries or otherwise non-scleral regions. Given the fine spread of the vascular and eyelash areas compared with the larger scleral background, their darker foreground pixels can be removed by two-dimensional (2D) median filtering. We perform 5 × 5 median filtering on the blue channel IB to derive IMedian.

4.1.2 Light boundary suppression

Leveraging edges, we intend to suppress the boundaries of lighter structures. Thus, a morphological grey-scale reconstruction followed by an image subtraction is performed on IMedian [33]. Reconstructing IMedian from a marker J is formulated as

ρ = δ_IMedian^(n)(J)   (1)

ILBS = IMedian - ρ   (2)

where δ_IMedian^(n)(J) is the geodesic dilation of J inside IMedian at a geodesic distance of n, and ρ and ILBS are the reconstructed output and the final image with suppressed boundaries, respectively. The reconstruction method is an iterative process that converges when the pixel variations are stabilised. A fast hybrid version of image reconstruction proposed by Vincent [33] is used in our work.

4.1.3 ROI extraction

In most cases, the regions around the lower eyelid form a low-contrast boundary against the whites of the eye. Hence, a weak link between suppressed and non-suppressed regions is observed in ILBS. To eliminate these weak regions, we first perform a morphological erosion followed by a dilation on ILBS using the structuring elements B1 and B2

J1 = ILBS ⊖ B1   (3)

J2 = J1 ⊕ B2   (4)

where B1 and B2 are circular disks of radius 5. Next, we retain all the values of J2 above a certain threshold to generate the binary mask (IBinary = J2 > Threshold). IBinary can also be represented as a collection of several connected components. Assuming that the component with the largest area pertains to the whites of the eye, we discard the rest to estimate the initial mask of the whites of the eye (IMask).

4.1.4 Post-processing

A convex hull is used to smoothen IMask, and the estimated iris region is removed from IMask to yield IPP. Next, two curves are fitted to the partial upper and lower boundaries to estimate the eye corner. IPP extended to the eye corner is our final ROI.

4.2 Image quality metric

The next step is to select a suitable eyeprint image for classification from a given burst of captures. The data collection application acquires multiple images in quick succession, re-adjusting focus, exposure, and white balance during each capture. However, the phone's camera controls and metering are not optimised for capturing the best eyeprints. Thus, we designed a no-reference quality metric that is sensitive to the vascularity of an image, that is, the extent of well-defined vascular patterns present in a given image, in a way that benefits the ensuing matcher. To design the quality metric, we used a database of 30 subjects who were not part of this study's 235 subjects.
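
The Section 4.1 steps can be sketched in Python with OpenCV and scikit-image as follows. This is an illustration rather than the authors' implementation: the reconstruction marker J and the binarisation threshold are not specified in the paper (an eroded copy and Otsu's method are assumed here), and the iris-removal and eye-corner fitting of Section 4.1.4 are omitted.

```python
import cv2
import numpy as np
from skimage.morphology import reconstruction, disk

def segment_sclera(bgr_eye_crop):
    # 4.1.1 Median filtering of the blue channel to suppress vessels/eyelashes
    blue = bgr_eye_crop[:, :, 0]
    i_median = cv2.medianBlur(blue, 5)

    # 4.1.2 Light-boundary suppression: grey-scale reconstruction by dilation,
    # then subtraction (marker choice is an assumption, see lead-in)
    se = disk(5).astype(np.uint8)
    marker = cv2.erode(i_median, se)
    rho = reconstruction(marker, i_median, method='dilation')
    i_lbs = i_median.astype(np.float64) - rho

    # 4.1.3 ROI extraction: erosion then dilation with radius-5 disks,
    # followed by thresholding (Otsu assumed) and largest-component selection
    j1 = cv2.erode(i_lbs.astype(np.uint8), se)
    j2 = cv2.dilate(j1, se)
    _, i_binary = cv2.threshold(j2, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(i_binary, connectivity=8)
    largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA]) if n > 1 else 0
    i_mask = np.uint8(labels == largest) * 255

    # 4.1.4 Post-processing: smooth the mask with its convex hull
    # (iris removal and eye-corner curve fitting are omitted in this sketch)
    contours, _ = cv2.findContours(i_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    hull = cv2.convexHull(max(contours, key=cv2.contourArea))
    i_pp = np.zeros_like(i_mask)
    cv2.fillConvexPoly(i_pp, hull, 255)
    return i_pp
```
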
These subjects were captured using the same application that was used for the 235 subjects. Nine hundred and seventy-eight samples from these 30 subjects were used for training the quality metric score, and 467 samples were used to validate the performance of the metric. These images are relatively more challenging than those in the 235-subject database. On the validation set, we observed a 50% improvement in equal error rate (EER) when the quality metric was used. The following steps describe our eyeprint image quality metric algorithm:

(i) Generate the sum of the thresholded phases of the Gabor-filtered image, as described in Section 4.3.2.1, which highlights the vein-like patterns of the ROI. After binarising the derived image, a first measure of vascularity (VG) is calculated as the ratio of the number of vascular pixels to the total number of pixels within the segmented ROI [34].

(ii) Calculate the fractal dimension (FD) of the ROI using the box-counting technique [35] on the binarised image from step (i). The FD of an image is related to the minimum number of boxes required to cover the binarised image, and it has experimentally been shown to correlate with the level of detail in vein-like patterns [35].

(iii) Count the speeded up robust features interest points (SQ) [36], as described in Section 4.3.1.1. We observed that this number correlates with the density of the mostly vascular patterns seen in the ROI, and thus it is taken as another partial indicator of vascularity.

(iv) Compute a general image quality (IQ) measure in terms of JPEG blocking and blurring artefacts using [37]. Note that this is the only metric that is calculated over the whole cropped eye and not just within the ROI (the segmented scleral region with vascular arcades).

(v) Finally, all these measures were used as independent variables in a multiple linear regression, with the desired goal (dependent variable) being the prediction of the ensuing match score of genuine claimant comparisons (over 978 training samples altogether). The final quality score is calculated by the resulting linear formula (5).

Using the above eyeprint quality metric, all captured images with a score of 1.75 or higher were retained for matching purposes (an experimentally set threshold). During any single image capture sequence for the LL, LR, RL, or RR ROIs, if more than two images surpassed this threshold, we retained the best two. It should also be noted that, given the diversity of vascular patterns, the variable and non-ideal image acquisition, and the complex and non-linear pre-processing, feature extraction, and matching, the aforementioned quality check is an initial pre-screening routine and not an accurate predictor of the final match score. In that sense, the importance of the aforementioned quality metric is mostly notable during the (unsupervised) enrolment process, where an initial selection of quality captures is critical.

4.3 Matching

In general, the vasculature seen on the whites of the eyes has high local edge and orientation information. However, these characteristics are somewhat elastic and fluid due to the stretching and sliding of the conjunctival tissue. In part, this could be due to variable force vectors resulting from different gaze angles and the fact that the conjunctival tissue is mechanically better anchored closer to the corneal limbus [14].
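
A rough Python sketch of two of the Section 4.2 quality features, the Gabor-phase vascularity measure (VG) and the box-counting fractal dimension (FD), is given below, together with a linear combination in the spirit of step (v). The Gabor parameters, the binarisation threshold, and the regression weights are not given in the paper; the values used here are placeholders, and SQ and IQ are assumed to be supplied by separate routines.

```python
import cv2
import numpy as np

def gabor_phase_vessel_map(gray_roi, mask, phase_thresh=0.6):
    """Binarise vein-like structure from the summed phase of a small Gabor bank."""
    img = gray_roi.astype(np.float64)
    phase_sum = np.zeros_like(img)
    for theta in np.arange(0, np.pi, np.pi / 6):          # 6 orientations (assumed)
        even = cv2.getGaborKernel((15, 15), 3.0, theta, 8.0, 0.5, psi=0)
        odd = cv2.getGaborKernel((15, 15), 3.0, theta, 8.0, 0.5, psi=np.pi / 2)
        re = cv2.filter2D(img, -1, even)
        im = cv2.filter2D(img, -1, odd)
        phase_sum += np.abs(np.arctan2(im, re))
    phase_sum /= phase_sum.max() + 1e-9
    return ((phase_sum > phase_thresh) & (mask > 0)).astype(np.uint8)

def box_counting_fd(binary_img):
    """Fractal dimension of a binary image estimated by box counting."""
    sizes, counts = [], []
    s = 2 ** int(np.floor(np.log2(min(binary_img.shape))))
    while s >= 2:
        h = (binary_img.shape[0] // s) * s
        w = (binary_img.shape[1] // s) * s
        blocks = binary_img[:h, :w].reshape(h // s, s, w // s, s)
        counts.append(max(int(np.count_nonzero(blocks.max(axis=(1, 3)))), 1))
        sizes.append(s)
        s //= 2
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(sizes)), np.log(counts), 1)
    return slope

def quality_score(gray_roi, mask, surf_point_count, jpeg_quality,
                  w=(1.0, 1.0, 0.01, 0.5, 0.0)):   # placeholder weights, not eq. (5)
    vessels = gabor_phase_vessel_map(gray_roi, mask)
    vg = vessels.sum() / max(int(np.count_nonzero(mask)), 1)   # vascularity VG
    fd = box_counting_fd(vessels)                               # fractal dimension FD
    return w[0] * vg + w[1] * fd + w[2] * surf_point_count + w[3] * jpeg_quality + w[4]
```
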
Furthermore, when using a smartphone as the eyeprint scanner, sensor noise, improper lighting, variable viewpoints, and other imaging artefacts such as glare, specular reflection, shadow, motion blur, defocus, and segmentation errors create challenging image capture scenarios. Imperfect camera metering, in terms of exposure and white balance, becomes an issue when considering that eyelids, eyelashes, and variable skin tones surround the ROI. The whites of the eye might also have other non-permanent manifestations such as conjunctivitis and blepharitis [38]. Our feature extraction and matching processes were designed to address the above challenges.

For this, we built a two-stage matcher. Stage 1 generates match score 1, along with a geometric transformation matrix that registers the verification image to the enrolment template for the stage 2 matcher. Stage 2 performs a global matching to generate match score 2, which is combined with the first match score for a final authentication decision. As an initial step, all the segmented images are cropped to the bounding box of the scleral mask plus a 20-pixel padding (to accommodate the stage 2 image filtering process). These images are further resized to 441 × 541 pixels, in order to have a standard template size for stage 2. This size was experimentally determined based on the average dimensions of the ROIs in our dataset, captured by an 8-megapixel iPhone 5 camera placed four to eight inches away from the tip of the user's nose.

4.3.1 Stage 1 matcher

Inspired by visual odometry [39], the stage 1 matcher templates were designed with multiple local image descriptors built around salient interest points. Next, a nearest-neighbour descriptor matcher [39], followed by random sample consensus (RANSAC) outlier detection, provides both the inlier locations and a registration matrix between the stage 1 enrolment and verification templates, to be used with the stage 2 matcher. The inlier locations and cardinality are used to generate a similarity metric designated as the stage 1 match score. Our choice of interest point finders and local image descriptors was experimentally determined based on accuracy, computation speed, and sensitivity to vascular arcades. For stage 1 pre-processing, we used contrast-limited adaptive histogram equalisation to enhance the contrast of the green channel of the ROI, which is known as the red-free optimal imaging arrangement for ocular images [32].

4.3.1.1 Interest point extraction

In this study, we employed a combination of three interest point finder algorithms, as shown in Fig. 4. Harris–Stephens (HS) and features from accelerated segment test (FAST) were applied to each ROI to find its vascular corner points, and the speeded up robust features (SURF) interest point finder was used to locate additional candidate points in eyeprint vascular neighbourhoods.

Figure 3: Segmentation example showing intermediate steps leading to the ROI mask

Figure 4: Top row: matched FAST (–), HS (··), and SURF (--) points from enrolment (top left) to verification (top right) scans using similarity between their FREAK and SURF descriptors. Bottom row: detected corresponding inliers after RANSAC

The HS algorithm detects corners using the eigenvalues of image partial derivatives [40]. A Gaussian-weighted kernel is used to improve the average derivative matrix (6), presented by Moravec, which utilises the partial image derivatives in the x and y directions (IX and IY) within a given window (5 × 5 in our experiments) around a point.
Matrices with two significant eigenvalues are considered to be corner points

A = Σ_(x,y)∈W w(x, y) [ IX²  IX·IY ; IX·IY  IY² ]   (6)

FAST designates an image location as a corner point if the 16 neighbouring pixels that lie on a Bresenham circle of radius 1.7 pixels around the candidate pixel have a minimum of N contiguous pixels brighter or darker than the candidate pixel [41]. N was set to 9 in our experiments (FAST-9).

SURF, a local image descriptor similar to but faster than the scale-invariant feature transform (SIFT) [42], has its own scale- and rotation-invariant interest point finder that uses the determinant of the Hessian matrix [36]. For a given point (x, y) in an image, using a window around the point, the Hessian matrix is formulated as

H(x, y, σ) = [ Lxx(x, y, σ)  Lxy(x, y, σ) ; Lxy(x, y, σ)  Lyy(x, y, σ) ]   (7)

where Lxx(x, y, σ) = (∂²g(σ)/∂x²) ⊗ I is the convolution of the second-order Gaussian derivative with the image, σ = 1.2 is the spread of the Gaussian, and ⊗ is the convolution operator. However, to improve the performance over SIFT, integral images and box filters are used instead of the regular image and Gaussian filters, respectively. After investigating several scales, a non-maximal suppression algorithm is used to localise the points from the various scales. Finally, using the determinant of the Hessian matrices, the local maxima are interpolated across the various scales to find the interest points.

4.3.1.2 Local image descriptors

We used the following collection of local image descriptors for higher matching accuracy and robustness. The fast retina keypoint (FREAK) is a biologically inspired local image descriptor that relies on retinal sampling patterns. It compares 43 weighted Gaussians around an interest point, out of which it retains a subset of 512 high-variance binary features [43]. In this study, FREAK descriptors were calculated around the FAST and HS interest points. Using the scale (S) and orientation information from each SURF interest point, features are derived within a square window of side 20S, centred at the interest point. The descriptor window is divided into 16 square sub-regions of side 5S, within which the Haar responses along x and y (dx and dy, respectively) are calculated. Each sub-region contributes four features, so a total of 4 × 16 features is found for each interest point [36]

v = ( Σ dx, Σ dy, Σ |dx|, Σ |dy| )   (8)

A collection of the aforesaid interest points and their local image descriptors (9) for each ROI is considered as the ROI's stage 1 template

T = {(P1, D1), (P2, D2)}   (9)

where P1 contains the locations of the detected FAST and HS interest points with their FREAK descriptors D1, while P2 contains the SURF interest point locations with their SURF descriptors D2. During the verification process, we initially generate the Hamming distance (HD) and sum of squared differences (SSD) scores for all pairs of D1 and D2 descriptors, respectively, across the enrolment and verification templates. The HD and SSD are formulated as

HD = Σ_{i=1..N} d_i ⊕ d'_i   (10)

SSD = Σ_{i=1..N} (d_i - d'_i)²   (11)

where d and d' denote a descriptor pair from the enrolment and verification templates, ⊕ is the bit-wise exclusive-OR, and N is the length of the descriptors (512 for FREAK and 64 for SURF). Next, we determine the matched points using a nearest-neighbour algorithm if the SSD or HD distance is below a certain threshold [44].

4.3.1.3 Outlier detection

There are many advantages to local as opposed to global image matching, including resilience to partial image occlusions and distortions. However, to do so, one needs to combine the local matches into a global measure of similarity. Furthermore, in our case, vascular structures may look somewhat similar in small neighbourhoods, and so the matched patches between enrolment and verification templates could constitute false correspondences (Fig. 4). We assume an approximate homography, such as an affine transform, between the truly matched points from an enrolment to a verification template; although the eye globe is 3D, we take the projections of the eyeprints onto the 2D image sensor plane to conform to the assumed homography.
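
To make the stage 1 feature pipeline of Sections 4.3.1.1 and 4.3.1.2 concrete, the sketch below enhances the green channel with CLAHE, detects FAST-9, Harris–Stephens and SURF points, computes FREAK and SURF descriptors around them, and pairs descriptors with a cross-checked nearest-neighbour search (Hamming distance for FREAK, L2 for SURF). FREAK and SURF live in the opencv-contrib xfeatures2d module and may be absent from some builds; the detector thresholds and distance cut-offs are illustrative assumptions rather than the paper's values.

```python
import cv2
import numpy as np

def stage1_features(bgr_roi, mask):
    """CLAHE on the green channel, then FAST-9 / Harris-Stephens / SURF points
    with FREAK (around FAST+HS) and SURF descriptors (around SURF points)."""
    green = bgr_roi[:, :, 1]
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(green)

    fast = cv2.FastFeatureDetector_create(threshold=20, nonmaxSuppression=True,
                                          type=cv2.FAST_FEATURE_DETECTOR_TYPE_9_16)
    fast_pts = list(fast.detect(enhanced, mask))

    harris = cv2.cornerHarris(np.float32(enhanced), blockSize=5, ksize=3, k=0.04)
    ys, xs = np.where((harris > 0.01 * harris.max()) & (mask > 0))
    hs_pts = [cv2.KeyPoint(float(x), float(y), 5) for x, y in zip(xs, ys)]

    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)   # needs opencv-contrib
    surf_pts = list(surf.detect(enhanced, mask))

    freak = cv2.xfeatures2d.FREAK_create()                     # needs opencv-contrib
    p1, d1 = freak.compute(enhanced, fast_pts + hs_pts)        # binary, 512-bit
    p2, d2 = surf.compute(enhanced, surf_pts)                  # float, 64-dim
    return (p1, d1), (p2, d2)

def nearest_neighbour_pairs(enrol, verif, hd_thresh=120, l2_thresh=0.25):
    """Cross-checked nearest-neighbour matches below a distance threshold,
    returned as ((x, y) enrolment, (x, y) verification) coordinate pairs."""
    (ep1, ed1), (ep2, ed2) = enrol
    (vp1, vd1), (vp2, vd2) = verif
    pairs = []
    if ed1 is not None and vd1 is not None:
        bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)  # FREAK: Hamming
        pairs += [(ep1[m.queryIdx].pt, vp1[m.trainIdx].pt)
                  for m in bf.match(ed1, vd1) if m.distance < hd_thresh]
    if ed2 is not None and vd2 is not None:
        bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)       # SURF: L2 distance
        pairs += [(ep2[m.queryIdx].pt, vp2[m.trainIdx].pt)
                  for m in bf.match(ed2, vd2) if m.distance < l2_thresh]
    return pairs
```

The coordinate pairs returned by nearest_neighbour_pairs feed into the RANSAC registration step described next.
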
We further assume that these 3D-to-2D projection distortions are, within reason, consistent across captures, given the data collection app's user guidance. Under the assumed homography, one can then separate the inliers (pairs conforming to the transform) from the outliers (points that do not overlap under the assumed homography). One such method is the RANSAC algorithm, an iterative statistical process that randomly samples subsets of matched point correspondences from enrolment to verification and eventually estimates the most encompassing registration matrix between the spatial coordinates of these points, leaving out those that do not conform to the assumed geometric mapping. We used RANSAC on the combination of matched points from the union of the SURF and FREAK descriptor matching results to derive an affine registration matrix from the inliers between the enrolment and verification templates. The first stage match score is then generated as in (12), where N is the number of inliers after RANSAC, EnI are the (x, y) coordinates of the inliers from the enrolment template, VrI are the corresponding (x, y) coordinates of the unregistered inliers from the verification template, and VrTI are the (x, y) coordinates of the inliers from the verification template after registration using the derived affine transformation. 'corr' signifies the correlation coefficient. The minimum operator is used to reduce the likelihood of false accepts in the rare cases where the registration could increase the above correlation-based match score for impostors. In the case of multiple enrolments or ROIs (multiple template comparisons for the same claimant), the max rule is used to pick stage 1's highest match score.

4.3.2 Stage 2 matcher

4.3.2.1 Phase of Gabor-filtered vasculature

Gabor filters function similarly to the receptive fields in the primary visual cortex. They are known for preserving edges, and their phase information is illumination independent [45–47]. Furthermore, our independent component analysis on 100 ey
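
To round out stage 1 (Section 4.3.1.3), the sketch below fits a RANSAC affine transform to the pooled point correspondences and computes a correlation-based similarity over the inlier coordinates. The exact form of the paper's match score (12) is not reproduced here; the min-of-correlations shown only illustrates the idea described in the text, and the RANSAC reprojection threshold is an assumption.

```python
import cv2
import numpy as np

def stage1_register_and_score(pairs):
    """pairs: list of ((x, y) enrolment, (x, y) verification) correspondences,
    e.g. the output of nearest_neighbour_pairs() above."""
    if len(pairs) < 3:                       # an affine fit needs >= 3 point pairs
        return 0.0, None
    en_pts = np.float32([p[0] for p in pairs])
    vr_pts = np.float32([p[1] for p in pairs])

    # RANSAC affine estimation mapping verification -> enrolment coordinates
    affine, inlier_mask = cv2.estimateAffine2D(vr_pts, en_pts, method=cv2.RANSAC,
                                               ransacReprojThreshold=3.0)
    if affine is None:
        return 0.0, None
    inliers = inlier_mask.ravel().astype(bool)
    if inliers.sum() < 3:
        return 0.0, affine
    en_in, vr_in = en_pts[inliers], vr_pts[inliers]

    # Registered verification inliers (the VrTI of the text)
    vr_reg = vr_in @ affine[:, :2].T + affine[:, 2]

    # Correlation of inlier coordinates before and after registration; the min
    # operator guards against registration inflating impostor similarity
    def coord_corr(a, b):
        return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

    score = min(coord_corr(en_in, vr_in), coord_corr(en_in, vr_reg))
    return score, affine
```

The returned affine matrix is what the stage 2 matcher would use to register the verification image before its Gabor-phase comparison.
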
