Peer-reviewed article

QUIS‐CAMPI: an annotated multi‐biometrics data feed from surveillance scenarios

2017; Institution of Engineering and Technology; Volume: 7; Issue: 4; Language: English

10.1049/iet-bmt.2016.0178

ISSN

2047-4946

Authors

João C. Neves, Juan Carlos Moreno, Hugo Proença

Topic(s)

Automated Road and Building Extraction

Abstract

IET Biometrics, Volume 7, Issue 4, pp. 371-379. Research Article (Free Access)

QUIS-CAMPI: an annotated multi-biometrics data feed from surveillance scenarios

João Neves (corresponding author, jcneves@ubi.pt), IT – Instituto de Telecomunicações, Department of Computer Science, University of Beira Interior, Rua Marquês d'Ávila e Bolama, Covilhã, Portugal
Juan Moreno, Department of Computer Science, University of Beira Interior, Rua Marquês d'Ávila e Bolama, Covilhã, Portugal
Hugo Proença, IT – Instituto de Telecomunicações, Department of Computer Science, University of Beira Interior, Rua Marquês d'Ávila e Bolama, Covilhã, Portugal

First published: 22 December 2017. https://doi.org/10.1049/iet-bmt.2016.0178. Citations: 6
Abstract

The accuracy of biometric recognition in unconstrained scenarios has been a major concern for a large number of researchers. Despite such efforts, no system can recognise human beings in a fully automated manner under totally wild conditions, such as in surveillance environments. Several sets of degraded data have been made available to the research community, but the performance reported by state-of-the-art algorithms on them is already saturated, suggesting that these sets do not faithfully reflect the conditions of such hard settings. To this end, the authors introduce the QUIS-CAMPI data feed, comprising samples automatically acquired by an outdoor visual surveillance system, with subjects on-the-move and at-a-distance (up to 50 m). They also supply a high-quality set of enrolment data. When compared to similar data sources, the major novelties of QUIS-CAMPI are: (i) biometric samples are acquired in a fully automatic way; (ii) it is an open dataset, i.e. the number of probe images and enroled subjects grows on a daily basis; and (iii) it contains multi-biometric traits. The ensemble properties of QUIS-CAMPI ensure that the data span a representative set of covariate factors of real-world scenarios, making it a valuable tool for developing and benchmarking biometric recognition algorithms capable of working in unconstrained scenarios.

1 Introduction

Over the past years, biometric research has been evolving toward the development of systems that work in adverse conditions. This trend has been mainly supported by the increasing number of commercial systems relying on biometric recognition and, at the same time, by the interest in extending these systems to unconstrained scenarios. These efforts have proven fruitful, as in the case of the VeriLook Surveillance system [1], where persons are recognised on-the-move (OM) using face features.
However, this system is confined to indoor scenarios and demands a large amount of enrolment data. In fact, the recognition of humans in totally wild conditions, such as in surveillance environments, is still to be accomplished [2], making biometric recognition in the wild a highly popular topic and, at the same time, one of the most ambitious goals for the research community.

Biometric datasets are an important asset to push forward state-of-the-art recognition performance. As an example, we highlight the evolution of face recognition datasets, which have moved toward more challenging conditions as novel algorithms surpass the challenges of the hardest sets. This fact is particularly evident in the case of Labeled Faces in the Wild (LFW) [3], which was originally proposed in response to the saturated performance of state-of-the-art algorithms on the typical covariates of facial recognition. The high variability of the data has caught researchers' attention, and the robustness of face recognition algorithms has progressively advanced (from 87% accuracy in 2009 [4] to 95% accuracy in 2015 [5]). LFW has paved the way for biometric recognition in the wild and fostered the development of even more challenging datasets. Nonetheless, as noted by Klare et al. [6], one explanation for unconstrained face recognition still being far from solved is that LFW and similar datasets are not fully unconstrained. To close this gap, Klare et al. [6] introduced the IJB-A dataset, which follows the spirit of LFW but includes high variability in pose. However, even this challenging dataset does not encompass the complete set of covariate factors present in real surveillance scenarios, as the majority of the images were acquired neither OM nor in an automated manner, which reduces the levels of blur caused either by motion or incorrect focusing (Fig. 1).

Figure 1: Illustration of the six covariate factors in the QUIS-CAMPI dataset: pose, occlusions, facial expression, motion-blur, illumination, and defocus. The use of an automated master–slave system for acquiring facial imagery in a fully NC and covert manner assures an effective representativeness of the covariates of biometric recognition in the wild.

In this paper, we provide a tool to bridge the gap between surveillance and biometric recognition by announcing the QUIS-CAMPI data feed, whose acronym derives from Latin and summarises its goals: 'Quis' stands for 'Who is' and 'Campi' refers to a delimited space. Hence, this set aims at fostering the development of biometric recognition systems that work outdoors (OU), in fully unconstrained and covert conditions. To this end, we designed an automated master–slave surveillance system to capture both full-body video sequences and high-resolution head samples of subjects in a parking lot. The particularities of the surveillance system permit the continuous acquisition of novel biometric samples, which are supplied to the dataset after being manually screened and associated with the corresponding gallery subjects. This singularity is the rationale for considering QUIS-CAMPI the first open biometric dataset, since the number of biometric samples grows on a daily basis. For the same reason, we argue that the proposed set can be considered a data feed, where researchers can obtain novel biometric data captured in a realistic surveillance scenario. In spite of the multiple advantages of the QUIS-CAMPI dataset, it also raises important questions on evaluation and privacy. As such, the impact of the continuous supply of new probes has been carefully planned by introducing dataset versioning (see Section 4.3). Regarding privacy, while the data collection has been authorised by the Portuguese data protection authority, we believe that the surveillance system could be used in a public space without compromising privacy by adopting watchlist-based recognition, i.e.
the goal is not to identify the person, but to determine whether the probe data correspond to a subject in the watchlist. It should also be mentioned that a subset of the QUIS-CAMPI dataset was previously published [7] to promote the International Challenge on Biometric Recognition in the Wild (ICB-RW) competition. When compared to ICB-RW, QUIS-CAMPI has the following advantages: (i) the ICB-RW challenge provided only 10 face images from each of the 90 subjects, whereas QUIS-CAMPI contains more than 3000 images from 320 subjects; (ii) ICB-RW was based on a single evaluation metric, whereas QUIS-CAMPI provides a comprehensive evaluation protocol for both the verification and identification modes; (iii) QUIS-CAMPI provides a comprehensive evaluation along the proposed evaluation protocols; and (iv) QUIS-CAMPI provides a continuous feed of biometric samples, as well as a version control strategy for obtaining new data.

Contributions: When compared to the existing biometric datasets, the QUIS-CAMPI data feed has four major novelties: (i) biometric traits are automatically acquired by a master–slave surveillance system in a fully non-cooperative (NC) and covert manner, which allows the data to be acquired at-a-distance (AD) (up to 50 m) and OM, and assures an effective representativeness of the covariates of biometric recognition in the wild; (ii) it is an open dataset, i.e. new samples are continuously and automatically added to the dataset and supplied to the research community. This singularity inhibits biased performance estimation (usually caused by parameter adjustment on the test set) without the burden of sequestered test data; (iii) it is surveillance representative, i.e. the probe images truly encompass all the singularities of surveillance environments, and thus advances in recognition accuracy on these data have a direct impact on the deployment of a fully automated biometric recognition surveillance system; and (iv) it comprises multi-biometric traits.
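The watchlist-based recognition mode mentioned above can be illustrated with a minimal open-set matching sketch. The feature vectors, the cosine-similarity measure, the `threshold` value, and the function names are all illustrative assumptions on our part, not part of the QUIS-CAMPI protocol; feature extraction is out of scope here.

```python
import math

def watchlist_match(probe, gallery, threshold=0.8):
    """Open-set watchlist check: return the best-matching gallery
    identity when its cosine similarity exceeds the threshold,
    otherwise None (the probe is treated as a non-enroled passer-by).

    probe: feature vector (list of floats); gallery: dict mapping
    identity -> template vector.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)

    best_id, best_score = None, -1.0
    for identity, template in gallery.items():
        score = cosine(probe, template)
        if score > best_score:
            best_id, best_score = identity, score
    return best_id if best_score >= threshold else None
```

The privacy argument rests on exactly this behaviour: probes whose similarity stays below the threshold are never linked to an identity and can be discarded.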
The remainder of this paper is organised as follows: Section 2 overviews the datasets for assessing recognition performance in these environments. Section 3 describes the master–slave surveillance system developed for data acquisition. A detailed description of the announced dataset is given in Section 4. Section 5 describes the evaluation protocol and compares the results attained by state-of-the-art face recognition algorithms on the QUIS-CAMPI and LFW datasets. Finally, Section 7 concludes this paper.

2 Biometric datasets

About 25 years ago, biometric recognition emerged as an interesting topic, leading to the development of many novel algorithms, usually validated on small, non-representative, and proprietary databases according to distinct evaluation protocols. To meet the growing demand for objective evaluation tools, sets of biometric samples comprising different covariate factors were introduced as a solution. The ORL database of faces [8], the AR face database [9], and the Yale face database were pioneering sets for face recognition, while FERET [10] was the first benchmark on this topic. Despite their valuable contribution in providing objective and trustworthy tools for assessing recognition performance, these sets soon became outdated as novel algorithms reported almost ideal accuracy on them. These improvements fostered the development of more challenging datasets, such as the CMU PIE [11], Multi-PIE [12], XM2VTS [13], and BANCA [14] databases, comprising biometric samples with significant variations in illumination, pose, and expression. In addition, different challenges were introduced for assessing the accuracy of state-of-the-art face recognition methods in less constrained scenarios [e.g. the Face Recognition Grand Challenge (FRGC) [15] and the Face Recognition Vendor Test (FRVT 2006)].
Aiming at providing more realistic data, research has advanced toward the acquisition of unconstrained samples across diverse biometric traits, such as iris [16], periocular [17, 18], and face [19]. Regarding face recognition, LFW was the first database particularly devised for studying face verification in the wild and was, therefore, responsible for promoting the development of more robust algorithms (an increment of 10% in recognition accuracy in recent years), as well as for fostering the emergence of more challenging collections of data (e.g. PubFig [20], FaceScrub [21], IJB-A [6], and the Disguise and Makeup Faces database [22]). Simultaneously, the still-to-video modality has also gained increasing attention, leading to the development of novel datasets and biometric challenges on this topic. The video challenge portion of the Multiple Biometrics Grand Challenge contained subjects walking toward the camera and non-frontal footage of subjects performing an activity. Later, the Point and Shoot Face Recognition Challenge was introduced, comprising unconstrained video sequences of subjects performing multiple activities OU. The YouTube Faces database [23] contains unconstrained recordings obtained from the Internet and was particularly designed for studying the problem of unconstrained face recognition in videos. In contrast, the SC-FACE [24] and ChokePoint [25] datasets were originally intended to provide data acquired in realistic indoor surveillance scenarios. Regarding surveillance scenarios, the PETS [26], i-LIDS [27], and CAVIAR [28] datasets and the ViSOR [29] repository comprise video sequences of pedestrians in realistic surveillance scenarios. Even though the low resolution of the data inhibits its use for face recognition purposes, it has been shown that fusion with gait information can significantly increase performance [30, 31].
The evolution of gait databases over time has been driven by three major factors: the number of subjects and sequences, the data covariates (e.g. clothing, carried items, and speed), and the acquisition scenario (e.g. indoor and OU). The SOTON large database [32] was the first set to comprise over 100 subjects and has contributed to studying the impact of inter-subject variation on gait recognition. The USF dataset [32] was introduced to provide a representative dataset for benchmarking gait recognition in challenging conditions and to investigate which covariates significantly degrade gait recognition performance. CASIA B [33] is a commonly used gait database containing large view variations, as well as variations in clothing and carrying status. For this reason, this set is usually exploited for the evaluation of cross-view gait recognition and for evaluating the impact of clothing and carrying status on the performance of gait recognition methods. While the remaining databases did not exceed 200 subjects, OU-ISIR LP [34] has surpassed them by providing gait sequences from more than 4000 subjects with a wide age range. In spite of not containing any covariates, it is useful for estimating the performance of gait recognition with high statistical reliability.

A comparative analysis of state-of-the-art databases concerning unconstrained face recognition is given in Table 1. It is interesting to note that, although these sets comprise highly challenging biometric data, most of them were manually captured by human operators and they still lack several crucial covariate factors of surveillance environments, such as motion-blur.

Table 1. Comparative analysis of the datasets particularly devised for studying unconstrained biometric recognition

- XM2VTS [13]: 295 subjects; covariates E, I, P; ✓. A multi-modal database comprising face images, video sequences, and speech recordings acquired at one-month intervals.
- BANCA [14]: 26 subjects; covariates E, I, P; ✓. A database of face videos comprising 12 recordings per subject, acquired under controlled, uncontrolled, and adverse conditions.
- FRGC [15]: 688 subjects; covariates E, I; ✓. Four controlled still images, two uncontrolled still images, and a 3D face model.
- FRVT 2006 [35]: >35,000 subjects; covariates E, I; ✓. The first independent performance benchmark for 3D face recognition technology; also comprises still frontal face images acquired under controlled and uncontrolled illumination.
- SC-FACE [24]: 130 subjects; covariates E, I, P; ✓✓. Facial imagery acquired in an indoor surveillance scenario using five video surveillance cameras of various qualities.
- LDHF-DB [36]: 100 subjects; covariates I, F; ✓✓. Visible and near-infrared face images at distances of 60, 100, and 150 m acquired OU.
- LFW [3]: 5749 subjects; covariates E, O, I, P; ✓✓. The first database of face photographs designed for the study of unconstrained face recognition.
- IJB-A [6]: 500 subjects; covariates E, O, I, P, M; ✓✓. Similar in spirit to the LFW dataset, but containing high variability in pose.
- YouTube Faces [23]: 1595 subjects; covariates E, O, I, P, M; ✓✓. A database of face videos designed for the study of unconstrained face recognition in videos.
- ChokePoint [25]: 25 subjects; covariates E, O, I, P, M; ✓✓✓. A database of face videos acquired indoors in an NC manner.
- QUIS-CAMPI: 268 (v1) / 320 (v2) subjects; covariates E, O, I, P, M, F; ✓✓✓✓✓ (NC, OM, AD, OU, AA). The first data feed of biometric samples automatically acquired by an OU surveillance system, with subjects OM and AD.

Datasets are compared with respect to the number of subjects available and the key covariate factors of recognition in the wild: expression (E), occlusion (O), illumination (I), pose (P), motion-blur (M), and out-of-focus (F). The key aspects ensuring that the data realistically result from real-world scenarios are also included: non-cooperative (NC), on-the-move (OM), at-a-distance (AD), outdoor (OU), and automated image acquisition (AA).

3 QUIS-CAMPI data acquisition system

For the acquisition of QUIS-CAMPI, we rely on a master–slave surveillance system capable of acquiring face imagery of subjects AD and OM (Fig. 2). Even though this type of configuration is not novel and has also been used in surveillance scenarios [37], our approach has several singularities that make the system particularly suitable for working in real-world conditions.

Figure 2: Processing chain of the QUIS-CAMPI surveillance system. A master–slave architecture is adopted, where the master camera is responsible for monitoring a surveillance area and providing a set of interest regions (in this case, the locations of subjects' faces) to the PTZ camera.

When compared to existing master–slave systems particularly devised for the acquisition of biometric data AD [38–40], our approach has two major advantages: (i) a novel calibration algorithm avoids the use of extra optical devices [38] or stringent camera configurations [39, 40]; and (ii) camera scheduling is performed using a general graph model that determines in real time the best tour to acquire the targets in the scene and can easily be customised to incorporate several prioritisation rules, avoiding the use of manually defined rules [39]. The proposed surveillance system is divided into five major modules, broadly grouped into three main phases: (i) human motion analysis; (ii) inter-camera calibration; and (iii) camera scheduling. The workflow of the surveillance system used for acquiring the QUIS-CAMPI dataset is given in Fig. 2 and described in detail in the next sections. As illustrated in Fig.
2, the rationale behind the surveillance system is to use the pan-tilt-zoom (PTZ) camera as a foveal sensor, i.e. the video stream obtained from the wide camera is analysed to obtain the locations of subjects' heads, so that the PTZ camera can image the facial region at high magnification. In the first phase, the master camera is responsible for detecting and tracking multiple subjects in the surveillance area. In every frame, a background subtraction algorithm prunes the search area inspected by a pre-trained human shape detector, whose output instantiates a tracking algorithm. Multi-person tracking is achieved by running multiple instances of the algorithm simultaneously. Subsequently, the tracking record of each subject is analysed to infer their position some seconds ahead. This step is particularly important to counterbalance the time offset introduced by the mechanical delay of PTZ devices. In the calibration module, image coordinates in the master camera referential need to be converted to the corresponding pan–tilt angles. To this end, we relied on a novel calibration algorithm [41] that exploits geometric cues (the vanishing points available in the scene) to automatically estimate subjects' heights and thus determine their three-dimensional (3D) positions (see [41] for additional details). Finally, the scheduling module allows the PTZ camera to determine the sequence of observations that minimises the cumulative transition time, in order to start the acquisition process as soon as possible and maximise the number of samples taken from the subjects in the scene. Considering that this problem has no known polynomial-time solution, we relied on a method capable of inferring an approximate solution in real time [42].
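For intuition only, the scheduling problem can be approximated with a greedy nearest-neighbour tour, as sketched below. The deployed system relies on the more elaborate real-time graph-based method of [42]; the `transition_time` cost function here is a hypothetical placeholder for the actual pan-tilt transition model.

```python
def greedy_tour(start_pose, targets, transition_time):
    """Nearest-neighbour approximation of camera scheduling:
    repeatedly visit the target with the smallest estimated
    transition time from the current PTZ pose, and return the
    visiting order. Exact minimisation of the cumulative
    transition time is TSP-like, hence the heuristic."""
    remaining = list(targets)
    tour, pose = [], start_pose
    while remaining:
        nxt = min(remaining, key=lambda t: transition_time(pose, t))
        tour.append(nxt)
        remaining.remove(nxt)
        pose = nxt
    return tour
```

With scalar poses and an absolute-difference cost, `greedy_tour(0, [5, 2, 9], lambda a, b: abs(a - b))` visits the targets in the order `[2, 5, 9]`.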
4 Description of the QUIS-CAMPI dataset

When planning the QUIS-CAMPI dataset, we had two main concerns: (i) to acquire biometric data of subjects in a real surveillance scenario, covertly, OM, and AD; and (ii) to provide multiple biometric enrolment data to assess the advantages of using media collections [43] for identifying humans in the wild. For that purpose, we collected multiple biometric samples that were organised into two distinct groups: (i) enrolment data and (ii) probe data. For the former, we enroled volunteers who provided written authorisation for image acquisition (AA) and distribution. The enrolment was conducted in an indoor controlled scenario and the following samples were collected: soft biometrics, full-body imagery, and a 3D face model. For the latter, the data were acquired by the system described in Section 3 while subjects walked through the surveillance area. Probe samples comprise high-resolution face images automatically captured by the PTZ camera. It is important to note that the large majority of subjects use this area in their normal routine, which ensures a faithful representation of surveillance covariates. Fig. 3 illustrates the biometric data available for each subject.

Figure 3: Illustrative example of the biometric data available in QUIS-CAMPI. For each subject of the database, distinct biometric traits are acquired during enrolment: (a) soft biometrics and full-body imagery; (b) 3D model; (c) subsequently, high-resolution face images automatically collected each time a subject enters the surveillance area, in an NC way. Note that these data are acquired under varying lighting and weather conditions, at different times of the day, while subjects are OM and AD.

4.1 Enrolment data

Enrolment data provide good-quality samples acquired indoors: soft biometrics, full-body imagery, and a 3D face model.
Soft biometrics: Eleven types of soft biometric labels were registered for each subject. The full list is presented in Table 2; the rationale behind the choice of these features was their discernibility AD and the discrimination power reported in the study of Tome et al. [44]. The distribution of each trait with respect to the labels adopted is depicted in Fig. 4.

Table 2. List of the soft biometric traits collected during enrolment (trait: labels)
- age, height, weight: numeric values
- sex: male, female
- ethnicity: Caucasian, African, Hispanic, Asian, Indian
- skin colour: white, tanned, oriental, black
- hair colour: none, black, brown, red, blond, grey, dyed
- hair length: none, shaven, short, medium, long
- facial hair colour: none, black, brown, red, blond, grey
- facial hair length: none, stubble, moustache, goatee, full beard
- hair style: none, straight, curly, wavy, frizzy

Figure 4: QUIS-CAMPI statistics. A set of statistics was collected for distinct types of biometric samples: (i) distribution of the soft traits along the enrolment data (denoted as gallery) and probe data; (ii) distribution of the inter-pupillary distance in the probe images; (iii) distribution of the widths of the tracking sequences collected OU; and (iv) distribution of the number of days elapsed between the AA of probe data and the enrolment process.

Full-body shots: A high-resolution image of the person's body was acquired at three different angles (frontal, left side, and right side). Also, the intrinsic and extrinsic parameters of the camera were registered, along with five keypoints of the body in the frontal view. These data can be used to infer real-world measurements of body components (e.g. height and face metrology).

3D face model: A set of images acquired at different viewing angles was used to construct a textured 3D model of the face using VisualSFM [45].
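As a sketch of how one enrolment record from Table 2 might be represented in code: the field names, types, and the helper method below are our own illustrative choices, not the dataset's official schema.

```python
from dataclasses import dataclass

@dataclass
class SoftBiometrics:
    """One enrolment record of soft traits (a subset of Table 2).
    The official categorical label sets are those listed in Table 2."""
    age: int
    height_cm: float
    weight_kg: float
    sex: str          # 'male' | 'female'
    hair_colour: str  # e.g. 'none', 'black', 'brown', 'red', 'blond', 'grey', 'dyed'
    hair_length: str  # e.g. 'none', 'shaven', 'short', 'medium', 'long'

    def coarse_description(self) -> str:
        """Human-readable summary, e.g. for pre-filtering candidates AD."""
        return (f"{self.sex}, ~{self.height_cm:.0f} cm, "
                f"{self.hair_length} {self.hair_colour} hair")
```

Soft traits of this kind are typically used to prune the gallery before the (more expensive) face matching step, which is one motivation for collecting them at enrolment.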
4.2 Probe data

Fully unconstrained biometric samples are the key novelty of the QUIS-CAMPI dataset and comprise face images automatically acquired by the PTZ camera.

High-resolution facial shots: The master–slave surveillance system described in Section 3 is used to automatically acquire high-resolution face images of the enroled subjects while they walk through the surveillance area. Considering that not all the acquired data contain the facial region (e.g. due to incorrect human detection or tracking) and that the surveillance rig is not able to distinguish between enroled and non-enroled subjects, the data are manually screened before being supplied to the database. Also, the face location of the subject of interest in the image is provided as metadata. These annotations are determined by a state-of-the-art face detection algorithm [46] and cross-verified manually. On average, the inter-pupillary distance of a face image is 116 px with a standard deviation of 35 px, and about 99% of the images have an inter-pupillary distance higher than 60 px (the minimum resolution required by commercial face recognition engines).

4.3 Database versioning

The automated AA of biometric samples and their regular deployment to the dataset is the reason for denoting QUIS-CAMPI as a data feed and, at the same time, is one of the key novelties of this tool. Moreover, this singularity is the rationale for arguing that QUIS-CAMPI is the first open dataset, which is particularly advantageous to avoid inappropriately fitting classifiers to the final test data. Despite the advantages of this choice, it also introduces significant challenges that have to be carefully addressed to ensure that the performance reported on this dataset can be compared in a practical and fair manner.
To this end, we relied on git, one of the most commonly used version control systems, to organise the QUIS-CAMPI data feed into two distinct types of branches: (i) the master branch comprises the most up-to-date version of the entire biometric data; (ii) the evaluation branches encompass a former snapshot of the master branch plus the evaluation files defined according to the evaluation protocol of QUIS-CAMPI (see Section 5.1). This structure is depicted in Fig. 5, where the advantages of this strategy can easily be perceived. First, the version control capabilities allow users to navigate through any state of the QUIS-CAMPI data feed using the master branch, which is useful for obtaining new biometric samples without the burden of re-downloading the entire set. Second, the evaluation branches are static and independent of any updates on the master branch, allowing researchers to compare their approaches by referring to a specific evaluation branch.

Figure 5: History graph of the QUIS-CAMPI data feed using version control software. A git repository is used to deploy new samples acquired by the data feed (represented by the master branch), while maintaining static evaluation sets released at a much lower rate (represented by branches). This strategy permits researchers to access any state of the QUIS-CAMPI data feed for development purposes, while also ensuring that algorithms can be compared by reporting performance on the different evaluation set versions.

4.4 Database availability

Regarding the dataset structure, the file names correspond to the AA date in the format YMDhms, where Y, M, D, h, m, and s denote the AA year, month, day, hour, minute, and second, respectively. The correspondences between the files and enroled subjects, as well as the soft biometric traits, are provided in a relational database, which is deployed as a backup SQL file.
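Assuming the YMDhms fields are zero-padded and simply concatenated (an assumption on our part; the exact on-disk layout should be checked against the dataset documentation), a file-name timestamp could be recovered as follows:

```python
from datetime import datetime

def parse_acquisition_time(filename):
    """Recover the acquisition timestamp from a QUIS-CAMPI-style
    file name, assuming zero-padded concatenated YMDhms fields
    (e.g. '20160407153012.jpg'; the example name is hypothetical)."""
    stem = filename.rsplit(".", 1)[0]  # drop the extension
    return datetime.strptime(stem, "%Y%m%d%H%M%S")
```

Such parsing is handy, for instance, for computing the elapsed time between probe acquisition and enrolment (one of the statistics reported in Fig. 4).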
For convenience, we include a view in the database that eases access to the biometric data using simple SQL queries. For additional information on how to obtain and use the dataset, please refer to the QUIS-CAMPI web site [http://quiscampi.di.ubi.pt].

5 Experimental evaluation

In this section, we introduce the evaluation protocols that should be adopted for reporting algorithms' performance on the QUIS-CAMPI set. We believe that the proposed guidelines for the different recognition modalities are adequate for the majority of biometric recognition algorithms. However, as in the recent case of the updated guidelines of LFW [19], additional protocols may be included in the future to meet novel requirements.

5.1 Evaluation protocol

Having in mind the main purpose of QUIS-CAMPI, i.e. to provide an objective tool for assessing the performance of biometric recognition algorithms in surveillance scenarios, we introduce evaluation protocols for the two recognition modalities: (i) verification and (ii) identification.

5.1.1 Verification

Regarding the verification paradigm, we adopt the protocol defined in LFW [3, 19], which is an objective, simple, and well-established way of assessing face verification algorithms. Accordingly, the PTZ face images are used to form pairs of matched images (positive pairs) and mismatched images (negative pairs)
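The matched/mismatched pair construction can be sketched as follows. This is illustrative only: it enumerates all possible pairs, whereas the official evaluation pairs should be taken from the dataset's evaluation files.

```python
from itertools import combinations

def make_pairs(images_by_subject):
    """Build LFW-style evaluation pairs: every within-subject image
    combination becomes a positive (matched) pair, and every
    cross-subject combination a negative (mismatched) pair.

    images_by_subject: dict mapping subject id -> list of image ids.
    """
    positives, negatives = [], []
    subjects = sorted(images_by_subject)
    for s in subjects:
        positives.extend(combinations(images_by_subject[s], 2))
    for a, b in combinations(subjects, 2):
        negatives.extend((i, j)
                         for i in images_by_subject[a]
                         for j in images_by_subject[b])
    return positives, negatives
```

A verification algorithm is then scored by how well it separates the similarity scores of the positive pairs from those of the negative pairs.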
