Improving Arabic writer identification using score‐level fusion of textural descriptors
2019; Institution of Engineering and Technology; Volume: 8; Issue: 3; Language: English
10.1049/iet-bmt.2018.5009
ISSN 2047-4946
Authors: Yaâcoub Hannad, Imran Siddiqi, Chawki Djeddi, Mohamed Elyoussfi Elkettani
Topic(s): Authorship Attribution and Profiling
Summary
IET Biometrics, Volume 8, Issue 3, p. 221-229. Research Article. Free Access.
Yaâcoub Hannad (Corresponding Author), y.hannad@gmail.com, orcid.org/0000-0003-2513-1544, Ibn Tofail University, Kenitra, Morocco
Imran Siddiqi, Bahria University, Islamabad, Pakistan
Chawki Djeddi, orcid.org/0000-0002-8436-827X, Larbi Tebessi University, Tebessa, Algeria
Mohamed El-Youssfi El-Kettani, Ibn Tofail University, Kenitra, Morocco
First published: 23 January 2019. https://doi.org/10.1049/iet-bmt.2018.5009. Citations: 14.

Abstract

This paper investigates the problem of writer identification from handwriting samples in Arabic.
The proposed technique relies on extracting small fragments of writing which are characterised using two textural descriptors, Histogram of Oriented Gradients (HOG) and Gray Level Run Length (GLRL) Matrices. Similarity scores realised using HOG and GLRL features are combined using a number of fusion rules. The system is evaluated on three well-known Arabic handwriting databases: the IFN/ENIT database with 411 writers, the KHATT database with 1000 writers, and the QUWI database with 1,017 writers. Fusion using the 'sum' rule reports the highest identification rates, reading 96.86, 85.40, and 76.27% on the IFN/ENIT, KHATT, and QUWI databases, respectively. The results realised on the KHATT database are comparable to the state of the art, while those reported on the IFN/ENIT and QUWI databases are the highest to the best of the authors' knowledge.

1 Introduction

Analysis of handwriting has remained an interesting research area for many decades, attracting forensic document examiners (FDEs), palaeographers, psychologists, and neurologists. In addition to the classical problem of handwriting recognition [1, 2], handwriting has been studied for problems like characterisation of writer [3–6] and writers' demographics [7, 8], signature verification [9–11], classification of writing styles [12–14], and manuscript dating [15]. Likewise, handwriting has also been investigated to predict personal [16, 17] and behavioural [18, 19] attributes. Furthermore, handwriting is also known to be correlated with a number of neurological disorders like Alzheimer [20–22], Autism [23], and Parkinson [24, 25].

Handwriting is a complex fine motor skill [26, 27] that evolves and develops over years. While the learning process starts with copying shapes from a copy book, with time, each individual develops his own writing style as a function of personal preferences in drawing character shapes or joining them.
Consequently, the handwriting of each individual is unique [28] and can be employed as an effective behavioural biometric modality. Extraction of writer-specific attributes from handwriting allows identifying the writer of a questioned document.

Traditionally, writer identification techniques are categorised into text-dependent [28, 29] and text-independent [3, 30] methods. While text-dependent methods require writing samples with the same textual content for comparison, text-independent methods allow identification of writers independent of the semantic content. Likewise, as a function of the handwriting acquisition method, identification methods are distinguished into online [31, 32] and offline [3, 4, 30, 33] techniques. Offline methods employ digitised images of handwriting and rely on statistical or structural features extracted locally or globally from the handwriting images. Online techniques, in addition to the character shapes, also exploit online attributes like speed, pressure, and number and order of strokes to characterise a writer. The present study investigates the problem of text-independent writer identification from offline images of Arabic handwriting, a relatively less researched problem when compared to writer characterisation from handwriting in the Roman script.

Among well-known techniques proposed in the literature, textural features have been effectively employed to characterise the writer in a number of studies [34–36]. Features based on Gabor filters and gray level co-occurrence matrices (GLCM) have been investigated in [36], while GLRL matrices and GLCM are considered in [35]. Likewise, textural measures including local binary patterns (LBP) and local phase quantisation (LPQ) have been employed in [34]. Combination of different features to enhance the identification rates has also been explored [37–39].

Another interesting set of methods considers small writing fragments or graphemes for writer characterisation [3, 30, 39–42].
These techniques either rely on a direct comparison of graphemes [3] or small fragments [40], or first generate a codebook similar to bag-of-features approaches. These codebooks have been considered separately for each writer [42] as well as globally for all writers in the database [30]. The writing fragments in these studies are compared either directly on pixel values (cross correlation) or by representing these fragments using a set of features. Based on the same idea, the authors investigated the effectiveness of textural measures to represent fragments in the writing [4, 43]. Descriptors including LBP, local ternary patterns (LTP), LPQ, and histogram of oriented gradients (HOG) were explored in these studies and realised promising results on writer identification from Arabic handwriting images.

The present study extends the authors' previous works [4, 43] to investigate the effectiveness of GLRL features in characterising writing style from small fragments of writing. The authors also explore a number of well-known fusion rules to enhance the writer identification performance. Textural descriptors (GLRL & HOG) are computed from small writing fragments and different fusion schemes are employed to identify the writer of a query document. Evaluations on large databases of Arabic handwriting images reported high classification rates. The key highlights of this study are listed in the following.

- Characterisation of writer using the textural information in small writing fragments.
- Investigation of decision fusion rules to enhance the overall identification rates.
- A comprehensive series of experiments on three different benchmark databases of Arabic handwritings.
- Identification rates (validated through statistical testing) that are comparable to/better than the current state-of-the-art on this problem.

The paper is organised as follows.
In the next section, the authors discuss the significant recent contributions to writer identification from Arabic handwritten documents. Section 3 presents the details on the extraction of textural descriptors from handwriting images along with the different fusion techniques investigated in the authors' study. Databases employed in the authors' study, experimental settings, and the realised results are presented in Section 4, while Section 5 concludes the paper.

2 Related work

Studies on identification of writers from Arabic documents are relatively recent when compared to those based on the Roman script. Among pioneering works, Gazzah & Amara [44] identified the challenges in Arabic handwriting, including shape variations of characters as a function of position in the word, inclination of writing, and overlapping ligatures. A few such problems are illustrated in Fig. 1 [44]. For characterisation of writers, authors proposed a set of local and global structural features. Classification using a Multilayer Perceptron (MLP) realised an identification rate of 94.7% on writing samples from 60 different writers.

Figure 1. Few of the problems in Arabic handwriting: (a) Shape variation of the same letter as a function of position in the word, (b) Writer dependent shape variations of two Arabic characters, (c) Inclination of writing, (d) Several characters written in a combined way according to the style of writer, (e) Overlapping of ligatures, (f) Different shapes and positioning of dots

In 2007, Bulacu et al. [39] evaluated the features proposed for identification of writers using English and Dutch writings [33] on Arabic handwritings. The features investigated include the probability distribution functions (PDFs) of the contour direction, co-occurrence, and run-lengths. These textural measures were combined with a codebook (of size 400) of Arabic graphemes produced using the k-means clustering.
The system realised an identification rate of 88% on writing samples of 350 different writers. The authors conclude that while features proposed for writer identification from samples using Latin alphabet can be adapted for Arabic script as well, the Arabic samples are much more challenging.

In another work [40], authors exploit small writing fragments to characterise the writer. The technique is primarily inspired from [45] where authors group morphologically similar graphemes into clusters. The work in [40], however, considers small patterns in writing rather than complete graphemes and reports an identification rate of around 94% on 33 writers. The major issue with this technique is its computational complexity that involves computing the similarity measure between writing fragments of the query document and those of all the documents in the reference base.

Similar to the works of Bulacu et al. [39], Abdi et al. [37] employed the PDFs of a number of structural features extracted from Arabic writing samples to characterise the writer. Dots and diacritic marks were removed as a preprocessing step prior to feature extraction. Evaluations were carried out using a number of distance measures (including Euclidean, Hamming, and Manhattan) with Borda voting classification, and an identification rate of 90.2% was realised on 82 writers.

In another study, Djeddi & Souici-Meslati [35] presented a global approach based on texture analysis of handwritings to identify the writer. Features based on the GLRL Matrices and the GLCM were employed to capture the textural information in a writing. Nearest neighbour classification with Euclidean distance reported an identification rate of 82.62% on a database of 130 writers.

Among other significant contributions, Awaida and Mahmoud [38] employed a combination of structural and statistical features extracted from connected components of the binarised images of handwriting.
Features compared using Euclidean distance (nearest neighbour classification) realised an identification rate of 75% on a reference base of 250 writers.

In 2014, Djeddi et al. [46] evaluated a number of textural features to identify writers from a large Arabic database of 1,000 different writers. Features investigated include run-length matrices, edge-hinge distribution, and edge-direction distribution. Classification was carried out using a multi-class SVM, and the best results were reported by the combination of run-length and edge-hinge features, reading 84.10%.

In another study [47], authors exploit a grapheme-based codebook to characterise the writer of a given sample. Unlike the traditional approach of segmenting the writing and clustering patterns to generate the codebook, the authors employ the beta-elliptic model to synthesise the codebook. A Top-1 identification rate of around 90% is reported on 411 writers of the IFN/ENIT database.

Hannad et al. [4] proposed a local approach that is based on small fragments of handwriting. The handwriting image is divided into a large number of small windows, and the fragments in each window are represented by histograms of well-known textural descriptors. These include LBP, LTP, and LPQ. Two samples are compared by computing the distance between the respective histograms of the fragments resulting from the two images. The system evaluated on the 411 writers of the IFN/ENIT database achieved an identification rate of 94.89%. The work was later extended [43] to investigate the effectiveness of HOG as a descriptor of small writing fragments.

Al-Maadeed et al. [48] proposed a set of geometrical features to characterise the writer. These features include direction, curvature, and tortuosity with improvement of the traditional edge-based directional and chain code based features.
The technique was evaluated on both English and Arabic samples in the QUWI database, and a Top-1 identification rate of 70.08% was realised on Arabic samples of 1,017 writers using Kernel Discriminant Analysis (KDA) as classifier.

Among other recent works, Khan et al. [49] employed the bagged discrete cosine transform (BDCT) descriptor to identify the writers. Universal codebooks are first used to generate multiple predictor models. A final decision on a query handwriting is obtained using the majority voting rule from these predictor models. While the proposed system achieved high performance on writings in the Roman script, a relatively low identification rate of 76% was realised on the Arabic writing samples of 411 writers in the IFN/ENIT database.

In another recent study on Arabic writer identification, Ahmed et al. [50] proposed a holistic technique based on clustering. The images are segmented into lines, words and subsequently into characters, and a number of structural and statistical features are extracted. Authors investigated different combinations of features with multiple clustering techniques and a number of distance measures. The highest identification rates are reported for a combination of intensity and slope features, reading close to 63% on 1,000 writers of the KHATT database.

In another work [51], authors employ the RootSIFT descriptor that is computed from the contours of handwriting. The writing style of an individual is then characterised using GMM super-vectors. Exemplar-SVMs are used in the classification step, and the experiments are carried out on three different databases.

Asi et al. [52] exploit a combination of global and local features to characterise the writer from modern as well as historical manuscripts.
The high-level information in the writing style is captured by transforming the key-point-based features (SIFT) into a global descriptor, while the local handwriting characteristics are extracted using a modified form of the well-known contour direction feature. Among various experiments conducted, an identification rate of 85.5% is reported on 1,000 writers of the KHATT database.

An overview of the methods discussed in the preceding paragraphs is presented in Table 1. IFN/ENIT, KHATT, and QUWI are the three popular databases with 411, 1000, and 1017 writers, respectively. In all cases, the focus of a study lies on enhancing the feature extraction step, while traditional classifiers (KNN, SVM, ANN, etc.) have been employed for identification. Here, the authors explore the impact of different fusion techniques to enhance the identification rates. The details of the proposed technique are presented in the following section.

Table 1. Performance comparison of different Arabic writer identification systems

Study                         Year  Features                                           Writers  Ident. rate, %
Gazzah & Najoua-Ben [44]      2005  global and local features                          60       94.7
Bulacu et al. [39]            2007  combination of textural features and codebook      350      88
Djeddi & Souici-Meslati [40]  2008  small writing fragments                            33       93.93
Abdi et al. [37]              2009  PDFs of different features                         82       90.2
Djeddi & Souici-Meslati [35]  2010  statistics of GLRL & GLCM matrices                 130      82.62
Awaida & Sabri [38]           2013  structural and statistical features                250      75
Djeddi et al. [46]            2014  run lengths, edge-hinge & edge-direction features  1000     84.10
Abdi & Khemakhem [47]         2015  codebooks of small fragments                       411      90
Hannad et al. [4]             2016  textural measures from small writing fragments     411      94.86
Al-Maadeed et al. [48]        2016  geometric features                                 1017     70.08
Khan et al. [49]              2017  BDCT descriptors                                   411      76
Ahmed et al. [50]             2017  structural and statistical features                1000     62.93
Christlein et al. [51]        2017  GMM supervectors                                   150      99.50
Asi et al. [52]               2017  key point-based features                           1000     85.50

3 Proposed technique

The proposed technique relies on extracting features from small fragments of writing and classification using fusion rules. An overview of the proposed system is presented in Fig. 2, while each of the involved steps is discussed in the following sections.

Figure 2. Overview of the proposed system

3.1 Feature extraction

Prior to feature extraction, the handwriting images are binarised using global thresholding and connected components in the image are extracted. A connected component may correspond to a complete word, a partial word, or a single character. Each connected component is further divided into small windows to extract writing fragments. The division technique employed is the same as presented in the authors' previous work [4], where blocks of size $N \times N$ are used to extract the fragments ($N$ being empirically fixed to 100). Compared to other similar studies which exploit small writing fragments [40, 42], the window size in the authors' work is relatively larger to contain significant parts of Arabic partial words (Fig. 3). The key difference lies in how the fragments are eventually represented for subsequent comparison. While writing fragments in [40, 42] are compared using pixel values, the authors represent each fragment using high-level descriptors. Consequently, the windows extracted from a connected component are required to contain 'sufficient' portions of handwriting so that the extracted features are meaningful. The following textural descriptors are investigated in the authors' study.

Figure 3. Writing fragments extracted from Arabic words

3.1.1 Histogram of oriented gradients - HOG

HOG [53] is one of the most widely employed descriptors generally applied to object recognition and classification problems.
A number of recent studies have demonstrated the effectiveness of HOG and other similar descriptors for analysis of handwriting [54, 55] as well as signatures [56]. In the authors' implementation, the HOG descriptor is computed to represent each of the writing fragments. Each fragment is divided into nine cells and a nine-bin histogram of gradients is computed from each cell, resulting in a feature vector of dimension 81. These parameters are the same as those employed in the authors' previous work on studying HOG to characterise the writing fragments [43].

3.1.2 Gray level run length matrices

A run in a given linear orientation is a sequence of pixels having the same intensity values. Lengths of runs of different gray values characterise how fine or coarse the texture is [57]. A common technique to capture the textural information using run lengths is to compute the run length matrices. For a given orientation, the entry $P(i, j)$ of the run length matrix $P$ is the number of runs in the image with intensity value $i$ and length $j$. The size of the matrix $P$ is $N \times K$, where $N$ is the total number of unique gray levels in the image and $K$ is the maximum length of run in a given direction [35]. The computation of the (horizontal) run length matrix for a binary image is illustrated in Fig. 4. The values '13' and '3' in the first row indicate that for pixel values 0, there are 13 runs of length 1 and 3 runs of length 4 in the horizontal direction. Likewise, the second row of the matrix stores the information on runs of different lengths for pixel values 1.

Figure 4. Computation of run length matrix: (a) A binary image, (b) Run length matrix in the horizontal direction

In general, run length matrices are computed for the principal orientations $0^\circ$, $45^\circ$, $90^\circ$, and $135^\circ$. Effectiveness of run length matrices in characterising the writer has been studied in [58, 59].
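
As a minimal sketch of the horizontal run length computation described in this section, the following Python function counts runs row by row in a small binary image. The function name, the toy image, and the use of the image width as the number of columns $K$ are illustrative assumptions rather than the authors' implementation, and the toy image is not the one shown in Fig. 4.

```python
def horizontal_run_length_matrix(image, levels=2):
    """Return P where P[i][j - 1] counts horizontal runs of value i and length j."""
    width = max(len(row) for row in image)  # upper bound on the run length (K)
    matrix = [[0] * width for _ in range(levels)]
    for row in image:
        if not row:
            continue
        value, length = row[0], 1
        for pixel in row[1:]:
            if pixel == value:
                length += 1  # the current run continues
            else:
                matrix[value][length - 1] += 1  # the run ended: record it
                value, length = pixel, 1
        matrix[value][length - 1] += 1  # record the last run of the row
    return matrix


# Hypothetical 3 x 4 binary image (an assumption for illustration only).
toy_image = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]
run_lengths = horizontal_run_length_matrix(toy_image)
print(run_lengths)  # [[1, 1, 0, 1], [0, 1, 1, 0]]
```

Row 0 of the result says that pixel value 0 has one run each of lengths 1, 2, and 4, and row 1 says that pixel value 1 has one run each of lengths 2 and 3. The other principal orientations would need the same counting along different scan directions, which this sketch does not cover.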
However, unlike the computation of these matrices globally from the complete page of handwriting [58, 59], the authors compute the four matrices for each of the writing fragments locally. Each fragment in the authors' study is $100 \times 100$ and is represented as a binary image, resulting in a matrix of size $2 \times 100$. The two rows of the matrix represent runs of black and white pixels, respectively. Each matrix is converted to a vector (of dimension 200), and the vectors of the four matrices are concatenated to represent each writing fragment with a descriptor of 800 values.

Once the HOG- and GLRL-based features are computed from fragments in a given writing sample, the complete page is represented by the set of features computed from all the fragments:

$$D = \{ H_i^D \mid 1 \le i \le \mathrm{card}(D) \}$$

The term $\mathrm{card}(D)$ refers to the total number of fragments in the sample $D$, and $H_i^D$ represents the resulting HOG or GLRL histogram computed from fragment $i$ of the document $D$. The features are computed for all the documents in the database under study, hence producing a reference base $R$.

3.2 Writer identification (classification)

The aim of the classification step is to find the authorship of a queried handwritten document. Once a query writing sample $Q$ is presented to the system, the same steps of division of writing and extraction of features are carried out. The document is then matched with all those in the reference base using the following dis-similarity measure:

$$\mathrm{DIS}(Q, D) = \sum_{i=1}^{\mathrm{card}(Q)} \min_{1 \le j \le \mathrm{card}(D)} d\left(H_i^Q, H_j^D\right)$$

where $H_i^Q$ and $H_j^D$ represent, respectively, the feature vectors (HOG or GLRL) of writing fragments in samples $Q$ and $D$, and $d$ is the distance between two vectors. In other words, each fragment in the query sample $Q$ is compared with all the fragments in the reference document $D$ to find the one that reports the minimum distance. The distances of the matched fragments are then added to compute the distance between the two documents. Two vectors are compared using the Hamming distance, computed over all dim components, where dim is the dimension of the feature vector.
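
The matching procedure of Section 3.2 can be sketched as follows: every query fragment is matched to its closest fragment in a reference document, the matched distances are summed, and the reference document with the smallest total determines the writer. All names and values are hypothetical. In particular, the per-fragment distance is passed in as a parameter, and the default used here (sum of absolute component differences) is only an assumed stand-in that may differ from the authors' Hamming distance.

```python
def abs_difference_distance(x, y):
    """Assumed stand-in distance: sum of absolute component differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def document_dissimilarity(query, reference, distance=abs_difference_distance):
    """Sum, over query fragments, of the distance to the closest reference fragment."""
    return sum(min(distance(q, r) for r in reference) for q in query)


def identify_writer(query, reference_base, distance=abs_difference_distance):
    """Return the writer label of the reference document closest to the query."""
    best_writer, _ = min(
        reference_base,
        key=lambda entry: document_dissimilarity(query, entry[1], distance),
    )
    return best_writer


# Hypothetical 3-dimensional fragment descriptors (the real ones have 81 or 800 values).
query_doc = [[1, 0, 2], [0, 3, 1]]
reference_base = [
    ("writer_a", [[1, 0, 2], [0, 2, 1]]),
    ("writer_b", [[5, 5, 5], [0, 0, 0]]),
]
print(document_dissimilarity(query_doc, reference_base[0][1]))  # 1
print(document_dissimilarity(query_doc, reference_base[1][1]))  # 7
print(identify_writer(query_doc, reference_base))               # writer_a
```

Because the sum runs over the query fragments only, the measure is not symmetric in the two documents. Its cost also grows with the product of the two fragment counts for every reference document.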
The writer of the document in the reference base that reports the minimum distance to $Q$ is identified as the writer of $Q$.

3.3 Fusion techniques

Fusion techniques are known to enhance the performance and have been applied to a number of classification problems in general [60] and biometrics in particular [61–63]. Fusion can be carried out at feature level [64, 65], where multiple features capturing different types of information are combined together, or at decision level [66, 67], where scores of different classifiers are combined to enhance the overall system performance. For writer identification, a number of studies have validated the effectiveness of fusion techniques in improving the identification rates both at feature [68] and score [30, 33] levels. Here, the authors investigate score-level fusion to enhance the identification rates. The fusion procedure is based on the generation of a scalar score which represents the resulting dissimilarity computed from dis-similarities of different textural descriptors using the 'sum', 'product', 'min', and 'max' fusion rules. For completeness, the authors briefly outline these combination rules in the following.

- Sum: The sum rule involves summing (or averaging) the confidence scores of base classifiers to obtain the final score for a given class.
- Product: The product rule involves multiplying the individual scores to arrive at the final score.
- Min: In the 'min' rule, the authors keep the minimum of the participating scores as the final score.
- Max: The 'max' rule involves keeping the maximum of the individual scores.

Fig. 5 illustrates the score-level fusion procedure for dis-similarities of two different textural descriptors.

Figure 5. Score-level fusion procedure for dis-similarities of two different textural descriptors

4 Experiments and results

This section presents an overview of the databases employed in the authors' evaluations, the experimental settings, and the realised results.
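
The four combination rules outlined in Section 3.3 can be sketched as follows for dissimilarity scores, where a lower fused value indicates a better match. This is an illustration under stated assumptions: the function names, the candidate labels, and the score values are hypothetical, and the sketch assumes that the scores of the two descriptors are already on comparable scales, so no normalisation step is shown.

```python
import math

# Each rule maps the list of per-descriptor scores of one candidate to one value.
FUSION_RULES = {"sum": sum, "product": math.prod, "min": min, "max": max}


def fuse_scores(descriptor_scores, rule="sum"):
    """Fuse per-descriptor dissimilarities (dicts: candidate -> score) with one rule."""
    combine = FUSION_RULES[rule]
    candidates = descriptor_scores[0].keys()
    return {c: combine([scores[c] for scores in descriptor_scores]) for c in candidates}


def best_candidate(fused):
    """With dissimilarities, the smallest fused score is the best match."""
    return min(fused, key=fused.get)


# Hypothetical dissimilarities of one query against three reference writers.
hog_scores = {"writer_a": 2, "writer_b": 5, "writer_c": 4}
glrl_scores = {"writer_a": 4, "writer_b": 3, "writer_c": 9}
for rule in FUSION_RULES:
    fused = fuse_scores([hog_scores, glrl_scores], rule)
    print(rule, fused, best_candidate(fused))
```

Keeping the rules in a dictionary makes it straightforward to run the same query under every rule and compare the resulting rankings.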
The authors first introduce the datasets, followed by a discussion on results with and without using the fusion techniques. The results are also compared with well-known Arabic writer identification systems reported in the literature, and a statistical analysis of the realised results is also carried out.

4.1 Databases

Three well-known and publicly available Arabic handwriting databases are considered in the authors' study. These include IFN/ENIT, KHATT, and QUWI, as detailed in the following.

4.1.1 IFN/ENIT database

IFN/ENIT [69] is one of the premier and most widely used databases of Arabic handwriting. The database with 2,200 forms has been used to evaluate handwriting recognition and writer identification systems. The database comprises a total of more than 26,000 Arabic words (names of Tunisian towns and villages) contributed by 411 different writers. Sample word images from the database are illustrated in Fig. 6.

Figure 6. Sample word images from the IFN/ENIT database

In the authors' experiments, 30 words of each of the 411 writers are used in the training set and 20 words in the test set.

4.1.2 KHATT database

The KHATT database [70] is a large database of handwritten samples in Arabic with 2000 fixed and 2000 free text paragraphs. A total of 1,000 different writers contributed to data collection with four samples per writer, and the images are scanned at multiple resolutions (200, 300 and 600 dpi). Sample paragraphs from the database are shown in Fig. 7.

Figure 7. Sample paragraph images from the KHATT database

In the authors' experimental setup, the complete set of 1000 writers is considered with two free paragraphs per writer in the training set and one fixed paragraph image in the test set.

4.1.3 QUWI database

The QUWI database [71] is a large database containing Arabic and English samples written by 1017 writers. All images are scanned at a resolution of 600 dpi, and each writer produced four samples: one free and one fixed text in Arabic, and one free and one fixed text in English. The gender and handedness information of writers is also available in the database, making it useful for evaluation of writer demographic classification systems as well. Sample (Arabic) pages from the database are illustrated in Fig. 8. For evaluation of the authors' system, the authors consider the complete set of 1017 writers with one free paragraph of each writer in the training set and the other fixed text paragraph in the test set.

Figure 8. Sample text images from the QUWI database

A summary of the three databases and the distribution into training and test sets of each is presented in Table 2.

Table 2. Summary of databases used in evaluations

Database       Writers  Training set         Test set
IFN/ENIT [69]  411      30 words/writer      20 words/writer
KHATT [70]     1000     2 paragraphs/writer  1 paragraph/writer
QUWI [71]      1017     1 paragraph/writer   1 paragraph/writer

4.2 Results on individual descriptors

The authors first present the identification performance of the individual HOG and GLRL descriptors on the three databases. The GLRL descriptor is also separately evaluated for runs of white and black pixels in the four orientations, as discussed in Section 3.1.2. The results of these evaluations are summarised in Tables 3–5 for the IFN/ENIT, KHATT, and QUWI databases, respectively. The results are reported in terms of Top-1, Top-5, and Top-10 identification rates, where the term Top-k means that the writer of a query sample was correctly found in the first k most similar samples returned by the system. It can be seen that on the three databases, the combination of black and white run length features reported the highest identification rates, reading 94.16, 76.70, and 68.14% (Top-1) on the IFN/ENIT, KHATT, and QUWI databases, respectively.

Table 3. Identification rates (in %) on 411 writers of the IFN/ENIT database

Descriptor  Top 1  Top 5  Top 10
            86.62  96.35  97.82
            76.16  91.24  95.62
            86.37  97.08  98.31
&           94.16  97.81  98.54

Table 4. Identification rates (in %) on 1000 writers of the KHATT database

Descriptor  Top 1  Top 5  Top 10
            72.20  87.30  90.90
            55.30  73.30  78.10
            57.10  73.80  80.50
&           76.70  86.10  88.70

Table 5. Identification rates (in %) on 1017 writers of the QUWI database

Descriptor  Top 1  Top 5  Top 10
HOG         65.92  80.38  86.56
            47.12  67.58  71.44
            51.36  68.78  74.28
&           68.14  77.32  82.52

4.3 Results using fusion techniques

To enhance the identification rates, the authors combine the scores of individual descriptors using the well-known fusion rules including 'sum', 'product', 'min', and 'max' as discussed previously. Tables 6–8 present the identification rates using different combinations of descriptors for the three datasets. It is interesting to not
Reference(s)