Layout Analysis and Content Classification in Digitized Books
2017; Springer Science+Business Media; Language: English
DOI: 10.1007/978-3-319-56300-8_14
ISSN: 1865-0937
Authors: Andrea Corbelli, Lorenzo Baraldi, Fabrizio Balducci, Costantino Grana, Rita Cucchiara
Topic(s): Image Processing and 3D Reconstruction
Abstract: Automatic layout analysis has proven to be extremely important in the digitization of large document collections. In this paper we present a mixed approach to layout analysis, introducing an SVM-aided layout segmentation process and a classification process based on local and geometrical features. The final output of the automatic analysis algorithm is a complete, structured annotation in JSON format, containing the digitized text as well as references to all the illustrations on the input page, which can be used by both visualization and annotation interfaces. We evaluate our algorithm on a large dataset built upon the first volume of the “Enciclopedia Treccani”.
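The abstract does not specify the JSON schema, so the following is only a minimal sketch of what a per-page structured annotation could look like. Every name here (`build_page_annotation`, the `text_blocks` and `illustrations` keys, the `bbox` layout, the page identifier and image path) is a hypothetical choice made for illustration, not the format used by the authors.

```python
import json


def build_page_annotation(page_id, regions):
    """Assemble a structured annotation for one page (hypothetical schema).

    Each region is a (label, bbox, payload) tuple, where label is "text" or
    "illustration", bbox is (x, y, width, height) in pixels, and payload is
    the recognized text or a reference to the cropped illustration.
    """
    annotation = {"page": page_id, "text_blocks": [], "illustrations": []}
    for label, bbox, payload in regions:
        x, y, w, h = bbox
        entry = {"bbox": {"x": x, "y": y, "width": w, "height": h}}
        if label == "text":
            entry["text"] = payload
            annotation["text_blocks"].append(entry)
        elif label == "illustration":
            entry["image_ref"] = payload
            annotation["illustrations"].append(entry)
        else:
            raise ValueError(f"unknown region label: {label!r}")
    return annotation


# Illustrative regions only; in a real pipeline these would come from the
# segmentation and classification stages.
example_regions = [
    ("text", (40, 60, 500, 300), "Sample paragraph text."),
    ("illustration", (40, 380, 500, 240), "images/page_0001_fig_01.png"),
]

page_json = json.dumps(
    build_page_annotation("page_0001", example_regions),
    ensure_ascii=False,
    indent=2,
)
```

Keeping text blocks and illustration references in separate lists is one simple way to let a viewer render text while an annotation tool resolves image references; a single list of typed regions would work equally well.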