Book chapter; Open access; Peer reviewed

Layout Analysis and Content Classification in Digitized Books

2017; Springer Science+Business Media; Language: English

10.1007/978-3-319-56300-8_14

ISSN

1865-0937

Authors

Andrea Corbelli, Lorenzo Baraldi, Fabrizio Balducci, Costantino Grana, Rita Cucchiara

Topic(s)

Image Processing and 3D Reconstruction

Abstract

Automatic layout analysis has proven to be extremely important in the digitization of large collections of documents. In this paper we present a mixed approach to layout analysis, introducing an SVM-aided layout segmentation process and a classification process based on local and geometrical features. The final output of the automatic analysis algorithm is a complete, structured annotation in JSON format that contains the digitized text and references to all illustrations on the input page, and that can be used by both visualization and annotation interfaces. We evaluate our algorithm on a large dataset built from the first volume of the “Enciclopedia Treccani”.
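The abstract does not specify the JSON schema, so the following is only an illustrative sketch of what a per-page annotation holding text blocks and illustration references might look like. Every name here (`build_page_annotation`, the keys `page`, `text_blocks`, `illustrations`, `bbox`, `image_ref`, and the sample values) is an assumption made for illustration, not the format used by the authors.

```python
import json


def build_page_annotation(page_id, regions):
    """Assemble a JSON-serialisable annotation for one page.

    `regions` is a list of dicts with hypothetical keys:
    'kind' ('text' or 'illustration'), 'bbox' (x, y, width, height),
    and either 'text' or 'image_ref'. This layout is assumed, not
    taken from the chapter.
    """
    annotation = {"page": page_id, "text_blocks": [], "illustrations": []}
    for region in regions:
        entry = {"bbox": list(region["bbox"])}
        if region["kind"] == "text":
            entry["text"] = region["text"]
            annotation["text_blocks"].append(entry)
        elif region["kind"] == "illustration":
            entry["image_ref"] = region["image_ref"]
            annotation["illustrations"].append(entry)
        else:
            raise ValueError("unknown region kind: %r" % region["kind"])
    return annotation


# Made-up sample regions, standing in for the output of a layout classifier.
example_regions = [
    {"kind": "text", "bbox": (40, 60, 500, 220), "text": "Sample entry text."},
    {"kind": "illustration", "bbox": (40, 300, 500, 380), "image_ref": "img_0001.png"},
]

page_json = json.dumps(build_page_annotation("vol1_p0001", example_regions), indent=2)
print(page_json)
```

Keeping text blocks and illustrations in separate lists, each with a bounding box, is one simple way to let a viewer overlay regions on the page image; a real schema may well nest or order regions differently.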
