Article Open access

Document Image Analysis Using Imagemagick and Tesseract-ocr

2016; Volume: 3; Issue: 5 Language: English

10.17148/iarjset.2016.3523

ISSN

2394-1588

Authors

Parra-Calderón Carlos L., Antony P. J, D N Sachin

Topic(s)

Handwritten Text Recognition Techniques

Abstract

Document image analysis is the field of converting paper documents into an editable electronic representation by performing optical character recognition (OCR). In recent years, there has been considerable progress in the development of open source OCR systems. This paper gives a comprehensive overview of the tesseract-ocr engine, which was the HP Research Prototype in the UNLV Fourth Annual Test of OCR Accuracy. Emphasis is placed on aspects that are novel or at least unusual in an OCR engine, in particular the line finding, the features/classification methods, and the adaptive classifier. OCRopus is one of the leading open source document analysis systems; it uses tesseract-ocr within a modular, pluggable architecture. ImageMagick is an open source image processing tool. This paper presents an overview of the different steps involved in a document image analysis system and illustrates them with examples from a combination of ImageMagick and OCRopus.
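The abstract does not give the paper's actual commands, so the following is only a minimal sketch of how an ImageMagick cleanup step might be chained with tesseract-ocr. The function names, file names, and the chosen settings (a 40% deskew threshold and a 50% binarization threshold) are illustrative assumptions, not values taken from the paper. It also assumes the legacy `convert` entry point of ImageMagick and the `tesseract` command-line tool are installed and on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def preprocess_command(src, dst):
    """Build an ImageMagick command that grayscales, deskews, and binarizes a scan.

    The 40% and 50% values are illustrative assumptions, not tuned settings.
    """
    return [
        "convert", str(src),      # legacy ImageMagick entry point (assumed available)
        "-colorspace", "Gray",    # drop color information
        "-deskew", "40%",         # straighten a slightly rotated page
        "-threshold", "50%",      # binarize to black and white
        str(dst),
    ]


def ocr_command(image, output_base, lang="eng"):
    """Build a tesseract command; tesseract writes its text to <output_base>.txt."""
    return ["tesseract", str(image), str(output_base), "-l", lang]


def run_pipeline(src, workdir="."):
    """Run both steps and return the recognized text (hypothetical helper)."""
    for tool in ("convert", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    workdir = Path(workdir)
    cleaned = workdir / "cleaned.png"
    output_base = workdir / "result"
    subprocess.run(preprocess_command(src, cleaned), check=True)
    subprocess.run(ocr_command(cleaned, output_base), check=True)
    return (workdir / "result.txt").read_text(encoding="utf-8")
```

Calling `run_pipeline("scan.png")` on a scanned page would, under these assumptions, return the recognized text as a string. Keeping command construction separate from execution makes it possible to inspect or adjust each step without running the external tools.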
