Peer-reviewed article

Large-Scale Optical Character Recognition of Ancient Greek

2017; University of Toronto Press; Volume: 14; Issue: 3; Language: English

10.3138/mous.14.3-3

ISSN

1913-5416

Authors

Bruce Robertson, Federico Boschetti

Topic(s)

Image Retrieval and Classification Techniques

Abstract

This paper documents our campaign to undertake the large-scale optical character recognition of ancient, or polytonic, Greek. Building upon the Gamera OCR engine and developing a suite of post-processing tools, including automatic spellcheck, we processed 1,200 volumes comprising 329,002,271 Greek words. A sample of 10 pages is studied in detail; they demonstrate the degree to which each step of post-processing improved the results, and with which source documents. These pages attain an average character accuracy of about 96%. These results will provide a basis for further improvements, including the training of other open-source OCR engines.
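
The abstract reports character accuracy without defining it. One common definition, assumed here and not confirmed as the authors' own, is one minus the character-level edit distance between the OCR output and a ground-truth transcription, divided by the length of the ground truth. The sketch below illustrates that definition; the function names `edit_distance` and `character_accuracy` are hypothetical, and the NFC normalization step is an added assumption so that precomposed and decomposed polytonic characters compare as equal.

```python
import unicodedata


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def character_accuracy(ocr_text: str, ground_truth: str) -> float:
    """Assumed metric: 1 - edit_distance / len(ground_truth), after NFC."""
    # Normalize so that a base letter plus combining accents and breathings
    # is compared as the same character as its precomposed form.
    ocr = unicodedata.normalize("NFC", ocr_text)
    truth = unicodedata.normalize("NFC", ground_truth)
    if not truth:
        raise ValueError("ground truth must not be empty")
    return 1.0 - edit_distance(ocr, truth) / len(truth)
```

Under this definition, an OCR output that drops a single accent from a five-character word counts as one substitution in five characters. The value can be negative when the OCR output is much longer than the ground truth, so some evaluations clamp it at zero.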

Reference(s)