Handwritten Text Recognition and Browsing in Archive of Prisoners’ Letters from Smolensk Convict Prison
2024; Springer Science+Business Media; Language: English
DOI: 10.1007/978-3-031-54534-4_16
ISSN: 1611-3349
Authors: Nikita Lomov, Dmitry Kropotov, Danila Stepochkin, Anton Laptev,
Topic(s): Natural Language Processing Techniques
Abstract: This paper considers the task of creating a prototype navigation system for a small archive of historical documents (letters from prisoners of the Smolensk convict prison of the early 20th century) recorded in a single handwriting. To fit a model for handwritten text recognition, procedures were developed for the automatic preparation of image collections, including line segmentation, pen trace segmentation, and deslanting of lines and pages. Experiments show that training a modern neural network on about a thousand line samples in the same handwriting achieves decent recognition quality (5.11% character error rate, CER, and 17.55% word error rate, WER). The automatically recognized text was then used for keyword search. The text was corrected using dictionaries and prescribed rules that take into account the peculiarities of Russian pre-reform spelling, recognition errors, and the scribe's own errors. The search engine reached a precision of 97.14% and a recall of 91.35%. The results were visualized by highlighting the found words on the original images. The study demonstrates that a navigation system can be created and fitted to a specific handwriting with a small number of labeled samples and limited human participation.
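For readers unfamiliar with the two recognition metrics quoted in the abstract, the following is a minimal sketch of how CER and WER are commonly computed: the Levenshtein edit distance between the reference and the recognized text, divided by the reference length, at character level and word level respectively. The function names are hypothetical and the abstract does not say how the authors normalized text or tokenized words, so whitespace splitting here is an assumption.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance (insertions, deletions, substitutions cost 1)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance between ref[:i] and the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from ref
                            curr[j - 1] + 1,      # insertion into ref
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character edits divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate; words are split on whitespace (an assumption)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

Because one wrong character makes the whole word count as an error, WER is typically several times higher than CER on the same output, which is consistent with the 5.11% and 17.55% figures reported.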