Article Open access Peer-reviewed

Machine transliteration and transliterated text retrieval: a survey

2018; Springer Science+Business Media; Volume: 43; Issue: 6; Language: English

10.1007/s12046-018-0828-8

ISSN

0973-7677

Authors

Dinesh Kumar Prabhakar, Sukomal Pal

Topic(s)

Topic Modeling

Abstract

The number of WWW users across the globe is increasing rapidly. According to Internet Live Stats, there are more than 3 billion Internet users worldwide today, and the number of non-native English speakers among them is quite high. A large proportion of these users access the Internet in their native languages but use the Roman script to express themselves through communication channels such as messages and posts. With the advent of Web 2.0, user-generated content on the Web is growing at a very rapid rate, and a substantial proportion of this content is transliterated data. To leverage this huge information repository, there is a matching effort to process transliterated text. In this article, we survey the recent body of work in the field of transliteration. We start with a definition and a discussion of the different types of transliteration, followed by the various deterministic and non-deterministic approaches used to tackle transliteration-related issues in machine translation and information retrieval. Finally, we study the performance of these techniques and present a comparative analysis of them.
