Book chapter, peer-reviewed

Long Distance Dependency in Language Modeling: An Empirical Study

2005; Springer Science+Business Media; Language: English

10.1007/978-3-540-30211-7_42

ISSN

1611-3349

Authors

Jianfeng Gao, Hisami Suzuki

Topic(s)

Speech and dialogue systems

Abstract

This paper presents an extensive empirical study of two language modeling techniques, linguistically motivated word skipping and predictive clustering, both of which are used to capture long distance word dependencies that lie beyond the scope of a word trigram model. We compare these techniques with others previously proposed for the same purpose and evaluate the resulting models on the task of Japanese Kana-Kanji conversion. We show that the two techniques, while simple, outperform the other existing methods studied in this paper and lead to language models that perform significantly better than a word trigram model. We also investigate how factors such as training corpus size and genre affect the performance of the models.
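The abstract names predictive clustering without defining it. A common formulation factors the trigram probability through the class of the predicted word: P(w_i | w_{i-2} w_{i-1}) = P(c_i | w_{i-2} w_{i-1}) * P(w_i | w_{i-2} w_{i-1} c_i), where c_i is the cluster of w_i. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the toy corpus, the hand-made `cluster` map, and the functions `p_trigram` and `p_predictive` are hypothetical, and it uses unsmoothed maximum-likelihood counts, whereas the paper's clustering and smoothing may differ.

```python
from collections import Counter

# Hypothetical toy corpus and hand-made word-to-cluster map (not from the paper).
corpus = "the cat sat on the mat the dog sat on the rug".split()
cluster = {"the": "DET", "cat": "N", "dog": "N", "mat": "N", "rug": "N",
           "sat": "V", "on": "P"}

# Count histories h = (w_{i-2}, w_{i-1}), (h, word) pairs and (h, cluster) pairs.
hist = Counter()
hist_word = Counter()
hist_cluster = Counter()
for a, b, w in zip(corpus, corpus[1:], corpus[2:]):
    h = (a, b)
    hist[h] += 1
    hist_word[(h, w)] += 1
    hist_cluster[(h, cluster[w])] += 1


def p_trigram(w, h):
    """Plain maximum-likelihood trigram estimate P(w | h)."""
    return hist_word[(h, w)] / hist[h]


def p_predictive(w, h):
    """Predictive clustering: P(c(w) | h) * P(w | h, c(w)), ML estimates."""
    c = cluster[w]
    p_class = hist_cluster[(h, c)] / hist[h]
    p_word_given_class = hist_word[(h, w)] / hist_cluster[(h, c)]
    return p_class * p_word_given_class


print(p_trigram("mat", ("on", "the")), p_predictive("mat", ("on", "the")))  # 0.5 0.5
```

Because each word belongs to exactly one cluster, the two unsmoothed factors multiply back to the plain trigram estimate. In general, the benefit of the factorization appears only when each factor is smoothed separately, since the class-level estimate P(c_i | w_{i-2} w_{i-1}) is backed by more observations than any single word.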
