Book chapter, peer-reviewed

High Speed Unknown Word Prediction Using Support Vector Machine for Chinese Text-to-Speech Systems

2005; Springer Science+Business Media; Language: English

10.1007/978-3-540-30211-7_54

ISSN

1611-3349

Authors

Juhong Ha, Yu Zheng, Byeongchang Kim, Gary Geunbae Lee, Yoonsuk Seong

Topic(s)

Natural Language Processing Techniques

Abstract

One of the most significant problems in POS (Part-of-Speech) tagging of Chinese text is identifying the words in a sentence, since no blanks delimit them. Because it is impossible to pre-register every word in a dictionary, unknown words inevitably occur during this process. The unknown word problem therefore has a considerable effect on the accuracy of the speech output of a Chinese TTS (Text-to-Speech) system. In this paper, we present an SVM (support vector machine) based method that predicts unknown words from the output of word segmentation and tagging. To reach the processing speed a TTS system requires, we pre-detect the candidate boundaries of unknown words before starting the actual prediction. Unknown word prediction is thus performed in two phases: detection, then prediction. The experimental results are promising, showing high precision and high recall together with high speed.
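The two-phase idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the chapter: the candidate heuristic (a run of consecutive single-character tokens), the function names `detect_candidates` and `predict_unknown_words`, the POS tags in the sample, the `UNK` tag, and the toy `is_word` rule that stands in for a trained SVM decision function. The only point it shows is structural: a cheap detection pass narrows the input so that the costly classifier runs on a few candidate spans rather than on every position.

```python
from typing import Callable, List, Tuple

Token = Tuple[str, str]   # (word, POS tag) from an upstream segmenter/tagger
Span = Tuple[int, int]    # half-open token index range [start, end)


def detect_candidates(tokens: List[Token], min_run: int = 2) -> List[Span]:
    """Phase 1 (detection): cheaply mark candidate unknown-word spans.

    Assumed heuristic, not the chapter's: a run of at least `min_run`
    consecutive single-character tokens may be one out-of-dictionary
    word that the segmenter split apart.
    """
    spans: List[Span] = []
    start = None
    for i, (word, _tag) in enumerate(tokens):
        if len(word) == 1:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                spans.append((start, i))
            start = None
    # Close a run that reaches the end of the sentence.
    if start is not None and len(tokens) - start >= min_run:
        spans.append((start, len(tokens)))
    return spans


def predict_unknown_words(
    tokens: List[Token],
    spans: List[Span],
    is_word: Callable[[List[Token], int, int], bool],
) -> List[Token]:
    """Phase 2 (prediction): call the classifier only on candidate spans.

    `is_word` is a placeholder for a trained SVM decision function.
    Accepted spans are merged into one token tagged "UNK" (hypothetical tag).
    """
    out: List[Token] = []
    cursor = 0
    for start, end in spans:
        out.extend(tokens[cursor:start])
        if is_word(tokens, start, end):
            merged = "".join(word for word, _tag in tokens[start:end])
            out.append((merged, "UNK"))
        else:
            out.extend(tokens[start:end])
        cursor = end
    out.extend(tokens[cursor:])
    return out


# Toy stand-in for the SVM: accept any candidate of at most three characters.
def toy_is_word(tokens: List[Token], start: int, end: int) -> bool:
    return end - start <= 3


# Hypothetical segmenter output in which one word was split into characters.
sample: List[Token] = [
    ("我们", "r"), ("参观", "v"),
    ("中", "f"), ("关", "v"), ("村", "n"),
    ("公司", "n"),
]
candidates = detect_candidates(sample)
merged_tokens = predict_unknown_words(sample, candidates, toy_is_word)
```

In a real system the `is_word` callable would wrap a trained classifier fed with features of the candidate span and its context; the chapter's actual features and boundary-detection rules are not given in the abstract and are not reproduced here.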
