An automatic method for determining phonetic boundary for continuous speech utterances in an open source multi-language audio/video database

Peer-reviewed article

2011; Acoustical Society of America; Volume: 130; Issue: 4_Supplement; Language: English

10.1121/1.3655076

ISSN

1520-9024

Authors

Montri Karnjanadecha, Stephen A. Zahorian

Topic(s)

Speech Recognition and Synthesis

Abstract

Nine hundred video clips (approximately 30 h in each of English, Mandarin, and Russian) have been collected from Internet sources such as youtube.com and rutube.ru. This multi-language audio/video database has been orthographically transcribed by human listeners, with time markers at the sentence level. The aim, however, is to release the database to the public with high-accuracy time markers at the phonetic level, which will greatly increase its value. This paper describes an approach to high-accuracy automatic phonetic labeling based on a Hidden Markov Model speech recognizer. The automatic method was developed because performing this task with human listeners alone is slow and tedious. One major challenge for the automatic method is that the audio consists of spontaneous speech on unconstrained topics, spoken under varied acoustic conditions. The approach begins with a well-trained acoustic model for each language. The acoustic model is then adapted to each passage, and finally the phonetic labeling of the passage is determined. Comparison of the automatically determined phone time markers with those obtained by human listeners, for a subset of the speech materials, shows the accuracy of the automatic method.
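
The abstract does not say how the automatic and human phone time markers were compared. One common way to score such a comparison is the fraction of automatic boundaries that fall within a fixed tolerance of the human-labeled boundary. The Python sketch below illustrates that idea only; the `Phone` type, the `boundary_agreement` function, the 20 ms tolerance, and the sample timings are all assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Phone:
    """One labeled phone segment (hypothetical structure, times in seconds)."""
    label: str
    start: float
    end: float


def boundary_agreement(auto: List[Phone], ref: List[Phone], tol: float = 0.020) -> float:
    """Return the fraction of phone end boundaries in `auto` that lie
    within `tol` seconds of the matching boundary in `ref`.

    Assumes both labelings contain the same phone sequence, which holds
    when the automatic labels come from aligning a known transcription.
    """
    if len(auto) != len(ref):
        raise ValueError("phone sequences differ in length")
    if not auto:
        raise ValueError("no phones to compare")
    hits = 0
    for a, r in zip(auto, ref):
        if a.label != r.label:
            raise ValueError(f"label mismatch: {a.label!r} vs {r.label!r}")
        if abs(a.end - r.end) <= tol:
            hits += 1
    return hits / len(auto)


if __name__ == "__main__":
    # Made-up timings for a three-phone word, for illustration only.
    human = [Phone("k", 0.00, 0.08), Phone("ae", 0.08, 0.21), Phone("t", 0.21, 0.30)]
    machine = [Phone("k", 0.00, 0.09), Phone("ae", 0.09, 0.25), Phone("t", 0.25, 0.31)]
    print(f"agreement within 20 ms: {boundary_agreement(machine, human):.2f}")
```

In this made-up example, two of the three automatic boundaries fall within 20 ms of the human boundary. A stricter or looser tolerance can be passed through `tol` to see how agreement changes.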
