An automatic method for determining phonetic boundary for continuous speech utterances in an open source multi-language audio/video database
2011; Acoustical Society of America; Volume: 130; Issue: 4_Supplement; Language: English
DOI: 10.1121/1.3655076
ISSN: 1520-9024
Authors: Montri Karnjanadecha, Stephen A. Zahorian
Topic(s): Speech Recognition and Synthesis
Abstract: Nine hundred video clips (approximately 30 h in each of English, Mandarin, and Russian) have been collected from Internet sources such as youtube.com and rutube.ru. This multi-language audio/video database has been orthographically transcribed by human listeners with time markers at the sentence level. However, the aim is to provide this database to the public with high-accuracy time markers at the phonetic level, which will greatly increase the value of the database. This paper describes an approach to achieving high-accuracy automatic phonetic labeling based on a Hidden Markov Model speech recognizer. The automatic method was developed because performing this task with human listeners alone is slow and tedious. One major challenge for the automatic method is that the audio data consists of spontaneous speech on unconstrained topics, spoken under various acoustic conditions. The approach begins with a well-trained acoustic model for each language. The acoustic model is then adapted to each passage, and finally the phonetic labeling of the passage is determined. Comparison of the automatically determined phone time markers with those obtained by human listeners, for a subset of the speech materials, shows the accuracy of the automatic method.
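The abstract does not state how the automatic and human time markers were compared. As an illustration only, a common way to score such a comparison is the fraction of phone boundaries that fall within a fixed tolerance of the human-placed boundary. The sketch below assumes both label sets cover the same phone sequence, so boundaries pair one to one; the function name `boundary_agreement`, the 20 ms default tolerance, and the sample times are hypothetical and not taken from the paper.

```python
from typing import Sequence


def boundary_agreement(
    auto: Sequence[float],
    manual: Sequence[float],
    tolerance: float = 0.020,  # seconds; an assumed value, not from the paper
) -> float:
    """Return the fraction of boundaries where the automatic time marker
    lies within `tolerance` seconds of the human-placed marker.

    Assumes both sequences describe the same phone sequence, so the i-th
    automatic boundary corresponds to the i-th manual boundary.
    """
    if len(auto) != len(manual):
        raise ValueError("boundary lists must have the same length")
    if not manual:
        raise ValueError("boundary lists must not be empty")
    hits = sum(1 for a, m in zip(auto, manual) if abs(a - m) <= tolerance)
    return hits / len(manual)


# Made-up boundary times in seconds, for illustration only.
auto_times = [0.100, 0.250, 0.415, 0.600]
manual_times = [0.110, 0.245, 0.450, 0.605]
print(boundary_agreement(auto_times, manual_times))  # 0.75
```

Three of the four sample boundaries differ by at most 20 ms, so the score is 0.75. Widening the tolerance raises the score, which is why any such figure should be reported together with the tolerance used.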