Statistical Model-Based Noise Reduction Approach for Car Interior Applications to Speech Recognition

Article, open access, peer reviewed

2010; Electronics and Telecommunications Research Institute; Volume: 32; Issue: 5; Language: English

10.4218/etrij.10.1510.0024

ISSN

2233-7326

Authors

Sung Joo Lee, Byung Ok Kang, Ho-Young Jung, Yunkeun Lee, Hyung Soon Kim

Topic(s)

Music and Audio Processing

Abstract

ETRI Journal, Volume 32, Issue 5, pp. 801-809. First published: 01 October 2010. https://doi.org/10.4218/etrij.10.1510.0024

Sung Joo Lee (phone: +82 42 860 5732), Byung Ok Kang, Ho-Young Jung, and Yunkeun Lee are with the Software Research Laboratory, ETRI, Daejeon, Rep. of Korea. Hyung Soon Kim is with the Department of Electronics Engineering, Pusan National University, Busan, Rep. of Korea.

This paper presents a statistical model-based noise suppression approach for voice recognition in a car environment. To alleviate the spectral whitening and signal distortion problems of the traditional decision-directed Wiener filter, we combine a decision-directed method with an original spectrum reconstruction method and develop a new two-stage noise reduction filter estimation scheme. When the tradeoff between performance and computational efficiency on resource-constrained automotive devices is considered, the ETSI standard advanced distributed speech recognition front-end (ETSI-AFE) can be an effective solution, and ETSI-AFE is also based on the decision-directed Wiener filter. Thus, a series of voice recognition and computational complexity tests are conducted to compare the proposed approach with ETSI-AFE. The experimental results show that the proposed approach is superior to the conventional method in terms of speech recognition accuracy, while the computational cost and frame latency are significantly reduced.
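For readers unfamiliar with the decision-directed Wiener filter that the abstract takes as its baseline, the following is a minimal sketch of the textbook form of that technique. It is not the two-stage scheme proposed in the paper, nor the ETSI-AFE implementation. The function name, the smoothing factor `alpha=0.98`, the gain floor, and the assumption that a noise power estimate is already available are all illustrative choices, not details taken from the source.

```python
import numpy as np


def decision_directed_wiener(noisy_power, noise_power, alpha=0.98, gain_floor=0.1):
    """Per-frame Wiener gains from a decision-directed a priori SNR estimate.

    noisy_power: array of shape (frames, bins) holding |Y|^2 per frame and bin.
    noise_power: array of shape (bins,) holding an assumed noise power estimate.
    alpha and gain_floor are illustrative values, not taken from the paper.
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    # Guard against division by zero in bins with no estimated noise.
    noise_power = np.maximum(np.asarray(noise_power, dtype=float), 1e-12)

    gains = np.empty_like(noisy_power)
    prev_clean_power = np.zeros_like(noise_power)

    for t, frame in enumerate(noisy_power):
        # A posteriori SNR and its half-wave rectified "SNR minus one" term.
        post_snr = frame / noise_power
        ml_term = np.maximum(post_snr - 1.0, 0.0)

        if t == 0:
            # No previous clean-speech estimate exists for the first frame.
            prio_snr = ml_term
        else:
            # Decision-directed rule: blend the previous frame's clean-speech
            # estimate with the current instantaneous estimate.
            prio_snr = alpha * prev_clean_power / noise_power + (1.0 - alpha) * ml_term

        # Wiener gain, floored to limit over-suppression.
        gain = np.maximum(prio_snr / (1.0 + prio_snr), gain_floor)
        gains[t] = gain

        # Clean-speech power estimate fed back into the next frame.
        prev_clean_power = gain ** 2 * frame

    return gains
```

In use, the returned gains would be multiplied element-wise with the complex short-time spectrum of the noisy signal before resynthesis or feature extraction. The heavy smoothing from a large `alpha` is what makes the estimate stable, and it is also commonly associated with the kind of signal distortion the abstract sets out to reduce.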
