Peer-reviewed article

Dual-Path Modeling With Memory Embedding Model for Continuous Speech Separation

2022; Institute of Electrical and Electronics Engineers; Volume: 30; Language: English

10.1109/taslp.2022.3165712

ISSN

2329-9304

Authors

Chenda Li, Zhuo Chen, Yanmin Qian

Topic(s)

Music and Audio Processing

Abstract

Continuous speech separation (CSS) aims to separate overlap-free targets from a long, partially overlapped recording. Although it has shown promising results, the original CSS framework does not consider cross-window information or long-span dependencies. To alleviate these limitations, this work introduces two novel methods that capture long-span knowledge implicitly and explicitly, respectively. First, we apply the dual-path (DP) modeling architecture to the CSS framework, where within-window and across-window information are jointly modeled by alternately stacked local and global processing modules. Second, to further capture long-span dependencies, we introduce a memory-based model for CSS. An additional memory pool is designed to extract an embedding from each small window, and inter-window communication is established over the memory embedding pool through an attention mechanism. This memory-based model can precisely control what information is transferred across windows, leading to both improved modeling capacity and interpretability. Experimental results on the LibriCSS dataset show that both strategies capture the long-span information of continuous speech well and significantly improve system performance. Moreover, further improvements are observed when the two methods are integrated.
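
The memory-pool idea in the abstract can be illustrated with a small sketch. This is not the authors' model: the abstract does not specify how embeddings are extracted or how attention is parameterized, so the code assumes mean pooling as a stand-in for the learned embedding extractor and plain scaled dot-product attention with no learned projections. The function names `window_embeddings` and `attend_over_pool` and the toy feature values are hypothetical.

```python
import math


def window_embeddings(windows):
    """Summarize each window (a list of feature vectors) by mean pooling.

    Assumption: mean pooling stands in for a learned embedding extractor.
    """
    pool = []
    for frames in windows:
        dim = len(frames[0])
        pool.append([sum(f[d] for f in frames) / len(frames) for d in range(dim)])
    return pool


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_over_pool(pool):
    """Let every window embedding attend over the whole memory pool.

    Returns one (weights, context) pair per window, where weights are the
    attention weights over all windows and context is their weighted sum.
    """
    dim = len(pool[0])
    scale = math.sqrt(dim)
    results = []
    for query in pool:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in pool]
        weights = softmax(scores)
        context = [
            sum(w * key[d] for w, key in zip(weights, pool)) for d in range(dim)
        ]
        results.append((weights, context))
    return results


# Toy input: three windows of two 2-dimensional frames each.
windows = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[0.0, 1.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.8, 0.2]],
]
pool = window_embeddings(windows)
attended = attend_over_pool(pool)
```

In this toy input the first and third windows have similar embeddings, so the first window places more attention weight on the third than on the second. The attention weights are also directly inspectable, which is the kind of interpretability the abstract attributes to the memory-based model.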
